Risk of Re-identification from Payment Card Histories in Multiple Domains
Satoshi Ito, Reo Harada, Hiroaki Kikuchi
2018 IEEE 32nd International Conference on Advanced Information Networking and Applications (AINA)
DOI: 10.1109/AINA.2018.00137
Publication date: 2018-08-09
Citations: 0
Abstract
Anonymization is the process of modifying a data set so that individuals cannot be identified from the data. However, most studies consider only the anonymization of data from a single domain, and no study has examined the risk of re-identification when data sets from more than one domain are combined. This paper proposes a method for evaluating the risk of re-identification from payment card histories in multiple domains. First, we model the correlation between two histories from different usage domains in terms of information entropy and use mutual information to quantify the risk of identification. Second, we describe an experiment that evaluates this risk on payment card data. We validated the proposed method on real payment card data from 31 subjects and evaluated privacy and utility metrics for 47 anonymized data sets. Overall, we found a correlation between the transportation and item-purchase histories stored in the payment card data, and established that most (44 of 47) of the anonymized data sets still allowed correct identification with more than 45% accuracy, for any of the privacy metrics. This indicates that the risk of re-identification from payment card data is very high.
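The abstract does not spell out the exact entropy model. As a hedged illustration only, the Python sketch below shows the standard plug-in estimate of mutual information between two paired, discrete histories, I(X;Y) = H(X) + H(Y) - H(X,Y). The way records are paired across domains and the toy station/store labels are hypothetical assumptions, not data or definitions taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy, in bits, of a sequence of hashable symbols."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits.

    Assumption (hypothetical pairing): xs[i] and ys[i] are observed for the
    same record, e.g. the station where a card was used (transportation
    domain) and the store category of a purchase (purchase domain).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    joint = list(zip(xs, ys))  # joint symbols (x, y)
    return entropy(xs) + entropy(ys) - entropy(joint)


if __name__ == "__main__":
    # Toy, made-up histories; not the paper's data.
    stations = ["A", "A", "B", "B"]
    stores_linked = ["x", "x", "y", "y"]       # store fully determined by station
    stores_independent = ["x", "y", "x", "y"]  # store unrelated to station
    print(mutual_information(stations, stores_linked))       # 1.0
    print(mutual_information(stations, stores_independent))  # 0.0
```

A higher I(X;Y) means that observing one domain's history reveals more about the other, which is the intuition behind using mutual information to quantify identification risk across domains. The plug-in estimator is the simplest choice; with few records per subject it tends to overestimate mutual information.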