Fast text anonymization using k-anonyminity
Wakana Maeda, Yumiko Suzuki, Satoshi Nakamura
Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services, 2016-11-28
DOI: 10.1145/3011141.3011217
Citations: 4
Abstract
In this paper, we propose a method for anonymizing unstructured texts using a quasi-identifier list. In our method, the system replaces some characters of the quasi-identifiers in the texts with an alternate character such as "*", in order to prevent re-identification of information that should be kept secret. However, this approach leaves room for improvement in preserving the information in the original text. If the system anonymizes the texts while keeping as much of the original text as possible, data mining techniques applied to the anonymized texts should produce more accurate and useful output. Our method therefore anonymizes quasi-identifiers while leaving intact the substrings that do not contribute to re-identification. Concretely, the system identifies the substrings to be redacted so that the following two conditions hold: 1) after redaction, every term in the quasi-identifier list satisfies k-anonymity; 2) the number of redacted characters is minimized. In advance, we construct from the quasi-identifier list an anonymization dictionary that records two numbers for each anonymized quasi-identifier: the number of quasi-identifiers that are anonymized in the same way, and the number of redacted characters. However, this construction step is time-consuming, because the system must search a huge number of redaction patterns. To solve this problem, we propose a method for accelerating the construction of the anonymization dictionary using several heuristics and set theory.
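The abstract describes the dictionary construction only at a high level. The Python sketch below illustrates the naive, brute-force version of that step, i.e. the exhaustive pattern enumeration that the abstract calls time-consuming, not the authors' accelerated heuristics. It rests on assumptions that are not stated in the abstract: redaction replaces characters in place with "*" (so term length is preserved), input terms contain no "*", and a redacted term counts as k-anonymous when its pattern is shared by at least k entries of the quasi-identifier list. All function names (`redaction_patterns`, `build_anonymization_dictionary`, `choose_patterns`, `anonymize_text`) are hypothetical.

```python
import re
from itertools import combinations

MASK = "*"  # assumed redaction character ("*" is the example given in the abstract)


def redaction_patterns(term):
    """Yield every variant of term with some subset of its characters masked.

    A term of length n has 2**n variants, which is why this naive
    enumeration becomes slow for long terms or large lists.
    """
    n = len(term)
    for r in range(n + 1):
        for positions in combinations(range(n), r):
            chars = list(term)
            for i in positions:
                chars[i] = MASK
            yield "".join(chars)


def build_anonymization_dictionary(qi_list):
    """Map each pattern to (#quasi-identifiers sharing it, #redacted characters)."""
    members = {}
    for term in set(qi_list):
        for pattern in redaction_patterns(term):
            members.setdefault(pattern, set()).add(term)
    return {p: (len(terms), p.count(MASK)) for p, terms in members.items()}


def choose_patterns(qi_list, k):
    """Pick, per quasi-identifier, a pattern shared by >= k terms with the fewest masks."""
    dictionary = build_anonymization_dictionary(qi_list)
    chosen = {}
    for term in set(qi_list):
        candidates = [p for p in redaction_patterns(term) if dictionary[p][0] >= k]
        if candidates:
            # Fewest redacted characters first; ties broken lexicographically.
            chosen[term] = min(candidates, key=lambda p: (dictionary[p][1], p))
        else:
            # Fewer than k terms share this length, so k is unreachable by
            # in-place masking; fall back to masking the whole term.
            chosen[term] = MASK * len(term)
    return chosen


def anonymize_text(text, chosen):
    """Replace each listed quasi-identifier in text with its chosen pattern in one pass."""
    if not chosen:
        return text
    # Longest terms first so a term is not pre-empted by one of its own substrings.
    alternation = "|".join(re.escape(t) for t in sorted(chosen, key=len, reverse=True))
    return re.sub(alternation, lambda m: chosen[m.group(0)], text)


# Hypothetical usage with a toy quasi-identifier list and k = 2.
chosen = choose_patterns(["Sato", "Kato", "Kudo", "Mori"], k=2)
print(anonymize_text("Sato met Kudo and Mori.", chosen))  # *ato met K**o and ****.
```

In this toy run, "Sato" and "Kato" need only one masked character to share the pattern "*ato", "Kudo" needs two ("K**o", shared with "Kato"), and "Mori" has no partial pattern in common with any other entry, so it is fully masked. Because every term is expanded into all 2**n of its masked variants, the dictionary grows exponentially with term length, which is consistent with the abstract's motivation for an accelerated construction.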