Enhancing privacy for automatically detected quasi identifier using data anonymization

Web Intell. Pub Date: 2023-03-22. DOI: 10.3233/web-221823

S. Devi, R. Indhumathi

Citations: 0

Abstract

The rapid advancement of information technology has made information storage and retrieval more efficient. As a result, most organizations, businesses, and governments release and exchange large amounts of microdata among themselves for commercial or research purposes. However, improper data exchange can lead to privacy breaches. Many methods and strategies have been developed to address such breaches, and anonymization is one that many companies use. Anonymization requires first identifying the quasi-identifiers (QIs). This paper therefore proposes a method called Quasi Identification Based on Tree (QIBT) for automatic QI identification. The method derives the QI from the relationship between the numbers of distinct values taken by sets of attributes, using a tree data structure to find the unique and infrequent attribute values across the entire dataset at lower computational cost. It consists of four phases: (i) unique attribute value computation, (ii) tree construction, (iii) computation of the quasi-identifier from the tree, and (iv) application of an anonymization technique to the identified QI. The proposed algorithm identifies the attributes with a high risk of disclosure. To improve data utility, synthetic data are generated only for the detected QI using a partially synthetic data generation technique. The method's efficiency is evaluated on a subset of a dataset from the UCI Machine Learning Repository, and it produces better results than existing approaches.
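The abstract does not give the details of QIBT, so the following Python sketch illustrates one plausible reading of the four phases, not the authors' implementation. It assumes that direct identifiers and the sensitive attribute have already been excluded from the candidate attributes. Phase (i) counts distinct values per attribute; phase (ii) builds a prefix tree over the records with attributes ordered by descending distinct count; phase (iii) takes as the QI the shortest attribute prefix whose number of distinct value combinations (the number of tree nodes at that depth) reaches a fraction `ratio` of the records, and flags records whose QI combination occurs fewer than `k` times. Phase (iv) is a simple stand-in for partial synthetic data generation: QI values of flagged records are redrawn from each attribute's empirical distribution. The function names, `ratio`, and `k` are hypothetical.

```python
import random


class Node:
    """Prefix-tree node: children keyed by attribute value, with a record count."""

    def __init__(self):
        self.children = {}
        self.count = 0


def distinct_counts(records, attrs):
    """Phase (i): number of distinct values each attribute takes."""
    return {a: len({r[a] for r in records}) for a in attrs}


def build_tree(records, order):
    """Phase (ii): insert every record as a path of its values, in `order`."""
    root = Node()
    for r in records:
        node = root
        node.count += 1
        for a in order:
            node = node.children.setdefault(r[a], Node())
            node.count += 1
    return root


def find_qi(records, attrs, ratio=0.8):
    """Phase (iii): shortest prefix of attributes (most distinct values first)
    whose distinct value combinations number at least ratio * len(records).
    Returns [] if no prefix reaches the ratio."""
    counts = distinct_counts(records, attrs)
    order = sorted(attrs, key=lambda a: counts[a], reverse=True)  # stable for ties
    level = [build_tree(records, order)]
    for depth in range(1, len(order) + 1):
        # Nodes at this depth = distinct combinations of the first `depth` attributes.
        level = [child for node in level for child in node.children.values()]
        if len(level) >= ratio * len(records):
            return order[:depth]
    return []


def risky_records(records, qi, k=2):
    """Indices of records whose QI value combination occurs fewer than k times."""
    root = build_tree(records, qi)
    risky = []
    for i, r in enumerate(records):
        node = root
        for a in qi:
            node = node.children[r[a]]
        if node.count < k:  # unique or infrequent combination
            risky.append(i)
    return risky


def partially_synthesize(records, qi, risky, seed=0):
    """Phase (iv), simplified stand-in: for risky records only, replace QI values
    with draws (with replacement) from each attribute's values over all records.
    Non-QI values and non-risky records are copied unchanged."""
    rng = random.Random(seed)
    pools = {a: [r[a] for r in records] for a in qi}
    out = [dict(r) for r in records]  # copy; the input is not mutated
    for i in risky:
        for a in qi:
            out[i][a] = rng.choice(pools[a])
    return out


if __name__ == "__main__":
    # Toy microdata; "dx" is a sensitive attribute, not a QI candidate.
    data = [
        {"zip": "10001", "age": 30, "sex": "F", "dx": "flu"},
        {"zip": "10001", "age": 30, "sex": "M", "dx": "cold"},
        {"zip": "10002", "age": 30, "sex": "F", "dx": "flu"},
        {"zip": "10002", "age": 41, "sex": "F", "dx": "asthma"},
        {"zip": "10003", "age": 41, "sex": "M", "dx": "cold"},
        {"zip": "10003", "age": 52, "sex": "M", "dx": "flu"},
    ]
    qi = find_qi(data, ["zip", "age", "sex"], ratio=0.8)
    risky = risky_records(data, qi, k=2)
    print(qi)     # ['zip', 'age']
    print(risky)  # [2, 3, 4, 5]
    for row in partially_synthesize(data, qi, risky):
        print(row)
```

Ordering attributes by descending distinct count lets the tree separate records in as few levels as possible, and because the number of nodes at depth d equals the number of distinct combinations of the first d attributes, one tree answers the distinct-count question for every prefix. Resampling from marginal distributions is only a placeholder for the paper's technique; it can reproduce a rare combination and does not by itself guarantee k-anonymity.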

Source journal

Web Intell.

Self-citation rate

0.00%

Articles published