Performance Metrics Evaluation Towards The Effectiveness of Data Anonymization

2023 IEEE 8th International Conference for Convergence in Technology (I2CT). Pub Date: 2023-04-07. DOI: 10.1109/I2CT57861.2023.10126310

A. Raj, Rio G. L. D'Souza



Abstract

Data anonymization is a supplementary method for ensuring that private data is inaccessible to outside parties. Anonymization may affect the outcomes of data mining procedures, since it can make the data harder for commonly used algorithms to analyze. This practical experience report compares the performance impact of current data anonymization algorithms with that of the suggested k-anonymization methods, using both original and anonymized data to assess correctness and execution time. A sample of genuine data produced by a healthcare facility was anonymized using k-anonymization, l-diversity, t-closeness, and differential privacy techniques. Contrary to predictions, the Hadoop framework was able to handle the anonymization approaches, improving accuracy and performance while speeding up execution. These findings show that implementing data anonymization techniques properly within the Hadoop ecosystem can increase their effectiveness. Furthermore, the suggested method can produce anonymized data with the necessary trade-off between utility and protection, and with performance that scales to large datasets.
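As a rough illustration of two of the privacy models named in the abstract, the sketch below measures k-anonymity and l-diversity on a toy table before and after generalizing an age column. It is not the authors' method and does not use Hadoop. The records, the column names (`age`, `zip`, `diagnosis`), and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def k_anonymity(rows, quasi_ids):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    counts = defaultdict(int)
    for row in rows:
        counts[tuple(row[q] for q in quasi_ids)] += 1
    return min(counts.values())


def l_diversity(rows, quasi_ids, sensitive):
    """Smallest number of distinct sensitive values found in any such group."""
    values = defaultdict(set)
    for row in rows:
        values[tuple(row[q] for q in quasi_ids)].add(row[sensitive])
    return min(len(v) for v in values.values())


def generalize_age(row, width=10):
    """Replace an exact age with a bucket label such as '30-39'."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}


# Hypothetical toy records (assumption: NOT data from the paper).
records = [
    {"age": 31, "zip": "560001", "diagnosis": "flu"},
    {"age": 34, "zip": "560001", "diagnosis": "asthma"},
    {"age": 38, "zip": "560001", "diagnosis": "flu"},
    {"age": 42, "zip": "560002", "diagnosis": "diabetes"},
    {"age": 47, "zip": "560002", "diagnosis": "flu"},
]
quasi = ["age", "zip"]  # assumed quasi-identifiers
anonymized = [generalize_age(r) for r in records]

k_before = k_anonymity(records, quasi)       # every exact age is unique
k_after = k_anonymity(anonymized, quasi)     # ages bucketed by decade
l_after = l_diversity(anonymized, quasi, "diagnosis")

print(f"k before: {k_before}, k after: {k_after}, l after: {l_after}")
```

Generalizing the quasi-identifiers raises k at the cost of coarser data, which is the utility and protection trade-off the abstract refers to. A real pipeline would also need to choose generalization hierarchies per attribute and handle suppression, which this sketch omits.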

