Performance Metrics Evaluation Towards The Effectiveness of Data Anonymization
Authors: A. Raj, Rio G. L. D'Souza
Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), 2023-04-07
DOI: 10.1109/I2CT57861.2023.10126310 (https://doi.org/10.1109/I2CT57861.2023.10126310)
Citations: 0
Abstract
Data anonymization is a supplementary method for ensuring that private data is inaccessible to outside parties. Anonymization can affect the outcomes of data mining procedures, since it may make it more difficult for commonly used algorithms to analyze the data. This practical experience report compares the performance impact of current data anonymization algorithms with the suggested k-anonymization methods, using both original and anonymized data to assess correctness and execution time. A sample of genuine data produced by a healthcare facility was anonymized using k-anonymization, l-diversity, t-closeness, and differential privacy techniques. Contrary to predictions, the Hadoop framework was able to handle the anonymization approaches, improving accuracy and performance while speeding up execution. These findings show that data anonymization techniques, when properly implemented through Hadoop ecosystems, can be made more effective. Furthermore, the suggested method can produce anonymized data with the necessary utility and protection trade-offs and with performance that scales to large datasets.
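To make two of the privacy criteria named in the abstract concrete, the following minimal sketch measures k-anonymity and distinct l-diversity on a small table. It is an illustration only and not the authors' implementation: the records, the choice of quasi-identifiers (`age`, `zip`), the sensitive attribute (`diagnosis`), and all function names are hypothetical, and it runs in plain Python rather than on Hadoop.

```python
from collections import defaultdict

# Hypothetical toy records (NOT taken from the paper's healthcare dataset).
# The quasi-identifiers are already generalized: age ranges and ZIP prefixes.
RECORDS = [
    {"age": "30-39", "zip": "575**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "575**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "575**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "576**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "576**", "diagnosis": "diabetes"},
]

QUASI_IDENTIFIERS = ("age", "zip")  # assumed for illustration
SENSITIVE = "diagnosis"             # assumed for illustration


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def k_anonymity(records, quasi_identifiers):
    """Largest k for which the table is k-anonymous: the smallest class size."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len(group) for group in groups.values())


def l_diversity(records, quasi_identifiers, sensitive):
    """Distinct l-diversity: the fewest distinct sensitive values in any class."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len({r[sensitive] for r in group}) for group in groups.values())


if __name__ == "__main__":
    print("k =", k_anonymity(RECORDS, QUASI_IDENTIFIERS))
    print("l =", l_diversity(RECORDS, QUASI_IDENTIFIERS, SENSITIVE))
```

On this toy table the smallest equivalence class has two records, so the table is 2-anonymous, but that class contains only one diagnosis value, so it is only 1-diverse. This is the kind of gap that motivates applying l-diversity and t-closeness in addition to k-anonymization.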