Balancing Data Utility versus Information Loss in Data-Privacy Protection using k-Anonymity

Thamer Khalil Esmeel, M. Hasan, M. Kabir, Ahmad Firdaus
Venue: 2020 IEEE 8th Conference on Systems, Process and Control (ICSPC)
Publication date: 2020-12-11
DOI: 10.1109/ICSPC50992.2020.9305776 (https://doi.org/10.1109/ICSPC50992.2020.9305776)
Citations: 5

Abstract

Data privacy has been an important area of research in recent years. Datasets often contain sensitive fields whose exposure may jeopardize the interests of the individuals associated with the data. To address this, privacy techniques can anonymize the sensitive data in a dataset so that individuals are harder to identify, while third parties can still use the anonymized dataset for analysis without obstruction. In this research, we investigate one such technique, k-anonymity, for different values of $\pmb{k}$ applied to different numbers $\pmb{c}$ of columns of the dataset. Next, the information loss due to k-anonymity is computed. The anonymized files are then classified with several machine-learning algorithms, namely Naive Bayes, J48, and a neural network, to check the balance between data anonymity and data utility. Based on the classification accuracy, the optimal values of $\pmb{k}$ and $\pmb{c}$ are obtained; these values can then be used by the k-anonymity algorithm to anonymize the optimal number of columns of the dataset.
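The abstract does not specify the anonymization algorithm, the information-loss metric, or the dataset, so the following is only a minimal sketch of the general idea under stated assumptions. It generalizes the chosen quasi-identifier columns (their count plays the role of $\pmb{c}$), drops rows whose equivalence class is still smaller than $\pmb{k}$, and reports the fraction of dropped rows as a simple stand-in for information loss. The toy records, the generalization rules, and all function names are hypothetical, and the classification step (Naive Bayes, J48, neural network) is not reproduced here.

```python
from collections import Counter


def equivalence_classes(rows, qi_cols):
    """Count rows that share each combination of quasi-identifier values."""
    return Counter(tuple(row[c] for c in qi_cols) for row in rows)


def is_k_anonymous(rows, qi_cols, k):
    """True if every equivalence class over qi_cols has at least k rows."""
    return all(n >= k for n in equivalence_classes(rows, qi_cols).values())


def anonymize(rows, qi_cols, k, generalizers):
    """Generalize the quasi-identifier columns, then drop rows whose
    equivalence class is still smaller than k (record suppression)."""
    generalized = [
        {**row, **{c: generalizers[c](row[c]) for c in qi_cols}} for row in rows
    ]
    sizes = equivalence_classes(generalized, qi_cols)
    return [r for r in generalized if sizes[tuple(r[c] for c in qi_cols)] >= k]


def suppression_loss(original, anonymized):
    """Assumed loss measure: fraction of records removed by suppression."""
    return 1 - len(anonymized) / len(original)


# Hypothetical toy data; "disease" is the sensitive attribute and is left as-is.
rows = [
    {"age": 23, "zip": "47677", "disease": "flu"},
    {"age": 27, "zip": "47602", "disease": "cold"},
    {"age": 29, "zip": "47678", "disease": "flu"},
    {"age": 41, "zip": "47905", "disease": "asthma"},
    {"age": 45, "zip": "47909", "disease": "flu"},
    {"age": 58, "zip": "47906", "disease": "cold"},
]

# Hypothetical generalization rules: 10-year age bands and 3-digit zip prefixes.
GENERALIZERS = {
    "age": lambda a: f"{a // 10 * 10}-{a // 10 * 10 + 9}",
    "zip": lambda z: z[:3] + "**",
}

# Sweep k and the set of anonymized columns (c = number of columns).
for qi_cols in (["zip"], ["age", "zip"]):
    for k in (2, 3):
        kept = anonymize(rows, qi_cols, k, GENERALIZERS)
        loss = suppression_loss(rows, kept)
        print(f"c={len(qi_cols)} k={k} kept={len(kept)} loss={loss:.2f}")
```

In a study of the kind the abstract describes, each anonymized output of such a sweep would then be passed to the classifiers, and the combination of $\pmb{k}$ and $\pmb{c}$ would be chosen by weighing classification accuracy against the measured loss.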