An Indexed Bottom-up Approach for Publishing Anonymized Data

A. Hoang, Minh Tran, A. Duong, I. Echizen
2012 Eighth International Conference on Computational Intelligence and Security, published 2012-11-17. DOI: 10.1109/CIS.2012.148
Citations: 1

Abstract

Sharing information is one of the most important parts of social activity. However, shared data can leak information about users, and removing all direct identifiers is not enough. Sweeney proposed applying k-anonymity to protect users' identities from linking attacks. Sweeney's algorithm finds the optimal anonymized dataset under a minimal-distortion metric. Other authors have proposed other optimal algorithms, but these remain impractical because of their high computational cost. Another approach is to release a minimal anonymized dataset by applying heuristics. Wang and Fung proposed Bottom-up Generalization and Top-down Specialization (TDS), which publish a minimal anonymized dataset under an information-loss metric and run more efficiently. However, these algorithms still have some limitations. In this paper, we propose an algorithm that publishes anonymized datasets using a bottom-up generalization approach and an information-loss metric. Our algorithm saves time by storing statistical information for later reuse. Experiments are performed on the Adult dataset, which was used to evaluate all of the earlier algorithms. The results show that our algorithm can process a dataset of 949,662 records in 42.219 s. The classification error on data anonymized by our algorithm is also 3.8% lower than that of Wang's algorithm.
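The abstract does not spell out the algorithm. What follows is a minimal sketch of a generic greedy bottom-up generalization loop in the spirit it describes, not the authors' method. It climbs each quasi-identifier's taxonomy one merge at a time until every equivalence class has at least k records.

Several choices are assumptions and are not taken from the paper:

- The step-selection rule: entropy-based information loss per unit of anonymity gain, falling back to the lowest loss when no step raises anonymity. It is loosely modeled on Wang et al.'s bottom-up generalization.
- The per-value class-count cache. It only illustrates the idea of "storing statistical information for later usage."
- The function names `bottom_up_generalize`, `anonymity`, `entropy` and `ancestors`, which are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of a class-label distribution."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c > 0)


def anonymity(rows) -> int:
    """Size of the smallest equivalence class over quasi-identifier tuples."""
    return min(Counter(tuple(r) for r in rows).values())


def ancestors(parent_of: dict, value) -> set:
    """All taxonomy ancestors of `value` (excluding the value itself)."""
    found = set()
    while value in parent_of:
        value = parent_of[value]
        found.add(value)
    return found


def bottom_up_generalize(records, labels, parents, k):
    """Hypothetical greedy bottom-up generalization until the table is k-anonymous.

    records: list of quasi-identifier tuples (leaf-level taxonomy values)
    labels:  one class label per record
    parents: one dict per attribute, mapping child value -> parent value
    """
    rows = [list(r) for r in records]
    # Cached statistics (assumed "index"): class counts per (attribute, current value),
    # built once, then updated by merging counts instead of rescanning the records.
    stats = [{} for _ in parents]
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            stats[i].setdefault(v, Counter())[y] += 1

    while anonymity(rows) < k:
        before = anonymity(rows)
        best_key, best_step = None, None
        for i, parent_of in enumerate(parents):
            for p in sorted({parent_of[v] for v in stats[i] if v in parent_of}):
                under_p = [v for v in stats[i] if p in ancestors(parent_of, v)]
                # Only merge into p when every current value below p is a direct child of p.
                if any(parent_of[v] != p for v in under_p):
                    continue
                merged = sum((stats[i][v] for v in under_p), Counter())
                total = sum(merged.values())
                # Information loss (assumed metric): class-entropy increase caused by the merge.
                il = entropy(merged) - sum(
                    sum(stats[i][v].values()) / total * entropy(stats[i][v]) for v in under_p
                )
                trial = [[p if j == i and x in under_p else x for j, x in enumerate(r)] for r in rows]
                gain = anonymity(trial) - before
                # Prefer steps that raise anonymity (lowest loss per unit gain), else lowest loss.
                key = (0, il / gain) if gain > 0 else (1, il)
                if best_key is None or key < best_key:
                    best_key, best_step = key, (i, p, under_p)
        if best_step is None:
            raise ValueError("k-anonymity is unreachable with these taxonomies")
        i, p, under_p = best_step
        # Reuse cached counts: the parent's statistics are the sum of its children's.
        stats[i][p] = sum((stats[i].pop(v) for v in under_p), Counter())
        for r in rows:
            if r[i] in under_p:
                r[i] = p
    return [tuple(r) for r in rows]


if __name__ == "__main__":
    records = [("Engineer", "M"), ("Lawyer", "M"), ("Clerk", "F"), ("Cashier", "F")]
    labels = ["Y", "Y", "N", "N"]
    parents = [
        {"Engineer": "Professional", "Lawyer": "Professional", "Clerk": "Staff",
         "Cashier": "Staff", "Professional": "Any", "Staff": "Any"},
        {"M": "Any", "F": "Any"},
    ]
    print(bottom_up_generalize(records, labels, parents, k=2))
    # [('Professional', 'M'), ('Professional', 'M'), ('Staff', 'F'), ('Staff', 'F')]
```

In this toy table, merging the job values costs no information because each job group has a single class label. Generalizing sex to `Any` mixes the two labels and would lose one bit. The greedy loop therefore generalizes job and leaves sex intact. Scoring every candidate by rebuilding the trial table is deliberately naive. A practical implementation would also index the equivalence-class sizes rather than recomputing them.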