Data Anonymization for Big Crowdsourcing Data

Xiaofeng Deng, Fan Zhang, Hai Jin
Published in: IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2019-04-29
DOI: https://doi.org/10.1109/infocomwkshps47286.2019.9093748

Abstract

In traditional database systems, data anonymization has been studied extensively. It provides an effective solution for preserving data privacy, and among these solutions, multidimensional anonymization schemes are widely used. However, without careful parameter settings, these techniques may cause uncontrollable information loss and reduce the accuracy of data analysis tasks. Furthermore, crowdsourcing data is usually very large and must be stored in a distributed manner in the cloud, which makes conventional data anonymization techniques inapplicable. In this paper, we propose a framework that uses MapReduce to anonymize large-scale data before disseminating it to human workers. To ensure that the number and distribution of data records are similar across all nodes, our framework first redistributes the original data to all participating nodes. We then propose a heuristic two-phase anonymization scheme that can be seamlessly integrated into the framework. Experimental results show that, for the same privacy objective, our approach scales to large-scale data and can improve the average accuracy of human workers' analytic tasks.
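
The abstract mentions multidimensional anonymization without describing it. As a rough illustration only, the sketch below shows a generic, single-machine, Mondrian-style multidimensional k-anonymity partitioning: records are split recursively at the median of the widest attribute, and each resulting group is generalized to per-attribute ranges. It is not the paper's two-phase scheme and does not model the MapReduce framework or the redistribution step. The function names, the restriction to numeric quasi-identifiers, and the parameter k are assumptions introduced here.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]             # numeric quasi-identifiers only (assumption)
Box = Tuple[Tuple[float, float], ...]  # one (min, max) range per attribute


def mondrian_partition(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Split recursively while some cut leaves at least k records on each side."""
    if len(records) < 2 * k:
        return [list(records)]
    dims = range(len(records[0]))
    # Try attributes from the widest value range to the narrowest.
    by_width = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_width:
        ordered = sorted(records, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian_partition(left, k) + mondrian_partition(right, k)
    # No attribute allows a valid cut, so this group stays whole.
    return [list(records)]


def generalize(group: Sequence[Record]) -> Box:
    """Replace a group's exact values with per-attribute (min, max) ranges."""
    return tuple(
        (min(r[d] for r in group), max(r[d] for r in group))
        for d in range(len(group[0]))
    )


def anonymize(records: Sequence[Record], k: int) -> List[Tuple[Box, int]]:
    """Return the generalized range box and the size of each group."""
    if not records:
        return []
    return [(generalize(g), len(g)) for g in mondrian_partition(records, k)]
```

With at least k input records, every group returned by `anonymize` contains at least k records, so each record is indistinguishable from at least k - 1 others on the generalized attributes. Smaller groups mean tighter ranges and less information loss, which is the trade-off the abstract refers to when it mentions parameter settings.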