Data Anonymization for Big Crowdsourcing Data

IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). Pub Date: 2019-04-29. DOI: 10.1109/infocomwkshps47286.2019.9093748

Xiaofeng Deng, Fan Zhang, Hai Jin


Citations: 0

Abstract

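Data anonymization has been studied extensively in traditional database systems, where it provides an effective means of preserving data privacy; multidimensional anonymization schemes are among the most widely used. Without careful parameter settings, however, these techniques can cause uncontrolled information loss and reduce the accuracy of data analytics tasks. Furthermore, crowdsourcing data is usually very large and must be stored in a distributed fashion in the cloud, which makes conventional data anonymization techniques inapplicable. In this paper, we propose a framework that uses MapReduce to anonymize large-scale data before it is disseminated to human workers. To ensure that the number and distribution of data records are similar across all nodes, our framework first redistributes the original data to all participating nodes. We then propose a heuristic two-phase anonymization scheme that can be seamlessly integrated into the framework. Experimental results show that, for the same privacy objective, our approach scales to large-scale data and improves the average accuracy of human workers' analytic tasks.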
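The abstract does not spell out the two-phase heuristic, so the following is only a rough sketch of the general idea it describes: balance records across nodes, anonymize each node's share independently (the map step), and concatenate the results (the reduce step). It assumes numeric quasi-identifiers, a k-anonymity privacy objective, and a generic median-split multidimensional partitioning; none of these details are confirmed by the source. The names `redistribute`, `partition`, `generalize`, and `anonymize_node` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Record = Tuple[int, ...]             # assumption: numeric quasi-identifiers only
Box = Tuple[Tuple[int, int], ...]    # one (min, max) range per attribute


def redistribute(records: List[Record], num_nodes: int) -> List[List[Record]]:
    """Deal sorted records round-robin so nodes get similar counts and value spread."""
    nodes: List[List[Record]] = [[] for _ in range(num_nodes)]
    for i, rec in enumerate(sorted(records)):
        nodes[i % num_nodes].append(rec)
    return nodes


def partition(records: List[Record], k: int) -> List[List[Record]]:
    """Split at the median of the widest attribute while both halves keep >= k records."""
    if len(records) < 2 * k:
        return [records]
    dims = range(len(records[0]))
    # Pick the attribute with the largest value range (ties go to the first one).
    widest = max(dims, key=lambda d: max(r[d] for r in records) - min(r[d] for r in records))
    ordered = sorted(records, key=lambda r: r[widest])
    mid = len(ordered) // 2
    return partition(ordered[:mid], k) + partition(ordered[mid:], k)


def generalize(group: List[Record]) -> List[Box]:
    """Replace every record in a group by the group's bounding box."""
    dims = range(len(group[0]))
    box = tuple((min(r[d] for r in group), max(r[d] for r in group)) for d in dims)
    return [box] * len(group)


def anonymize_node(records: List[Record], k: int) -> List[Box]:
    """Map step for one node; assumes the node holds at least k records (or none)."""
    released: List[Box] = []
    if not records:
        return released
    for group in partition(records, k):
        released.extend(generalize(group))
    return released


def anonymize(records: List[Record], num_nodes: int, k: int) -> List[Box]:
    """Redistribute, anonymize each node's share, then concatenate (reduce step)."""
    nodes = redistribute(records, num_nodes)
    return [row for node in nodes for row in anonymize_node(node, k)]
```

In this sketch every released row shares its generalized values with at least k - 1 other rows from the same node, provided each node receives at least k records. A real MapReduce deployment would run `anonymize_node` on separate workers; the list comprehension in `anonymize` merely simulates that on one machine.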



