{"title":"大众包数据的数据匿名化","authors":"Xiaofeng Deng, Fan Zhang, Hai Jin","doi":"10.1109/infocomwkshps47286.2019.9093748","DOIUrl":null,"url":null,"abstract":"In traditional database systems, data anonymization has been extensively studied, it provides an effective solution for data privacy preservation, and multidimensional anonymization scheme among them is widely used. However, without delicate parameter settings, these technologies may cause uncontrollable information loss and decrease the accuracy of data analytic tasks. Furthermore, crowdsourcing data is usually huge in amount and must be distributed stored in clouds, which makes the conventional data anonymization technologies not applicable. In this paper, we propose a framework that uses MapReduce to anonymize large-scale data before disseminating them to human workers. In order to guarantee the number and distribution of data records to be similar in all nodes, our framework first redistributes the original data to all participating nodes. Then a heuristic two-phase anonymization schema, which can be seamlessly integrated into the framework, is proposed. Experimental results show that with the same objective of privacy, our approach is scalable for large-scale data and can improve the average accuracy of human worker’s analytic tasks.","PeriodicalId":321862,"journal":{"name":"IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)","volume":"36 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Data Anonymization for Big Crowdsourcing Data\",\"authors\":\"Xiaofeng Deng, Fan Zhang, Hai Jin\",\"doi\":\"10.1109/infocomwkshps47286.2019.9093748\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In traditional database systems, data anonymization has been extensively studied, it provides an effective solution for data privacy preservation, and multidimensional anonymization scheme among them is widely used. However, without delicate parameter settings, these technologies may cause uncontrollable information loss and decrease the accuracy of data analytic tasks. Furthermore, crowdsourcing data is usually huge in amount and must be distributed stored in clouds, which makes the conventional data anonymization technologies not applicable. In this paper, we propose a framework that uses MapReduce to anonymize large-scale data before disseminating them to human workers. In order to guarantee the number and distribution of data records to be similar in all nodes, our framework first redistributes the original data to all participating nodes. Then a heuristic two-phase anonymization schema, which can be seamlessly integrated into the framework, is proposed. Experimental results show that with the same objective of privacy, our approach is scalable for large-scale data and can improve the average accuracy of human worker’s analytic tasks.\",\"PeriodicalId\":321862,\"journal\":{\"name\":\"IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)\",\"volume\":\"36 1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-04-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/infocomwkshps47286.2019.9093748\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/infocomwkshps47286.2019.9093748","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In traditional database systems, data anonymization has been extensively studied, it provides an effective solution for data privacy preservation, and multidimensional anonymization scheme among them is widely used. However, without delicate parameter settings, these technologies may cause uncontrollable information loss and decrease the accuracy of data analytic tasks. Furthermore, crowdsourcing data is usually huge in amount and must be distributed stored in clouds, which makes the conventional data anonymization technologies not applicable. In this paper, we propose a framework that uses MapReduce to anonymize large-scale data before disseminating them to human workers. In order to guarantee the number and distribution of data records to be similar in all nodes, our framework first redistributes the original data to all participating nodes. Then a heuristic two-phase anonymization schema, which can be seamlessly integrated into the framework, is proposed. Experimental results show that with the same objective of privacy, our approach is scalable for large-scale data and can improve the average accuracy of human worker’s analytic tasks.