H. Silva, Tânia Basso, Regina L. O. Moraes, D. Elia, S. Fiore
{"title":"数据分析平台基于风险的再识别匿名化框架","authors":"H. Silva, Tânia Basso, Regina L. O. Moraes, D. Elia, S. Fiore","doi":"10.1109/EDCC.2018.00026","DOIUrl":null,"url":null,"abstract":"Preserving individual privacy is one of the major issues in the context of Big Data, since handling huge volumes of data may contribute to the disclosure of sensitive or personally identifiable information. In fact, even when data is anonymized there is a risk of re-identification through privacy attacks. This paper presents a re-identification risk-based anonymization framework for big data analytics platforms. This framework is based on anonymization policies and allows applying anonymization techniques and models in two stages: during the ETL process and before exporting the statistical results of data analytics. This second stage evaluates the data re-identification risk and increases the anonymity level if it is necessary to reduce this risk. Although generic, the implementation of the framework reported in this work was integrated into Ophidia as a case study. Privacy attacks were performed to check the effectiveness of the re-identification process. Results are promising, showing a low probability of re-identification in two different scenarios.","PeriodicalId":129399,"journal":{"name":"2018 14th European Dependable Computing Conference (EDCC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"A Re-Identification Risk-Based Anonymization Framework for Data Analytics Platforms\",\"authors\":\"H. Silva, Tânia Basso, Regina L. O. Moraes, D. Elia, S. Fiore\",\"doi\":\"10.1109/EDCC.2018.00026\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Preserving individual privacy is one of the major issues in the context of Big Data, since handling huge volumes of data may contribute to the disclosure of sensitive or personally identifiable information. In fact, even when data is anonymized there is a risk of re-identification through privacy attacks. This paper presents a re-identification risk-based anonymization framework for big data analytics platforms. This framework is based on anonymization policies and allows applying anonymization techniques and models in two stages: during the ETL process and before exporting the statistical results of data analytics. This second stage evaluates the data re-identification risk and increases the anonymity level if it is necessary to reduce this risk. Although generic, the implementation of the framework reported in this work was integrated into Ophidia as a case study. Privacy attacks were performed to check the effectiveness of the re-identification process. Results are promising, showing a low probability of re-identification in two different scenarios.\",\"PeriodicalId\":129399,\"journal\":{\"name\":\"2018 14th European Dependable Computing Conference (EDCC)\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 14th European Dependable Computing Conference (EDCC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/EDCC.2018.00026\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 14th European Dependable Computing Conference (EDCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/EDCC.2018.00026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Re-Identification Risk-Based Anonymization Framework for Data Analytics Platforms
Preserving individual privacy is one of the major issues in the context of Big Data, since handling huge volumes of data may contribute to the disclosure of sensitive or personally identifiable information. In fact, even when data is anonymized there is a risk of re-identification through privacy attacks. This paper presents a re-identification risk-based anonymization framework for big data analytics platforms. This framework is based on anonymization policies and allows applying anonymization techniques and models in two stages: during the ETL process and before exporting the statistical results of data analytics. This second stage evaluates the data re-identification risk and increases the anonymity level if it is necessary to reduce this risk. Although generic, the implementation of the framework reported in this work was integrated into Ophidia as a case study. Privacy attacks were performed to check the effectiveness of the re-identification process. Results are promising, showing a low probability of re-identification in two different scenarios.