Kevin De Boeck, Jenno Verdonck, M. Willocx, Jorn Lapon, Vincent Naessens
2021 International Symposium on Computer Science and Intelligent Controls (ISCSIC), published 2021-11-01
DOI: 10.1109/ISCSIC54682.2021.00045 (https://doi.org/10.1109/ISCSIC54682.2021.00045)
Dataset Anonymization with Purpose: A Resource Allocation Use Case
Companies now collect huge amounts of data, and applying that data to optimize business activities can significantly improve profit margins. To this end, companies often want to enhance their models by enriching their own data with data from external sources. Increasingly, they also consider selling data as an additional source of income, and governments are likewise willing to share citizen data with businesses. The GDPR, which has applied since May 2018, provides a framework for different parties (commercial, governmental, academic) to share and sell data, provided that the data is anonymized. The effect of this anonymization step on the quality of the data, and on the resulting business optimization conclusions, is still unclear. Existing utility and quality metrics are purely theoretical and do not capture the purpose of the anonymized data, which leads to discrepancies between the expected and the actual utility of an anonymized dataset. This work studies the practical utility of anonymized datasets. It assesses the effect of enforcing K-anonymity and of dataset sampling on the utility of the data through experiments on a resource allocation use case. Practical guidelines are presented for anonymizing datasets while maintaining a high degree of practical utility.
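To make the two techniques named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. A dataset is K-anonymous when every record shares its quasi-identifier values with at least K-1 other records, so the K of a dataset is the size of its smallest such group. The records, the field names (`age`, `zip`, `demand`), the ten-year age buckets, and the helper names are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same quasi-identifier values (the K of the dataset)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def generalize_age(record, width=10):
    """Replace an exact age by a range such as '30-39'.
    Generalization is one common way to raise K (assumed here)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


def sample(records, fraction, seed=0):
    """Draw a reproducible random subset holding roughly `fraction` of the records."""
    rng = random.Random(seed)
    n = max(1, round(len(records) * fraction))
    return rng.sample(records, n)


# Hypothetical toy data: 'demand' stands in for the attribute a
# resource allocation model would consume.
records = [
    {"age": 31, "zip": "3000", "demand": 4},
    {"age": 34, "zip": "3000", "demand": 2},
    {"age": 38, "zip": "3000", "demand": 5},
    {"age": 42, "zip": "9000", "demand": 1},
    {"age": 47, "zip": "9000", "demand": 3},
]
QUASI_IDENTIFIERS = ["age", "zip"]

raw_k = k_anonymity(records, QUASI_IDENTIFIERS)          # every age is unique
generalized = [generalize_age(r) for r in records]
gen_k = k_anonymity(generalized, QUASI_IDENTIFIERS)      # smallest group after bucketing
subset = sample(generalized, fraction=0.6)

print(f"K before generalization: {raw_k}")
print(f"K after generalization:  {gen_k}")
print(f"Sampled {len(subset)} of {len(generalized)} records")
```

In this toy setup the total demand per zip code is unchanged by generalization, while sampling changes it. A purpose-driven utility check of the kind the abstract argues for would compare such downstream quantities before and after anonymization, rather than rely on a generic information-loss score.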