Mohanad Ajina, B. Yousefi, Jim Jones, Kathryn B. Laskey
2019 22th International Conference on Information Fusion (FUSION), published 2019-07-01. DOI: https://doi.org/10.23919/fusion43075.2019.9011394
Secure Method for De-Identifying and Anonymizing Large Panel Datasets
Government agencies and private companies may need to share private information with third-party organizations for various reasons. There are legitimate concerns about disclosing information about individuals, sensitive details of agencies and organizations, and other private information. Consequently, information shared with external parties may be redacted to hide confidential information about individuals and companies while still providing the essential data that third parties require to perform their duties. This paper presents a method to de-identify and anonymize large-scale panel data from an organization. The method can handle a variety of data types, and it is scalable to datasets of any size. The challenge in de-identifying and anonymizing a large-scale, diverse dataset is to protect individual identities and retain useful data in the presence of unstructured field data and unpredictable frequency distributions. The method addresses this challenge by analyzing the dataset and applying filtering and aggregation. It is accompanied by a streamlined implementation and a post-validation process, which together ensure the security of the organization's data and the computational efficiency of the approach when handling large-scale panel datasets.
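The abstract does not specify how the filtering, aggregation, and post-validation steps are carried out. As a rough illustration only, the sketch below shows one common way such a pipeline could look: direct identifier fields are dropped (filtering), values that occur fewer than `k` times are merged into a placeholder bucket (aggregation), and a final check confirms that no rare value survived (post-validation). The function names, the threshold `k`, and the `"OTHER"` placeholder are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def drop_fields(records, identifier_fields):
    """Filtering (assumed step): remove direct identifier fields from every record."""
    return [
        {key: value for key, value in record.items() if key not in identifier_fields}
        for record in records
    ]


def aggregate_rare_values(records, field, k=5, placeholder="OTHER"):
    """Aggregation (assumed step): replace values of `field` seen fewer than k times."""
    counts = Counter(record[field] for record in records)
    return [
        {**record, field: record[field] if counts[record[field]] >= k else placeholder}
        for record in records
    ]


def validate_min_frequency(records, field, k=5, placeholder="OTHER"):
    """Post-validation (assumed step): every non-placeholder value occurs at least k times.

    Note: this check does not cover the placeholder bucket itself, which may
    still contain fewer than k records and would need separate handling.
    """
    counts = Counter(record[field] for record in records)
    return all(count >= k for value, count in counts.items() if value != placeholder)
```

With a toy dataset of three records in department `"sales"` and one in `"legal"`, calling `aggregate_rare_values(drop_fields(records, {"name"}), "dept", k=2)` would keep `"sales"` and replace `"legal"` with `"OTHER"`. A frequency threshold is only one possible design; the paper's actual method may differ, in particular in how it treats unstructured fields.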