Feature Based Data Anonymization with Slicing Method for Data Publishing
Authors: Esther Gachanga, Michael W. Kimwele, L. Nderu
Venue: International Conference on Machine Learning and Computing
Published: 2019-02-22
DOI: https://doi.org/10.1145/3318299.3318389
Information technology has enabled the collection and sharing of large amounts of data. This data is high-dimensional and contains sensitive information that needs to be protected. As the dimensionality of data increases, a feature selection mechanism can be used to determine the subset of attributes that have high relevance. The information contained in features with high relevance should be preserved as much as possible. Anonymization techniques have been used to protect sensitive information in published datasets. However, anonymization may distort the data in ways that affect attributes with high relevance and thus reduce classification accuracy. This work proposes an information-gain-based anonymization approach that uses the slicing method. We conduct experiments on real-life datasets. Our results show that by reducing the amount of data distortion for features with high relevance in a dataset, the privacy and quality of data can be enhanced.
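The abstract does not spell out how attribute relevance is scored, so the following is only a minimal sketch, assuming the standard entropy-based definition of information gain over categorical attributes. The function names (`entropy`, `information_gain`, `rank_attributes`) and the toy records in `ROWS` are hypothetical and chosen for illustration; they are not taken from the paper, and the sketch covers only the ranking step, not the anonymization or slicing itself.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of `target` minus its expected entropy after grouping by `attribute`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    total = len(rows)
    # Weighted average entropy of the target within each attribute value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def rank_attributes(rows, attributes, target):
    """Order attributes from most to least relevant by information gain."""
    return sorted(
        attributes,
        key=lambda a: information_gain(rows, a, target),
        reverse=True,
    )


# Hypothetical toy records: `age` fully determines `income`, `zip` does not.
ROWS = [
    {"age": "young", "zip": "100", "income": "low"},
    {"age": "young", "zip": "200", "income": "low"},
    {"age": "old", "zip": "100", "income": "high"},
    {"age": "old", "zip": "200", "income": "high"},
]
```

On these toy records, `rank_attributes(ROWS, ["zip", "age"], "income")` places `age` ahead of `zip`. Under the idea described in the abstract, such a ranking would indicate which attributes an anonymizer should distort least; how the ranking feeds into slicing is not specified here and is left open.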