Title: Feature Based Data Anonymization with Slicing Method for Data Publishing
Authors: Esther Gachanga, Michael W. Kimwele, L. Nderu
Venue: International Conference on Machine Learning and Computing
Publication date: 2019-02-22
DOI: 10.1145/3318299.3318389
URL: https://doi.org/10.1145/3318299.3318389
Citations: 3
Abstract
Information technology has enabled the collection and sharing of large amounts of data. This data is high-dimensional and contains sensitive information that needs to be protected. As the dimensionality of the data increases, a feature selection mechanism can be used to determine the subset of attributes that have high relevance. The information contained in highly relevant features should be preserved as much as possible. Anonymization techniques have been used to protect sensitive information in published datasets. However, anonymization may distort the data in ways that affect highly relevant attributes and thus reduce classification accuracy. This work proposes an information-gain-based anonymization approach that uses the slicing method. We conduct experiments on real-life datasets. Our results show that reducing the amount of data distortion for highly relevant features in a dataset can enhance both the privacy and the quality of the data.
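The abstract does not spell out how relevance is scored, so the following is only a minimal sketch of the standard information gain measure, which could be used to rank attributes before deciding which ones to distort least. The function names, the toy records, and the choice of `income` as the class attribute are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of `target` minus its expected entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_attributes(rows, attributes, target):
    """Return attributes ordered from highest to lowest information gain."""
    return sorted(
        attributes,
        key=lambda a: information_gain(rows, a, target),
        reverse=True,
    )


# Hypothetical toy records: `age` fully determines `income`, `zip` does not.
rows = [
    {"age": "young", "zip": "A", "income": "low"},
    {"age": "young", "zip": "B", "income": "low"},
    {"age": "old", "zip": "A", "income": "high"},
    {"age": "old", "zip": "B", "income": "high"},
]

ranking = rank_attributes(rows, ["zip", "age"], "income")
```

In this toy example `age` ranks above `zip`, so under the idea described in the abstract an anonymization step would aim to distort `age` less than `zip`. How the ranking feeds into the slicing step is not described in the abstract and is left open here.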