A methodology for enhancing data quality for classification purposes using attribute-based decision graphs
J. R. Bertini
2017 IEEE Latin American Conference on Computational Intelligence (LA-CCI), November 2017
DOI: 10.1109/LA-CCI.2017.8285692
Citation count: 1
Abstract
The accuracy of a classification system strongly depends on the quality of the data used to train it. Among other issues, noise in attribute values degrades data quality and impairs automatic classification. This paper proposes a novel data-cleansing method designed to improve classification accuracy. The cleansing procedure is based on Attribute-based Decision Graphs, which are graphs built over the attribute space of a data set. Such graphs capture the underlying patterns in the data set and use this knowledge to check each attribute value for noise. Classification results obtained with four learning algorithms on five data sets with artificially added noise show the effectiveness of the proposed cleansing procedure.
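The abstract describes the cleansing idea only at a high level: a graph built over the attribute space captures patterns in the data and is then used to check each attribute value for noise. The sketch below illustrates that general idea under explicitly assumed choices. It is not the paper's Attribute-based Decision Graph construction. The assumptions are equal-width discretization into `n_bins` intervals per attribute, edges weighted by per-class co-occurrence counts, a support score averaged over the other attributes, a fixed `threshold`, and repair by the midpoint of the best-supported interval. All names (`AttributeGraph`, `support`, `flag_noise`, `repair`) are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from itertools import combinations


class AttributeGraph:
    """Hypothetical graph over the attribute space (assumed design, not the paper's).

    Vertices are (attribute, interval) pairs. For every class, the weight of
    the edge between two intervals of different attributes counts the training
    instances of that class whose values fall in both intervals.
    Requires at least two attributes.
    """

    def __init__(self, X, y, n_bins=3):
        self.n_attrs = len(X[0])
        # Assumed discretization: equal-width intervals between each attribute's min and max.
        self.bounds = [(min(r[a] for r in X), max(r[a] for r in X)) for a in range(self.n_attrs)]
        self.cuts = [[lo + (hi - lo) * k / n_bins for k in range(1, n_bins)] for lo, hi in self.bounds]
        self.vertex = Counter()  # (class, attr, interval) -> instance count
        self.edge = Counter()    # (class, (a, i), (b, j)) with a < b -> co-occurrence count
        for row, c in zip(X, y):
            bins = self.discretize(row)
            for a in range(self.n_attrs):
                self.vertex[(c, a, bins[a])] += 1
            for a, b in combinations(range(self.n_attrs), 2):
                self.edge[(c, (a, bins[a]), (b, bins[b]))] += 1

    def discretize(self, row):
        """Map each value to its interval index (out-of-range values go to the end intervals)."""
        return [bisect_right(self.cuts[a], v) for a, v in enumerate(row)]

    def support(self, bins, c, a, i):
        """Mean, over the other attributes b, of the fraction of class-c instances in
        interval bins[b] of attribute b that also fall in interval i of attribute a."""
        ratios = []
        for b in range(self.n_attrs):
            if b == a:
                continue
            key = (c, (a, i), (b, bins[b])) if a < b else (c, (b, bins[b]), (a, i))
            denom = self.vertex[(c, b, bins[b])]
            ratios.append(self.edge[key] / denom if denom else 0.0)
        return sum(ratios) / len(ratios)

    def flag_noise(self, row, c, threshold=0.2):
        """Indices of attribute values whose support for class c is below threshold."""
        bins = self.discretize(row)
        return [a for a in range(self.n_attrs) if self.support(bins, c, a, bins[a]) < threshold]

    def repair(self, row, c, threshold=0.2):
        """Assumed repair rule: replace each flagged value with the midpoint of
        that attribute's interval with the highest support for class c."""
        bins = self.discretize(row)
        fixed = list(row)
        for a in self.flag_noise(row, c, threshold):
            best = max(range(len(self.cuts[a]) + 1), key=lambda i: self.support(bins, c, a, i))
            lo, hi = self.bounds[a]
            edges = [lo] + self.cuts[a] + [hi]
            fixed[a] = (edges[best] + edges[best + 1]) / 2
        return fixed


if __name__ == "__main__":
    # Toy, made-up data: class 0 has low values, class 1 has high values.
    X = [[1.0, 1.0, 1.0], [1.2, 0.9, 1.1], [0.8, 1.1, 0.9], [1.1, 1.0, 1.2],
         [9.0, 9.0, 9.0], [8.8, 9.2, 9.1], [9.1, 8.9, 8.8], [9.2, 9.1, 9.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    graph = AttributeGraph(X, y, n_bins=3)
    print(graph.flag_noise([1.0, 1.0, 9.0], c=0))  # [2]: third value is atypical for class 0
    print(graph.repair([1.0, 1.0, 9.0], c=0))
```

In this toy run, the third value of `[1.0, 1.0, 9.0]` never co-occurs with the other two values among class-0 training instances, so it gets zero support and is flagged. The repair step then moves it into the low interval. Scoring each value against the other attributes, rather than against its own attribute alone, is what lets such a graph detect values that are plausible in isolation but inconsistent with the rest of the instance.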