Error awareness data mining
Xingquan Zhu, Xindong Wu
2006 IEEE International Conference on Granular Computing, published 2006-05-10
DOI: 10.1109/GRC.2006.1635795 (https://doi.org/10.1109/GRC.2006.1635795)
Real-world data mining applications often deal with low-quality information sources, where data collection inaccuracy, device limitations, data transmission and discretization errors, or man-made perturbations frequently result in imprecise or vague data. Two common practices are either to apply data cleansing to improve data consistency, or simply to treat the noisy data as a quality source and feed it into the data mining algorithms. Either practice may substantially degrade mining performance. In this paper, we consider an error awareness data mining framework, which takes advantage of statistical error information (such as noise level and noise distribution) to improve data mining results. We assume such noise knowledge is available in advance, and propose a solution to incorporate it into the mining process. More specifically, we use noise knowledge to restore the original data distributions, and then use the restored information to modify the model built from the noise-corrupted data. We present an Error Awareness Naive Bayes (EA_NB) classification algorithm, and provide extensive experimental comparisons to demonstrate the effectiveness of the approach.
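The idea of restoring a distribution from known noise statistics can be illustrated with a minimal sketch. This is not the EA_NB algorithm from the paper, whose details the abstract does not give. It assumes a simple, hypothetical noise model: with probability `noise_level`, a categorical attribute value is replaced by one drawn uniformly from all `k` possible values. Under that assumption the observed frequencies are `(1 - noise_level) * true + noise_level / k`, which can be inverted directly. The function name `restore_distribution` and its parameters are illustrative only.

```python
import numpy as np


def restore_distribution(observed, noise_level):
    """Estimate the noise-free distribution of a categorical attribute.

    Assumed noise model (an illustration, not taken from the paper):
    each value is, with probability `noise_level`, replaced by a value
    drawn uniformly at random from all k categories.
    """
    observed = np.asarray(observed, dtype=float)
    if not 0.0 <= noise_level < 1.0:
        raise ValueError("noise_level must be in [0, 1)")
    k = observed.size
    # Invert observed = (1 - p) * true + p / k.
    restored = (observed - noise_level / k) / (1.0 - noise_level)
    # Sampling error can push estimates slightly below zero; clip and
    # renormalize so the result is a valid probability vector.
    restored = np.clip(restored, 0.0, None)
    return restored / restored.sum()


# Example: a true distribution corrupted with 30% uniform noise.
true_dist = np.array([0.7, 0.2, 0.1])
noisy_dist = 0.7 * true_dist + 0.3 / 3
print(restore_distribution(noisy_dist, noise_level=0.3))
```

In a Naive Bayes setting, restored frequencies of this kind could stand in for the conditional probabilities estimated from the noisy data. How the paper actually performs that modification is not specified in the abstract.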