Sampling learning based association rules mining algorithm
Authors: Xiao-Lan Xie, Ying Zhang, Yingtao Xu
Published in: 2012 IEEE Fifth International Conference on Advanced Computational Intelligence (ICACI), 2012-10-01
DOI: 10.1109/ICACI.2012.6463168 (https://doi.org/10.1109/ICACI.2012.6463168)
Citations: 3
Abstract
The research community widely accepts that sampling can significantly improve the efficiency of data mining. The key to sampling in data mining is designing a strategy that yields a representative sample on which the mining algorithm can run at only a minor cost in accuracy. In this article we propose a progressive sampling algorithm based on a confusion matrix to determine the optimal sample size. The novelty of this algorithm is that it finds an appropriate sample quickly and accurately without executing the data mining algorithm itself.
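The abstract does not spell out the procedure, so the following is only a minimal sketch of one way progressive sampling with a confusion matrix could work, not the authors' algorithm. It assumes that the confusion matrix compares which single items are frequent in the full data against which are frequent in the sample, and that the sample grows geometrically until the agreement reaches a target. The function names, the `start`, `growth`, and `target` parameters, and the use of plain accuracy as the stopping criterion are all hypothetical.

```python
import random
from collections import Counter


def item_supports(transactions):
    """Return the fraction of transactions that contain each item."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def confusion_matrix(full_sup, sample_sup, min_support):
    """Count (tp, fp, fn, tn) for 'item is frequent', sample vs. full data.

    Assumption: the full-data label is the truth and the sample label is
    the prediction. Items missing from the sample have support 0.
    """
    tp = fp = fn = tn = 0
    for item, support in full_sup.items():
        truth = support >= min_support
        pred = sample_sup.get(item, 0.0) >= min_support
        if truth and pred:
            tp += 1
        elif pred:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def progressive_sample_size(transactions, min_support,
                            start=50, growth=2, target=0.95, seed=0):
    """Grow the sample geometrically until accuracy reaches the target.

    Assumes a non-empty list of non-empty transactions. Only item
    counting is performed; no association rules are mined.
    """
    rng = random.Random(seed)
    full_sup = item_supports(transactions)  # one counting pass over the data
    n = len(transactions)
    size = min(start, n)
    while True:
        sample = rng.sample(transactions, size)
        tp, fp, fn, tn = confusion_matrix(
            full_sup, item_supports(sample), min_support)
        accuracy = (tp + tn) / (tp + fp + fn + tn)
        if accuracy >= target or size == n:
            return size, accuracy
        size = min(size * growth, n)
```

Under these assumptions, the returned size is the smallest sample in the geometric schedule whose frequent-item labels agree with the full data at the requested accuracy. The mining algorithm would then be run once on a sample of that size.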