Decision Tree Algorithm based on Sampling
Authors: Xudong Song, Xiaolan Cheng
Published in: 2007 IFIP International Conference on Network and Parallel Computing Workshops (NPC 2007), 2007-09-18
DOI: 10.1109/NPC.2007.133 (https://doi.org/10.1109/NPC.2007.133)
Citations: 7
Abstract
As databases grow, data mining algorithms face more demanding requirements for efficiency and accuracy. Mining large data sets requires large amounts of time and physical resources, and sampling is an effective way to reduce that cost. This paper proposes a new decision tree algorithm based on sampling for large data sets. It selects a small initial sample whose distribution is similar to that of the original data set, learns from it, and stops sampling according to time complexity requirements and convergence criteria. Compared with the existing flexible decision tree algorithm, the proposed algorithm reduces computation time and I/O complexity while maintaining the accuracy of the tree.
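
The abstract does not specify the sampling scheme, the convergence criteria, or the tree learner, so the following is only a minimal sketch of the general idea: train on a small random sample, grow the sample, and stop once holdout accuracy stops changing or a time budget runs out. It is not the authors' algorithm. The one-split tree (stump), the doubling schedule, and every name and parameter here (`train_stump`, `fit_by_progressive_sampling`, `tol`, `time_budget_s`, `holdout_size`) are assumptions made for illustration.

```python
import random
import time


def train_stump(sample):
    """Fit a one-split decision tree on (x, label) pairs with labels 0/1.

    Returns (threshold, left_label, right_label) with the fewest training errors.
    """
    data = sorted(sample)
    n = len(data)
    total_pos = sum(y for _, y in data)
    majority = int(total_pos * 2 > n)
    # Fallback: no split at all, always predict the majority label.
    best_err, best_model = min(total_pos, n - total_pos), (float("inf"), majority, majority)
    left_pos = 0
    for i in range(1, n):
        left_pos += data[i - 1][1]
        if data[i - 1][0] == data[i][0]:
            continue  # no threshold fits between equal feature values
        right_pos = total_pos - left_pos
        left_n, right_n = i, n - i
        # Each side predicts its majority label; errors are the minority counts.
        err = min(left_pos, left_n - left_pos) + min(right_pos, right_n - right_pos)
        if err < best_err:
            threshold = (data[i - 1][0] + data[i][0]) / 2
            best_err = err
            best_model = (threshold, int(left_pos * 2 > left_n), int(right_pos * 2 > right_n))
    return best_model


def predict(model, x):
    threshold, left_label, right_label = model
    return left_label if x <= threshold else right_label


def accuracy(model, rows):
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


def fit_by_progressive_sampling(data, initial_size=100, growth=2, tol=0.005,
                                time_budget_s=5.0, holdout_size=500, seed=0):
    """Grow the training sample until accuracy converges or time runs out.

    Assumes len(data) is larger than holdout_size.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)  # any prefix of a shuffled list is a simple random sample
    holdout, pool = shuffled[:holdout_size], shuffled[holdout_size:]
    start = time.monotonic()
    size, prev_acc, history = initial_size, None, []
    while True:
        size = min(size, len(pool))
        model = train_stump(pool[:size])
        acc = accuracy(model, holdout)
        history.append((size, acc))
        converged = prev_acc is not None and abs(acc - prev_acc) < tol
        out_of_time = time.monotonic() - start > time_budget_s
        if converged or out_of_time or size == len(pool):
            return model, history
        prev_acc, size = acc, size * growth


def make_data(n, seed=1):
    """Synthetic data: label is 1 when x > 0.6, with 10% of labels flipped."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.6)
        if rng.random() < 0.1:
            y = 1 - y
        rows.append((x, y))
    return rows


if __name__ == "__main__":
    model, history = fit_by_progressive_sampling(make_data(20000))
    for size, acc in history:
        print(f"sample size {size:6d}  holdout accuracy {acc:.3f}")
    print("learned threshold:", model[0])
```

The design choice worth noting is that the stopping rule combines two independent conditions, a convergence tolerance on holdout accuracy and a wall-clock budget, which mirrors the two stopping criteria the abstract mentions without claiming to reproduce how the paper defines them.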