Mohammad Azad, I. Chikalov, Shahid Hussain, M. Moshkov
Title: Multi-pruning of decision trees for knowledge representation and classification
DOI: 10.1109/ACPR.2015.7486574 (https://doi.org/10.1109/ACPR.2015.7486574)
Published in: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), November 2015
Citations: 10
Abstract
We consider two important questions related to decision trees: first, how to construct a decision tree with a reasonable number of nodes and a reasonable number of misclassifications; and second, how to improve the prediction accuracy of decision trees when they are used as classifiers. We have created a dynamic programming based approach for bi-criteria optimization of decision trees relative to the number of nodes and the number of misclassifications. This approach allows us to construct the set of all Pareto optimal points and to derive, for each such point, decision trees whose parameters correspond to that point. Experiments on datasets from the UCI ML Repository show that, very often, we can find a suitable Pareto optimal point and derive a decision tree with a small number of nodes at the expense of a small increase in the number of misclassifications. Based on this approach, we propose a multi-pruning procedure that constructs decision trees which, as classifiers, often outperform decision trees constructed by CART.
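The abstract's bi-criteria optimization maintains, for each subproblem, the set of Pareto optimal (nodes, misclassifications) pairs. The following is an illustrative sketch of the two core operations such a dynamic program needs, not the authors' implementation: filtering candidate pairs to their Pareto front, and combining child fronts at a split node (the `split_cost` parameter and function names are assumptions for illustration).

```python
from itertools import product


def pareto_front(pairs):
    """Keep (nodes, errors) pairs not dominated by any other pair.

    A pair dominates another if it is no worse in both criteria
    and strictly better in at least one.
    """
    front = []
    # Sort by node count, then by errors; a later pair survives only
    # if it has strictly fewer errors than everything kept so far.
    for n, e in sorted(set(pairs)):
        if not front or e < front[-1][1]:
            front.append((n, e))
    return front


def combine(child_fronts, split_cost=1):
    """Combine children's Pareto fronts at an internal (split) node.

    Node counts add up (plus split_cost for the split node itself),
    misclassification counts add up; the result is filtered back to
    its Pareto front.
    """
    combos = []
    for choice in product(*child_fronts):
        n = split_cost + sum(c[0] for c in choice)
        e = sum(c[1] for c in choice)
        combos.append((n, e))
    return pareto_front(combos)
```

Applied bottom-up over a decision table's subproblems (leaves contribute a single pair, e.g. one node and the errors of the majority class), this yields the full set of Pareto optimal points from which a tree of the desired size/accuracy trade-off can be derived.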