一种自底向上倾斜决策树归纳算法

2011 11th International Conference on Intelligent Systems Design and Applications Pub Date : 2011-11-01 DOI:10.1109/ISDA.2011.6121697

Rodrigo C. Barros, R. Cerri, P. Jaskowiak, A. Carvalho

{"title":"一种自底向上倾斜决策树归纳算法","authors":"Rodrigo C. Barros, R. Cerri, P. Jaskowiak, A. Carvalho","doi":"10.1109/ISDA.2011.6121697","DOIUrl":null,"url":null,"abstract":"Decision tree induction algorithms are widely used in knowledge discovery and data mining, specially in scenarios where model comprehensibility is desired. A variation of the traditional univariate approach is the so-called oblique decision tree, which allows multivariate tests in its non-terminal nodes. Oblique decision trees can model decision boundaries that are oblique to the attribute axes, whereas univariate trees can only perform axis-parallel splits. The majority of the oblique and univariate decision tree induction algorithms perform a top-down strategy for growing the tree, relying on an impurity-based measure for splitting nodes. In this paper, we propose a novel bottom-up algorithm for inducing oblique trees named BUTIA. It does not require an impurity-measure for dividing nodes, since we know a priori the data resulting from each split. For generating the splitting hyperplanes, our algorithm implements a support vector machine solution, and a clustering algorithm is used for generating the initial leaves. We compare BUTIA to traditional univariate and oblique decision tree algorithms, C4.5, CART, OC1 and FT, as well as to a standard SVM implementation, using real gene expression benchmark data. Experimental results show the effectiveness of the proposed approach in several cases.","PeriodicalId":433207,"journal":{"name":"2011 11th International Conference on Intelligent Systems Design and Applications","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":"{\"title\":\"A bottom-up oblique decision tree induction algorithm\",\"authors\":\"Rodrigo C. Barros, R. Cerri, P. Jaskowiak, A. Carvalho\",\"doi\":\"10.1109/ISDA.2011.6121697\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Decision tree induction algorithms are widely used in knowledge discovery and data mining, specially in scenarios where model comprehensibility is desired. A variation of the traditional univariate approach is the so-called oblique decision tree, which allows multivariate tests in its non-terminal nodes. Oblique decision trees can model decision boundaries that are oblique to the attribute axes, whereas univariate trees can only perform axis-parallel splits. The majority of the oblique and univariate decision tree induction algorithms perform a top-down strategy for growing the tree, relying on an impurity-based measure for splitting nodes. In this paper, we propose a novel bottom-up algorithm for inducing oblique trees named BUTIA. It does not require an impurity-measure for dividing nodes, since we know a priori the data resulting from each split. For generating the splitting hyperplanes, our algorithm implements a support vector machine solution, and a clustering algorithm is used for generating the initial leaves. We compare BUTIA to traditional univariate and oblique decision tree algorithms, C4.5, CART, OC1 and FT, as well as to a standard SVM implementation, using real gene expression benchmark data. Experimental results show the effectiveness of the proposed approach in several cases.\",\"PeriodicalId\":433207,\"journal\":{\"name\":\"2011 11th International Conference on Intelligent Systems Design and Applications\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"26\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 11th International Conference on Intelligent Systems Design and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISDA.2011.6121697\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 11th International Conference on Intelligent Systems Design and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISDA.2011.6121697","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 26

摘要

决策树归纳算法在知识发现和数据挖掘中有着广泛的应用，特别是在需要模型可理解性的场景中。传统的单变量方法的一种变体是所谓的倾斜决策树，它允许在其非终端节点上进行多变量测试。倾斜决策树可以对倾斜于属性轴的决策边界建模，而单变量树只能执行轴平行分割。大多数倾斜和单变量决策树归纳算法执行自顶向下的策略来生长树，依赖于基于杂质的度量来分裂节点。本文提出了一种自底向上的斜树诱导算法BUTIA。它不需要杂质度量来划分节点，因为我们先验地知道每次划分所产生的数据。为了生成分裂超平面，我们的算法实现了支持向量机解决方案，并使用聚类算法生成初始叶。我们将BUTIA与传统的单变量和倾斜决策树算法(C4.5, CART, OC1和FT)以及使用真实基因表达基准数据的标准SVM实现进行比较。实验结果表明了该方法的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A bottom-up oblique decision tree induction algorithm

Decision tree induction algorithms are widely used in knowledge discovery and data mining, specially in scenarios where model comprehensibility is desired. A variation of the traditional univariate approach is the so-called oblique decision tree, which allows multivariate tests in its non-terminal nodes. Oblique decision trees can model decision boundaries that are oblique to the attribute axes, whereas univariate trees can only perform axis-parallel splits. The majority of the oblique and univariate decision tree induction algorithms perform a top-down strategy for growing the tree, relying on an impurity-based measure for splitting nodes. In this paper, we propose a novel bottom-up algorithm for inducing oblique trees named BUTIA. It does not require an impurity-measure for dividing nodes, since we know a priori the data resulting from each split. For generating the splitting hyperplanes, our algorithm implements a support vector machine solution, and a clustering algorithm is used for generating the initial leaves. We compare BUTIA to traditional univariate and oblique decision tree algorithms, C4.5, CART, OC1 and FT, as well as to a standard SVM implementation, using real gene expression benchmark data. Experimental results show the effectiveness of the proposed approach in several cases.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 11th International Conference on Intelligent Systems Design and Applications

自引率

0.00%

发文量