{"title":"Classification Acceleration via Merging Decision Trees","authors":"Chenglin Fan, P. Li","doi":"10.1145/3412815.3416886","DOIUrl":null,"url":null,"abstract":"We study the problem of merging decision trees: Given k decision trees $T_1,T_2,T_3...,T_k$, we merge these trees into one super tree T with (often) much smaller size. The resultant super tree T, which is an integration of k decision trees with each leaf having a major label, can also be considered as a (lossless) compression of a random forest. For any testing instance, it is guaranteed that the tree T gives the same prediction as the random forest consisting of $T_1,T_2,T_3...,T_k$ but it saves the computational effort needed for traversing multiple trees. The proposed method is suitable for classification problems with time constraints, for example, the online classification task such that it needs to predict a label for a new instance before the next instance arrives. Experiments on five datasets confirm that the super tree T runs significantly faster than the random forest with k trees. The merging procedure also saves space needed storing those k trees, and it makes the forest model more interpretable, since naturally one tree is easier to be interpreted than k trees.","PeriodicalId":176130,"journal":{"name":"Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3412815.3416886","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
We study the problem of merging decision trees: given $k$ decision trees $T_1, T_2, \ldots, T_k$, we merge them into one super tree $T$ of (often) much smaller size. The resulting super tree $T$, which integrates the $k$ decision trees with each leaf holding a majority label, can also be viewed as a lossless compression of a random forest. For any test instance, the tree $T$ is guaranteed to give the same prediction as the random forest consisting of $T_1, T_2, \ldots, T_k$, while saving the computational effort of traversing multiple trees. The proposed method is well suited to classification problems with time constraints, for example online classification, where a label must be predicted for a new instance before the next one arrives. Experiments on five datasets confirm that the super tree $T$ runs significantly faster than the random forest of $k$ trees. The merging procedure also reduces the space needed to store the $k$ trees, and it makes the forest model more interpretable, since one tree is naturally easier to interpret than $k$ trees.
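To make the merging idea concrete, below is a minimal sketch of one way such a merge can work, assuming a toy nested-dict representation of trees with axis-aligned splits; the helper names (`merge`, `prune`, `add_votes`, `predict`) are illustrative and not the authors' actual implementation. Each leaf stores a vote counter; merging grafts the second tree onto every leaf of the first, pruning branches that are unreachable inside that leaf's region, so each leaf of the result accumulates the votes of both trees and a single root-to-leaf traversal reproduces the forest's majority vote.

```python
# A minimal sketch of merging axis-aligned decision trees (illustrative only,
# not the paper's implementation). A tree is either a leaf {'votes': Counter}
# or an internal node {'feat': i, 'thr': t, 'left': ..., 'right': ...},
# where 'left' covers x[feat] <= thr and 'right' covers x[feat] > thr.
from collections import Counter
from functools import reduce

def prune(tree, feat, thr, go_left):
    """Restrict `tree` to the half-space x[feat] <= thr (go_left=True)
    or x[feat] > thr (go_left=False), dropping unreachable branches."""
    if 'votes' in tree:
        return tree
    if tree['feat'] == feat:
        # Split on the same feature: one branch may be unreachable here.
        if go_left and tree['thr'] >= thr:
            return prune(tree['left'], feat, thr, go_left)
        if not go_left and tree['thr'] <= thr:
            return prune(tree['right'], feat, thr, go_left)
    return {'feat': tree['feat'], 'thr': tree['thr'],
            'left': prune(tree['left'], feat, thr, go_left),
            'right': prune(tree['right'], feat, thr, go_left)}

def add_votes(tree, votes):
    """Add a fixed vote Counter to every leaf of `tree` (copying nodes)."""
    if 'votes' in tree:
        return {'votes': tree['votes'] + votes}
    return {'feat': tree['feat'], 'thr': tree['thr'],
            'left': add_votes(tree['left'], votes),
            'right': add_votes(tree['right'], votes)}

def merge(t1, t2):
    """Merge two trees so every leaf of the result holds both trees' votes."""
    if 'votes' in t1:
        return add_votes(t2, t1['votes'])
    f, t = t1['feat'], t1['thr']
    return {'feat': f, 'thr': t,
            'left':  merge(t1['left'],  prune(t2, f, t, go_left=True)),
            'right': merge(t1['right'], prune(t2, f, t, go_left=False))}

def predict(tree, x):
    """One root-to-leaf traversal; return the majority label at the leaf."""
    while 'votes' not in tree:
        tree = tree['left'] if x[tree['feat']] <= tree['thr'] else tree['right']
    return tree['votes'].most_common(1)[0][0]

if __name__ == '__main__':
    # Two stump trees voting on classes 'a'/'b'; fold merge over the forest.
    t1 = {'feat': 0, 'thr': 0.5,
          'left':  {'votes': Counter({'a': 1})},
          'right': {'votes': Counter({'b': 1})}}
    t2 = {'feat': 0, 'thr': 0.5,
          'left':  {'votes': Counter({'a': 1})},
          'right': {'votes': Counter({'a': 1})}}
    super_tree = reduce(merge, [t1, t2])
    print(predict(super_tree, [0.3]))  # 'a': both trees vote 'a' when x0 <= 0.5
```

The pruning step on shared split features is what keeps the merged tree compact: branches that contradict the region already carved out by earlier splits are discarded, rather than retaining the full cross product of all leaf regions.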