The influence of dimensions on the complexity of computing decision trees

IF 5.1 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Stephen Kobourov, Maarten Löffler, Fabrizio Montecchiani, Marcin Pilipczuk, Ignaz Rutter, Raimund Seidel, Manuel Sorge, Jules Wulms
{"title":"The influence of dimensions on the complexity of computing decision trees","authors":"Stephen Kobourov ,&nbsp;Maarten Löffler ,&nbsp;Fabrizio Montecchiani ,&nbsp;Marcin Pilipczuk ,&nbsp;Ignaz Rutter ,&nbsp;Raimund Seidel ,&nbsp;Manuel Sorge ,&nbsp;Jules Wulms","doi":"10.1016/j.artint.2025.104322","DOIUrl":null,"url":null,"abstract":"<div><div>A decision tree recursively splits a feature space <span><math><msup><mrow><mi>R</mi></mrow><mrow><mi>d</mi></mrow></msup></math></span> and then assigns class labels based on the resulting partition. Decision trees have been part of the basic machine-learning toolkit for decades. A large body of work considers heuristic algorithms that compute a decision tree from training data, usually aiming to minimize in particular the size of the resulting tree. In contrast, little is known about the complexity of the underlying computational problem of computing a minimum-size tree for the given training data. We study this problem with respect to the number <em>d</em> of dimensions of the feature space <span><math><msup><mrow><mi>R</mi></mrow><mrow><mi>d</mi></mrow></msup></math></span>, which contains <em>n</em> training examples. We show that it can be solved in <span><math><mi>O</mi><mo>(</mo><msup><mrow><mi>n</mi></mrow><mrow><mn>2</mn><mi>d</mi><mo>+</mo><mn>1</mn></mrow></msup><mo>)</mo></math></span> time, but under reasonable complexity-theoretic assumptions it is not possible to achieve <span><math><mi>f</mi><mo>(</mo><mi>d</mi><mo>)</mo><mo>⋅</mo><msup><mrow><mi>n</mi></mrow><mrow><mi>o</mi><mo>(</mo><mi>d</mi><mo>/</mo><mi>log</mi><mo>⁡</mo><mi>d</mi><mo>)</mo></mrow></msup></math></span> running time. The problem is solvable in <span><math><msup><mrow><mo>(</mo><mi>d</mi><mi>R</mi><mo>)</mo></mrow><mrow><mi>O</mi><mo>(</mo><mi>d</mi><mi>R</mi><mo>)</mo></mrow></msup><mo>⋅</mo><msup><mrow><mi>n</mi></mrow><mrow><mn>1</mn><mo>+</mo><mi>o</mi><mo>(</mo><mn>1</mn><mo>)</mo></mrow></msup></math></span> time if there are exactly two classes and <em>R</em> is an upper bound on the number of tree leaves labeled with the first class.</div></div>","PeriodicalId":8434,"journal":{"name":"Artificial Intelligence","volume":"343 ","pages":"Article 104322"},"PeriodicalIF":5.1000,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0004370225000414","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

A decision tree recursively splits a feature space $\mathbb{R}^d$ and then assigns class labels based on the resulting partition. Decision trees have been part of the basic machine-learning toolkit for decades. A large body of work considers heuristic algorithms that compute a decision tree from training data, usually aiming in particular to minimize the size of the resulting tree. In contrast, little is known about the complexity of the underlying computational problem of computing a minimum-size tree for the given training data. We study this problem with respect to the number $d$ of dimensions of the feature space $\mathbb{R}^d$, which contains $n$ training examples. We show that it can be solved in $O(n^{2d+1})$ time, but under reasonable complexity-theoretic assumptions it is not possible to achieve $f(d) \cdot n^{o(d/\log d)}$ running time. The problem is solvable in $(dR)^{O(dR)} \cdot n^{1+o(1)}$ time if there are exactly two classes and $R$ is an upper bound on the number of tree leaves labeled with the first class.
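To make the objects in the abstract concrete, below is a minimal Python sketch of an axis-aligned decision tree over $\mathbb{R}^d$: each inner node tests one coordinate against a threshold, and each leaf labels its cell of the partition. The greedy construction shown here is only an illustration (the names `Node`, `grow`, and `classify` are ours; this is not the paper's algorithm). It does use the standard observation that only the coordinates of the $n$ training points matter as candidate thresholds in each dimension, which is presumably the discretization underlying exact algorithms such as the $O(n^{2d+1})$ one.

```python
# A minimal, illustrative sketch (our names, not the paper's algorithm):
# an axis-aligned decision tree over R^d, grown greedily, that recursively
# splits the training points and labels each leaf by the class of the
# points in its cell. Candidate thresholds are drawn only from the
# training points' own coordinates.
from dataclasses import dataclass
from typing import Optional

Point = tuple[float, ...]

@dataclass
class Node:
    dim: int = -1                       # split dimension (inner nodes)
    thr: float = 0.0                    # split threshold (inner nodes)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None         # class label (leaves only)

def grow(points: list[Point], labels: list[int]) -> Node:
    """Greedy top-down construction; splits until every cell is pure."""
    if len(set(labels)) == 1:           # pure cell: emit a leaf
        return Node(label=labels[0])
    d = len(points[0])
    best = None                         # (impurity, dim, threshold)
    for dim in range(d):                # try each of the d dimensions
        for t in sorted({p[dim] for p in points})[:-1]:
            l = [y for p, y in zip(points, labels) if p[dim] <= t]
            r = [y for p, y in zip(points, labels) if p[dim] > t]
            # impurity: minority-class points on each side of the cut
            imp = (len(l) - max(map(l.count, set(l)))) \
                + (len(r) - max(map(r.count, set(r))))
            if best is None or imp < best[0]:
                best = (imp, dim, t)
    if best is None:                    # coincident points: majority leaf
        return Node(label=max(set(labels), key=labels.count))
    _, dim, t = best
    ls = [(p, y) for p, y in zip(points, labels) if p[dim] <= t]
    rs = [(p, y) for p, y in zip(points, labels) if p[dim] > t]
    return Node(dim=dim, thr=t,
                left=grow([p for p, _ in ls], [y for _, y in ls]),
                right=grow([p for p, _ in rs], [y for _, y in rs]))

def classify(node: Node, x: Point) -> int:
    """Follow threshold tests from the root down to a leaf."""
    while node.label is None:
        node = node.left if x[node.dim] <= node.thr else node.right
    return node.label

# Usage: two classes in R^2, separable by a single vertical cut.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ys = [0, 1, 0, 1]
tree = grow(pts, ys)
assert classify(tree, (0.9, 0.5)) == 1
```

Such a greedy heuristic gives no size guarantee; the paper's lower bound indicates that for computing a minimum-size tree exactly, a running time whose exponent grows roughly linearly in $d$ is essentially unavoidable.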
Source journal: Artificial Intelligence (Engineering & Technology / Computer Science: Artificial Intelligence)
CiteScore: 11.20
Self-citation rate: 1.40%
Articles per year: 118
Review time: 8 months
Journal description: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. The journal also accepts papers describing AI applications, provided they focus on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.