Time-penalised trees (TpT): introducing a new tree-based data mining algorithm for time-varying covariates

IF 1.2; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Annals of Mathematics and Artificial Intelligence Pub Date : 2024-08-22 DOI:10.1007/s10472-024-09950-w

Mathias Valla

Volume 92, issue 6, pages 1609-1661. https://link.springer.com/article/10.1007/s10472-024-09950-w

Citation count: 0

Abstract

This article introduces a new decision tree algorithm that accounts for time-varying covariates in the decision-making process. Traditional decision tree algorithms assume that covariates are static and do not change over time, which can lead to inaccurate predictions in dynamic environments. Existing methods offer workarounds such as the pseudo-subject approach, which the article discusses. The proposed algorithm uses a different structure and a time-penalised splitting criterion that allows recursive partitioning of both the covariate space and time. Relevant historical trends are therefore inherently involved in the construction of a tree, and are visible and interpretable once it is fitted. This approach allows for innovative and highly interpretable analysis in settings where the covariates change over time. The effectiveness of the algorithm is demonstrated through a real-world data application in life insurance. The results presented in this article can be seen as an introduction to, or proof of concept of, our time-penalised approach. The algorithm's theoretical properties, and a comparison against existing approaches on datasets from various fields, including healthcare, finance, insurance, environmental monitoring, and data mining in general, will be explored in forthcoming work.
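For readers unfamiliar with the pseudo-subject workaround named in the abstract, the following is a minimal sketch of the usual idea: each subject's history is cut into intervals on which the covariates are constant, and each interval becomes its own row for a standard tree. The function name, the record layout, and the `premium` covariate are hypothetical choices for illustration and do not come from the article. The sketch illustrates only the workaround, not the TpT algorithm or its splitting criterion.

```python
from typing import Dict, List, Tuple

# One measurement: (time the covariates were recorded, covariate values).
# This layout is an assumption made for the example.
Measurement = Tuple[float, Dict[str, float]]


def to_pseudo_subjects(
    subject_id: str,
    measurements: List[Measurement],
    end_time: float,
) -> List[Dict[str, object]]:
    """Split one subject's history into rows with constant covariates on [start, stop)."""
    ordered = sorted(measurements, key=lambda m: m[0])
    rows: List[Dict[str, object]] = []
    for i, (start, covariates) in enumerate(ordered):
        # An interval ends at the next measurement, or at the end of follow-up.
        stop = ordered[i + 1][0] if i + 1 < len(ordered) else end_time
        if stop <= start:
            continue  # skip empty intervals (duplicate timestamps, late records)
        rows.append({"id": subject_id, "start": start, "stop": stop, **covariates})
    return rows


# Hypothetical policy whose premium changes at t=2 and t=5, observed until t=6.
history = [(0.0, {"premium": 100.0}), (2.0, {"premium": 120.0}), (5.0, {"premium": 90.0})]
rows = to_pseudo_subjects("policy_1", history, end_time=6.0)
for row in rows:
    print(row)
```

Each resulting row is treated as an independent observation by a conventional tree, so the link between the rows of one subject is lost; this is the kind of limitation the abstract alludes to when it calls the approach a workaround.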




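Source journal

Annals of Mathematics and Artificial Intelligence (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 3.00

Self-citation rate: 8.30%

Publication count:

Review time: >12 weeks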

Journal description: Annals of Mathematics and Artificial Intelligence presents a range of topics of concern to scholars applying quantitative, combinatorial, logical, algebraic and algorithmic methods to diverse areas of Artificial Intelligence, from decision support, automated deduction, and reasoning, to knowledge-based systems, machine learning, computer vision, robotics and planning. The journal features collections of papers appearing either in volumes (400 pages) or in separate issues (100-300 pages), which focus on one topic and have one or more guest editors. Annals of Mathematics and Artificial Intelligence hopes to influence the spawning of new areas of applied mathematics and strengthen the scientific underpinnings of Artificial Intelligence.