Boost-R: Gradient boosted trees for recurrence data

IF 2.6 · Zone 2 (Engineering & Technology) · Q2 ENGINEERING, INDUSTRIAL

Journal of Quality Technology Pub Date : 2021-07-03 DOI:10.1080/00224065.2021.1948373

Xiao Liu, Rong Pan


Citations: 2

Abstract



Recurrence data arise from multi-disciplinary domains spanning reliability, cyber security, healthcare, online retailing, etc. This paper investigates an additive-tree-based approach, known as Boost-R (Boosting for Recurrence Data), for recurrent event data with both static and dynamic features. Boost-R constructs an ensemble of gradient boosted additive trees to estimate the cumulative intensity function of the recurrent event process, where a new tree is added to the ensemble by minimizing the regularized L2 distance between the observed and predicted cumulative intensity. Unlike conventional regression trees, Boost-R constructs a time-dependent function on each tree leaf. The sum of these functions, from multiple trees, yields the ensemble estimator of the cumulative intensity. The divide-and-conquer nature of tree-based methods is appealing when hidden sub-populations exist within a heterogeneous population. The non-parametric nature of regression trees helps to avoid parametric assumptions on the complex interactions between event processes and features. Critical insights and advantages of Boost-R are investigated through comprehensive numerical examples. Datasets and computer code of Boost-R are made available on GitHub. To the best of our knowledge, Boost-R is the first gradient boosted additive-tree-based approach for modeling large-scale recurrent event data with both static and dynamic feature information.
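The boosting idea summarized in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation (their code is on GitHub); it is a simplified illustration under several assumptions: a single static feature, depth-one trees (stumps), a fixed time grid, and leaf functions taken as ridge-shrunk pointwise mean residual curves. All function names (`count_curves`, `fit_stump`, `boost`, `predict`) and parameters (`lr`, `lam`) are hypothetical.

```python
import numpy as np

def count_curves(event_times, grid):
    """Observed cumulative event counts N_i(t) for each unit on a common time grid."""
    return np.array(
        [[np.sum(np.asarray(ts) <= t) for t in grid] for ts in event_times],
        dtype=float,
    )

def fit_stump(x, resid, lam):
    """Choose the split on x whose two leaf curves minimize the regularized L2 loss."""
    best = None
    for thr in np.unique(x)[:-1]:
        left = x <= thr
        loss, leaves = 0.0, []
        for mask in (left, ~left):
            # Pointwise minimizer of sum_i (r_i(t) - f(t))^2 + lam * f(t)^2.
            f = resid[mask].sum(axis=0) / (mask.sum() + lam)
            loss += np.sum((resid[mask] - f) ** 2) + lam * np.sum(f ** 2)
            leaves.append(f)
        if best is None or loss < best[0]:
            best = (loss, thr, leaves[0], leaves[1])
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def boost(x, counts, n_trees=20, lr=0.5, lam=1.0):
    """Add stumps one at a time, each fitted to the current residual curves."""
    pred = np.zeros_like(counts)
    trees = []
    for _ in range(n_trees):
        thr, f_left, f_right = fit_stump(x, counts - pred, lam)
        pred += lr * np.where((x <= thr)[:, None], f_left, f_right)
        trees.append((thr, f_left, f_right))
    return trees, pred

def predict(trees, x_new, lr=0.5):
    """Ensemble estimate: sum of the (shrunk) leaf curves over all trees."""
    x_new = np.asarray(x_new, dtype=float)
    out = np.zeros((x_new.size, trees[0][1].size))
    for thr, f_left, f_right in trees:
        out += lr * np.where((x_new <= thr)[:, None], f_left, f_right)
    return out

# Toy data (made up for illustration): two hidden sub-populations
# separated by the feature x, with different event rates.
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
events = [[2.0, 6.0]] * 3 + [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]] * 3
grid = np.linspace(0.0, 10.0, 11)
counts = count_curves(events, grid)
trees, pred = boost(x, counts)
new_curves = predict(trees, [0.15, 0.85])
```

Each leaf here stores a whole curve over the time grid rather than a single constant, which is the point of contrast with a conventional regression tree. The grid-based mean curve is only a stand-in for whatever time-dependent leaf function the paper actually uses.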


Source journal

Journal of Quality Technology (Management Science - Engineering: Industrial)

CiteScore: 5.20
Self-citation rate: 4.00%
Articles published: not given
Review time: >12 weeks

About the journal: The objective of the Journal of Quality Technology is to contribute to the technical advancement of the field of quality technology by publishing papers that emphasize the practical applicability of new techniques, instructive examples of the operation of existing techniques, and results of historical research. Expository, review, and tutorial papers are also acceptable if they are written in a style suitable for practicing engineers.