Change-point detection in regression models via the max-EM algorithm

IF 1.6 3区 数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Modibo Diabaté , Grégory Nuel , Olivier Bouaziz
{"title":"Change-point detection in regression models via the max-EM algorithm","authors":"Modibo Diabaté ,&nbsp;Grégory Nuel ,&nbsp;Olivier Bouaziz","doi":"10.1016/j.csda.2025.108278","DOIUrl":null,"url":null,"abstract":"<div><div>The problem of breakpoint detection is considered within a regression modeling framework. A novel method, the max-EM algorithm, is introduced, combining a constrained Hidden Markov Model with the Classification-EM algorithm. This algorithm has linear complexity and provides accurate detection of breakpoints and estimation of parameters. A theoretical result is derived, showing that the likelihood of the data, as a function of the regression parameters and the breakpoints location, increases at each step of the algorithm. Two initialization methods for the breakpoints location are also presented to address local maxima issues. Finally, a statistical test in the one breakpoint situation is developed. Simulation experiments based on linear, logistic, Poisson and Accelerated Failure Time regression models show that the final method that includes the initialization procedure and the max-EM algorithm has a strong performance both in terms of parameters estimation and breakpoints detection. The statistical test is also evaluated and exhibits a correct rejection rate under the null hypothesis and a strong power under various alternatives. Two real dataset are analyzed, the UCI bike sharing and the health disease data, where the interest of the method to detect heterogeneity in the distribution of the data is illustrated.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"215 ","pages":"Article 108278"},"PeriodicalIF":1.6000,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Statistics & Data Analysis","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167947325001549","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

The problem of breakpoint detection is considered within a regression modeling framework. A novel method, the max-EM algorithm, is introduced, combining a constrained Hidden Markov Model with the Classification-EM algorithm. This algorithm has linear complexity and provides accurate detection of breakpoints and estimation of parameters. A theoretical result is derived, showing that the likelihood of the data, as a function of the regression parameters and the breakpoints location, increases at each step of the algorithm. Two initialization methods for the breakpoints location are also presented to address local maxima issues. Finally, a statistical test in the one breakpoint situation is developed. Simulation experiments based on linear, logistic, Poisson and Accelerated Failure Time regression models show that the final method that includes the initialization procedure and the max-EM algorithm has a strong performance both in terms of parameters estimation and breakpoints detection. The statistical test is also evaluated and exhibits a correct rejection rate under the null hypothesis and a strong power under various alternatives. Two real dataset are analyzed, the UCI bike sharing and the health disease data, where the interest of the method to detect heterogeneity in the distribution of the data is illustrated.
基于max-EM算法的回归模型变点检测
断点检测问题是在回归建模框架内考虑的。将约束隐马尔可夫模型与分类- em算法相结合,提出了一种新的分类- em算法。该算法具有线性复杂度,并提供了准确的断点检测和参数估计。理论结果表明,作为回归参数和断点位置的函数,数据的似然在算法的每一步都增加。为解决局部最大值问题,还提出了断点位置的两种初始化方法。最后,给出了单断点情况下的统计检验方法。基于线性、逻辑、泊松和加速失效时间回归模型的仿真实验表明,包含初始化过程和max-EM算法的最终方法在参数估计和断点检测方面都具有较强的性能。对统计检验也进行了评估,并在零假设下显示出正确的拒绝率,在各种替代方案下显示出强大的力量。分析了两个真实数据集,UCI自行车共享数据和健康疾病数据,其中说明了该方法检测数据分布异质性的兴趣。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Computational Statistics & Data Analysis
Computational Statistics & Data Analysis 数学-计算机:跨学科应用
CiteScore
3.70
自引率
5.60%
发文量
167
审稿时长
60 days
期刊介绍: Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas: I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics,computer graphics, computer intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article. II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures. [...] III) Special Applications - [...] IV) Annals of Statistical Data Science [...]
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信