Clustering of recurrent events data.

Impact factor 1.1; Region 4, Mathematics; JCR Q2, Statistics & Probability

Journal of Applied Statistics Pub Date : 2025-01-28 eCollection Date: 2025-01-01 DOI:10.1080/02664763.2025.2452966

G Babykina, V Vandewalle, J Carretero-Bravo

Journal of Applied Statistics, volume 52, issue 11, pages 2031-2059. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404095/pdf/

Citations: 0

Abstract

Data are now often timestamped. When analysing events that may occur several times (recurrent events), it is therefore desirable to model the whole dynamics of the counting process rather than only the total number of events. Such data arise in hospital readmissions, disease recurrences and repeated failures of industrial systems. Recurrent events can be analysed in the counting process framework, as in the Andersen-Gill model, which assumes, as in the Cox model, that the intensity depends on time and on covariates. However, observed covariates are often insufficient to explain the heterogeneity in the data. We propose a mixture model for recurrent events that accounts for unobserved heterogeneity and clusters individuals (an unsupervised classification that partitions heterogeneous data according to unobserved, or latent, variables). Within each cluster, the intensity of the recurrent event process is specified parametrically and adjusted for covariates. Model parameters are estimated by maximum likelihood using the EM algorithm, and the BIC is used to choose the number of clusters. The feasibility of the model is checked on simulated data. Real data on hospital readmissions of elderly people, which motivated the development of the clustering model, are then analysed. The results give a detailed understanding of the recurrent event process in each cluster.
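To make the estimation scheme concrete, the following is a minimal sketch of EM for a mixture of recurrent event processes with BIC-based choice of the number of clusters. It is not the authors' model. It assumes, purely for illustration, a power-law intensity a*b*t^(b-1) in each cluster, no covariates, and a common follow-up time T for all individuals, which gives closed-form M-step updates. The function names (simulate_power_law, fit_em), the starting values and the BIC convention (-2 log-likelihood + number of parameters * log N, lower is better) are likewise assumptions of this sketch.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero in degenerate clusters

def simulate_power_law(rng, a, b, T):
    """Event times on (0, T] of a Poisson process with cumulative intensity a * t**b."""
    n = rng.poisson(a * T**b)
    # Given n, event times are i.i.d. with CDF (t / T)**b; sample by inversion.
    return np.sort(T * (1.0 - rng.uniform(size=n)) ** (1.0 / b))

def fit_em(times, T, K, n_iter=200):
    """EM for a K-cluster mixture of power-law processes (illustrative assumptions)."""
    N = len(times)
    n = np.array([len(t) for t in times], dtype=float)   # events per individual
    s = np.array([np.log(t).sum() for t in times])       # sum_j log t_ij
    u = n * np.log(T) - s                                 # sum_j log(T / t_ij) >= 0
    # Deterministic start: split individuals into K groups by event count.
    r = np.zeros((N, K))
    for k, idx in enumerate(np.array_split(np.argsort(n), K)):
        r[idx, k] = 1.0
    for _ in range(n_iter):
        # M-step: weighted maximum likelihood, closed form because T is common.
        w = np.maximum(r.sum(axis=0), EPS)                # soft cluster sizes
        nw = r.T @ n                                      # weighted event counts
        pi = w / N
        b = np.maximum(nw / np.maximum(r.T @ u, EPS), EPS)
        a = np.maximum(nw / (w * T**b), EPS)
        # E-step: posterior cluster probabilities, computed stably on the log scale.
        # Individual log-likelihood in cluster k: n*log(a*b) + (b-1)*s - a*T**b.
        ll = (np.log(np.maximum(pi, EPS)) + n[:, None] * np.log(a * b)
              + (b - 1.0) * s[:, None] - a * T**b)
        m = ll.max(axis=1, keepdims=True)
        log_mix = m[:, 0] + np.log(np.exp(ll - m).sum(axis=1))
        r = np.exp(ll - log_mix[:, None])
    loglik = log_mix.sum()
    n_params = 3 * K - 1                                  # K-1 weights, K a's, K b's
    bic = -2.0 * loglik + n_params * np.log(N)
    return {"pi": pi, "a": a, "b": b, "loglik": loglik, "bic": bic,
            "labels": r.argmax(axis=1)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 10.0
    # Two hypothetical clusters: constant intensity vs. increasing intensity.
    times = ([simulate_power_law(rng, 0.5, 1.0, T) for _ in range(150)]
             + [simulate_power_law(rng, 0.2, 2.0, T) for _ in range(150)])
    for K in (1, 2, 3):
        fit = fit_em(times, T, K)
        print(K, round(fit["bic"], 1), np.round(fit["b"], 2))
```

Because the shape parameter b depends on when events occur and not only on how many occur, two clusters with similar event counts but different timing can still be separated, which is the motivation the abstract gives for modelling the whole counting process. Adding covariates or individual follow-up times would generally require a numerical M-step instead of the closed-form updates used here.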


Source journal

Journal of Applied Statistics (Mathematics: Statistics & Probability)

CiteScore: 3.40

Self-citation rate: 0.00%

Articles published: 126

Review time: 6 months

About the journal: Journal of Applied Statistics provides a forum for communication between both applied statisticians and users of applied statistical techniques across a wide range of disciplines. These areas include business, computing, economics, ecology, education, management, medicine, operational research and sociology, but papers from other areas are also considered. The editorial policy is to publish rigorous but clear and accessible papers on applied techniques. Purely theoretical papers are avoided but those on theoretical developments which clearly demonstrate significant applied potential are welcomed. Each paper is submitted to at least two independent referees.