Temporal phenotyping and prognostic stratification of patients with sepsis through longitudinal clustering.

IF 6.1 | CAS Zone 3 (Biology) | JCR Q1, MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining | Pub Date: 2025-09-26 | DOI: 10.1186/s13040-025-00480-7

Patrizia Ribino, Maria Mannone, Claudia Di Napoli, Giovanni Paragliola, Davide Chicco, Francesca Gasparini

Biodata Mining, Vol. 18, No. 1, p. 64 (2025). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12465323/pdf/

Citations: 0

Abstract

Sepsis is a critical medical condition characterized by a highly variable and rapidly evolving clinical course, often necessitating early intervention and tailored treatment plans to improve patient outcomes. Because of its complexity and heterogeneity, understanding how sepsis progresses across different patient populations remains a significant challenge. In this study, we use an analytical framework based on k-means multivariate longitudinal clustering to capture the diverse trajectories of sepsis by analyzing multiple clinical parameters tracked over time, providing a nuanced view of disease progression. By using Dynamic Time Warping (DTW) as the distance metric, the proposed method accounts for temporal misalignments and for variability in the rate of disease progression, an essential capability given the unpredictable and heterogeneous nature of sepsis. This enhances the model's ability to detect distinct temporal patterns and phenotypic subgroups that may remain undetected by conventional analytical approaches. Applied to sepsis-related electronic health records (EHRs), which provide rich time-series laboratory data along with patient demographics and underlying health conditions, the proposed method reveals distinct sepsis phenotypes that reflect variations in disease progression. We perform several experiments varying the number of clusters and the combination of clinical variables, evaluating clustering performance with the Silhouette score, the Calinski-Harabasz Index, and the Davies-Bouldin Index as reference quality metrics. Our results confirm the prognostic role of the Thrombin-Antigen complex and the Prothrombin Time-International Normalized Ratio for septic patients. Furthermore, to evaluate the relevance of the patient stratification, we use the Adjusted Rand Index to quantify the survival prediction capability of our longitudinal clustering method, with 28-day death as the target variable. The same metric shows that our method outperforms other longitudinal clustering algorithms available in the literature.
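The abstract describes k-means clustering of multivariate patient trajectories with DTW as the distance, followed by scoring the resulting partition against 28-day death with the Adjusted Rand Index. The sketch below is a minimal NumPy illustration of that kind of pipeline under stated assumptions; it is not the authors' implementation. The helpers `dtw_distance`, `dtw_kmeans`, `silhouette` and `adjusted_rand_index` are hypothetical names, the toy cohort is synthetic, and two simplifications are assumptions: centroids are updated as element-wise means rather than with DTW Barycenter Averaging (which, for example, tslearn's `TimeSeriesKMeans` uses for DTW), and initialisation is a deterministic farthest-point choice.

```python
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Multivariate ("dependent") DTW between series shaped (T_a, d) and (T_b, d).

    All variables are warped together; the local cost is the squared Euclidean
    distance between time points, and the result is sqrt(accumulated cost).
    """
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = float(np.sum((a[i - 1] - b[j - 1]) ** 2))
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(np.sqrt(acc[n, m]))


def dtw_kmeans(X: np.ndarray, k: int, n_iter: int = 20) -> np.ndarray:
    """k-means-style clustering of equal-length series X shaped (n, T, d).

    Assumptions: deterministic farthest-point initialisation, and centroids
    updated as element-wise means (a simplification of DTW Barycenter Averaging).
    """
    chosen = [0]
    while len(chosen) < k:
        nearest = [min(dtw_distance(x, X[c]) for c in chosen) for x in X]
        chosen.append(int(np.argmax(nearest)))
    centroids = X[chosen].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        D = np.array([[dtw_distance(x, c) for c in centroids] for x in X])
        new_labels = D.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):  # keep the old centroid if a cluster empties
                centroids[c] = X[labels == c].mean(axis=0)
    return labels


def silhouette(D: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette from a precomputed distance matrix (needs >= 2 clusters)."""
    scores = []
    for i in range(len(D)):
        own = labels == labels[i]
        own[i] = False
        if not own.any():
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = D[i, own].mean()
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Adjusted Rand Index from the contingency table of two labelings."""
    _, t = np.unique(np.asarray(labels_true), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)

    def pairs(x):
        return x * (x - 1) / 2.0  # number of unordered pairs, "n choose 2"

    sum_ij = pairs(table).sum()
    sum_a, sum_b = pairs(table.sum(axis=1)).sum(), pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(len(t))
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return float((sum_ij - expected) / (max_index - expected))


# Hypothetical toy cohort: two lab variables over 12 time points. "Worsening"
# patients rise and "improving" ones fall; onset is delayed by 0, 0.1 or 0.2
# to mimic temporal misalignment between patients on the same course.
time_axis = np.linspace(0.0, 1.0, 12)
ramps = [np.clip(time_axis - s, 0.0, 1.0) for s in (0.0, 0.1, 0.2)]
worsening = [np.column_stack([4 * r, 2 * r]) for r in ramps]
improving = [np.column_stack([4 - 4 * r, 2 - 2 * r]) for r in ramps]
X = np.stack(worsening + improving)  # shape (6, 12, 2)
died_28d = np.array([1, 1, 1, 0, 0, 0])  # hypothetical 28-day outcome

labels = dtw_kmeans(X, k=2)
D = np.array([[dtw_distance(x, y) for y in X] for x in X])
print("clusters:", labels)
print("silhouette:", round(silhouette(D, labels), 3))
print("ARI vs 28-day death:", adjusted_rand_index(died_28d, labels))
```

Because DTW absorbs the delayed onsets, patients that follow the same trend at different times should fall into the same cluster on this toy data, and the ARI then measures agreement with the outcome labels regardless of how the clusters happen to be numbered. scikit-learn's `calinski_harabasz_score` and `davies_bouldin_score` take feature vectors rather than a precomputed distance matrix, so they are left out of this sketch.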




Source journal

Biodata Mining | MATHEMATICAL & COMPUTATIONAL BIOLOGY

CiteScore: 7.90

Self-citation rate: 0.00%

Articles published:

Review time: 23 weeks

Journal description: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to:
- Development, evaluation, and application of novel data mining and machine learning algorithms.
- Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
- Open-source software for the application of data mining and machine learning algorithms.
- Design, development, and integration of databases, software, and web services for the storage, management, retrieval, and analysis of data from large-scale studies.
- Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.