Patrizia Ribino, Maria Mannone, Claudia Di Napoli, Giovanni Paragliola, Davide Chicco, Francesca Gasparini
Title: Temporal phenotyping and prognostic stratification of patients with sepsis through longitudinal clustering.
Journal: BioData Mining, volume 18, issue 1, page 64 (impact factor 6.1; JCR Q1, Mathematical & Computational Biology)
Publication date: 2025-09-26
Publication type: Journal Article
DOI: https://doi.org/10.1186/s13040-025-00480-7
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12465323/pdf/
Abstract
Sepsis is a critical medical condition characterized by a highly variable and rapidly evolving clinical course, often necessitating early intervention and tailored treatment plans to improve patient outcomes. Due to its complexity and heterogeneity, understanding the progression of sepsis across different patient populations remains a significant challenge. In this study, we exploit a sophisticated analytical framework based on k-means multivariate longitudinal clustering to capture the diverse trajectories of sepsis. We do so by analyzing multiple clinical parameters tracked over time, providing a nuanced view of disease progression. By incorporating Dynamic Time Warping (DTW) as the distance metric, the proposed method effectively accounts for temporal misalignments and variability in the rate of disease progression, an essential capability given the unpredictable and heterogeneous nature of sepsis. This integration enhances the model's ability to detect distinct temporal patterns and phenotypic subgroups that may remain undetected using conventional analytical approaches. By leveraging sepsis-related electronic health records (EHRs), which provide rich time-series data on laboratory results along with patient demographics and underlying health conditions, the proposed method reveals distinct sepsis phenotypes that reflect variations in disease progression. We perform several experiments varying the number of clusters and clinical variable combinations, evaluating the clustering performance using the Silhouette score, Calinski-Harabasz Index, and Davies-Bouldin Index as reference quality metrics. Our results confirm the prognostic role of the Thrombin-Antigen complex and the Prothrombin Time-International Normalized Ratio for septic patients.
Furthermore, to evaluate the relevance of subjects' stratification, the Adjusted Rand Index metric is used to quantify the survival prediction capability of our longitudinal clustering method, considering the 28-day death feature as the target variable. The same metric demonstrates that our proposal outperforms other longitudinal clustering algorithms available in the literature.
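The two building blocks named in the abstract, a DTW distance between multivariate trajectories and the Adjusted Rand Index between cluster labels and an outcome, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the Euclidean local cost, the fixed reference trajectories, and the toy data are all assumptions made for illustration, and only the assignment step of a k-means-style procedure is shown (no centroid update).

```python
from math import comb, sqrt

def dtw_distance(a, b):
    """DTW distance between two multivariate series (lists of equal-width tuples).

    Assumption: Euclidean distance is used as the local cost between time points.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sqrt(sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])))
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    cells, rows, cols = {}, {}, {}
    for t, p in zip(labels_true, labels_pred):
        cells[(t, p)] = cells.get((t, p), 0) + 1
        rows[t] = rows.get(t, 0) + 1
        cols[p] = cols.get(p, 0) + 1
    total = comb(len(labels_true), 2)
    if total == 0:
        return 1.0  # fewer than two samples: treat as trivially identical
    sum_cells = sum(comb(v, 2) for v in cells.values())
    sum_rows = sum(comb(v, 2) for v in rows.values())
    sum_cols = sum(comb(v, 2) for v in cols.values())
    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# Toy, made-up trajectories with one lab value per time step (not real patient data).
shifted_a = [(0,), (0,), (1,), (2,), (1,), (0,)]
shifted_b = [(0,), (1,), (2,), (1,), (0,), (0,)]
shift_dtw = dtw_distance(shifted_a, shifted_b)  # same shape, one step apart

# Hypothetical reference trajectories standing in for cluster centroids.
rising = [(1.0,), (2.0,), (3.0,), (4.0,)]
flat = [(1.0,), (1.0,), (1.0,), (1.0,)]
centroids = [rising, flat]

patients = [
    [(1.0,), (1.0,), (2.0,), (3.0,), (4.0,)],  # rising, delayed onset
    [(1.0,), (1.1,), (0.9,), (1.0,)],          # roughly flat
    [(1.0,), (2.0,), (3.0,), (3.0,), (4.0,)],  # rising, with a plateau
    [(1.0,), (1.0,), (1.0,)],                  # flat, shorter series
]

# Assignment step: each patient goes to the nearest centroid under DTW.
clusters = [
    min(range(len(centroids)), key=lambda k: dtw_distance(p, centroids[k]))
    for p in patients
]

outcome = [1, 0, 1, 0]  # hypothetical 28-day death flag, chosen to match by construction
ari = adjusted_rand_index(outcome, clusters)
print(shift_dtw, clusters, ari)
```

In this toy setup the two shifted series have a DTW distance of zero even though their point-by-point Euclidean distance is not zero, which is the kind of temporal misalignment the abstract refers to. The ARI of 1.0 here only reflects how the toy outcome was constructed and says nothing about the paper's results.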
About the journal:
BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data.
Topical areas include, but are not limited to:
-Development, evaluation, and application of novel data mining and machine learning algorithms.
-Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
-Open-source software for the application of data mining and machine learning algorithms.
-Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large-scale studies.
-Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.