Richard John Woodman, Kimberly Bryant, Michael J. Sorich, Campbell H. Thompson, Patrick Russell, Alberto Pilotto, Aleksander A. Mangoni
Phenotyping to predict 12-month health outcomes of older general medicine patients
Aging Clinical and Experimental Research, 37(1). Published 2025-02-22. DOI: 10.1007/s40520-024-02924-2
https://link.springer.com/article/10.1007/s40520-024-02924-2
Abstract
Background
A variety of unsupervised learning algorithms have been used to phenotype older patients, enabling directed care and personalised treatment plans. However, how accurately the resulting clusters discriminate risk among older patients may vary depending on the methods employed.
Aims
To compare seven clustering algorithms in their ability to develop patient phenotypes that accurately predict health outcomes.
Methods
Data were collected for N = 737 older medical inpatients during their hospital stay, covering five types of medical data (ICD-10 codes, ATC drug codes, laboratory, clinic and frailty data). We trialled five unsupervised learning algorithms (K-means, K-modes, hierarchical clustering, latent class analysis (LCA), and DBSCAN) and two graph-based approaches, creating separate clusters for each method and data type. The cluster assignments were used as input for a random forest classifier to predict eleven health outcomes: mortality at one, three, six and 12 months, in-hospital falls and delirium, length-of-stay, outpatient visits, and readmissions at one, three and six months.
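The pipeline described in the Methods (cluster each data type separately, then feed the cluster memberships to a random forest) can be sketched roughly as follows. This is a minimal illustration on synthetic data and not the authors' implementation: the two data blocks, the number of clusters `k`, the train/test split, and all function names are assumptions, and only two of the seven methods are shown.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_patients = 300  # synthetic stand-in; the study reports N = 737

# Hypothetical data blocks, one per data type (only two of the five shown).
latent = rng.integers(0, 3, size=n_patients)  # hidden subgroup driving the data
lab = rng.normal(loc=latent[:, None], scale=1.0, size=(n_patients, 5))
frailty = rng.normal(loc=1.5 * latent[:, None], scale=1.0, size=(n_patients, 4))

# Synthetic binary outcome whose risk rises with the hidden subgroup.
outcome = (rng.random(n_patients) < 0.15 + 0.3 * latent).astype(int)


def cluster_labels(block, method, k=3):
    """Assign each patient to one of k clusters within a single data type."""
    if method == "kmeans":
        return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(block)
    if method == "hierarchical":
        return AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(block)
    raise ValueError(f"unknown method: {method}")


def one_hot(labels, k):
    """Encode integer cluster labels as indicator columns."""
    return np.eye(k, dtype=int)[labels]


def auc_for_method(method, k=3):
    """Cluster each data type, then predict the outcome from cluster membership."""
    features = np.hstack(
        [one_hot(cluster_labels(block, method, k), k) for block in (lab, frailty)]
    )
    # Simplification: clustering uses all patients before the split.
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, outcome, test_size=0.3, random_state=0, stratify=outcome
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])


aucs = {method: auc_for_method(method) for method in ("kmeans", "hierarchical")}
print(aucs)
```

In this sketch each data type contributes its own set of cluster-membership columns, so the classifier sees only phenotype assignments rather than the raw measurements.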
Results
The overall median area-under-the-curve (AUC) values across the eleven outcomes for the seven methods were (from highest to lowest) 0.758 (hierarchical), 0.739 (K-means), 0.722 (KG-Louvain), 0.704 (KNN-Louvain), 0.698 (LCA), 0.694 (DBSCAN) and 0.656 (K-modes). Overall, frailty data was the most important data type for predicting mortality, ICD-10 disease codes were the most important for predicting readmissions, and laboratory data was the most important for predicting falls.
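To make the summary statistic concrete, the sketch below shows one way an AUC can be computed for a single outcome and how a median across eleven outcomes summarises a method. The `reported` values are the medians quoted in the Results; the `hypothetical_per_outcome` list is invented purely for illustration and is not data from the study.

```python
from statistics import median


def auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Median AUCs per clustering method, as reported in the Results.
reported = {
    "hierarchical": 0.758,
    "K-means": 0.739,
    "KG-Louvain": 0.722,
    "KNN-Louvain": 0.704,
    "LCA": 0.698,
    "DBSCAN": 0.694,
    "K-modes": 0.656,
}
ranked = sorted(reported, key=reported.get, reverse=True)

# HYPOTHETICAL per-outcome AUCs for one method (eleven outcomes),
# used only to show how the overall median is formed.
hypothetical_per_outcome = [0.81, 0.79, 0.77, 0.76, 0.70, 0.72,
                            0.68, 0.66, 0.74, 0.73, 0.71]
overall_median = median(hypothetical_per_outcome)

print(ranked)
print(overall_median)
```

Using the median rather than the mean keeps one unusually easy or hard outcome from dominating a method's overall score.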
Conclusions
Clusters created using hierarchical, K-means and Louvain community detection algorithms identified well-separated patient phenotypes that were consistently associated with age-related adverse health outcomes. Frailty data was the most valuable data type for predicting most health outcomes.
About the journal:
Aging Clinical and Experimental Research offers a multidisciplinary forum on the progressing field of gerontology and geriatrics. The areas covered by the journal include: biogerontology, neurosciences, epidemiology, clinical gerontology and geriatric assessment, and social, economic and behavioral gerontology. “Aging Clinical and Experimental Research” appears bimonthly and publishes review articles, original papers and case reports.