Identifying and characterising asthma subgroups at high risk of severe exacerbations using machine learning and longitudinal real-world data
Andres Quintero, Javier Lopez-Molina, Merina Su, Patrick Long, Nicola Boulter, Cindy Weber, Ralica Dimitrova
BMJ Health & Care Informatics, 32(1). Published 2025-07-07. DOI: 10.1136/bmjhci-2024-101282 (https://doi.org/10.1136/bmjhci-2024-101282)
Abstract
Objectives: To identify and characterise distinct subgroups of patients with asthma who experience severe acute exacerbations (AEs), using a multistep clustering methodology that combines supervised and unsupervised machine learning.
Methods: This cohort study used anonymised, all-payer US medical and prescription claims data from October 2015 to May 2022. First, gradient-boosted decision trees were trained to predict AE in 4 132 973 patients with asthma, of whom 86 735 experienced AE. The trained model was then applied to a holdout set of 86 434 patients with asthma who experienced AE to derive SHapley Additive exPlanations (SHAP) values. These SHAP values were subjected to non-linear dimensionality reduction and density-based clustering to identify distinct subgroups among these patients. The subgroups were described using key clinical and demographic characteristics.
Results: Clustering identified five distinct subgroups of patients with asthma who experienced AE, broadly differentiated by histories of acute care encounters, healthcare utilisation, AE treatments, coded asthma severity, specialist encounters, first-hand tobacco exposure, mood disorders and patient demographics. Notably, the predicted likelihood of AE varied considerably between clusters: some subgroups consisted of patients who posed a challenge for the predictive model and would have been missed by predictive modelling alone.
Discussion: By identifying distinct subgroups among patients with asthma experiencing AE, this study highlights the heterogeneity within this population and emphasises the need for more personalised management of AE.
Conclusion: Applying predictive modelling and clustering to real-world data can help identify discrete phenotypes of patients and offer an important source of information for developing risk assessment and mitigation efforts.
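
The multistep approach summarised in the Methods (a predictive model, per-patient SHAP attributions, then density-based clustering of those attributions) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration and is not taken from the study: `toy_risk` is a hand-written stand-in for the trained gradient-boosted model, the three features and the synthetic patients are invented, Shapley values are computed by exact enumeration rather than a tree-specific SHAP algorithm, DBSCAN is used as one example of density-based clustering (the abstract does not name the algorithm), and the non-linear dimensionality reduction step is omitted because only three features are involved. The `eps` and `min_pts` values are arbitrary.

```python
import itertools
import math
import random


def toy_risk(x):
    """Hypothetical stand-in for a trained gradient-boosted AE classifier."""
    acute, steroid, specialist = x
    z = -2.0 + 0.8 * acute + 0.5 * steroid - 0.3 * specialist + 0.2 * acute * steroid
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(model, x, background):
    """Exact Shapley values; features outside a coalition take background values."""
    n = len(x)
    cache = {}

    def value(coalition):
        # Mean model output with coalition features fixed to x, the rest from background rows.
        if coalition not in cache:
            rows = ([x[i] if i in coalition else b[i] for i in range(n)] for b in background)
            cache[coalition] = sum(model(r) for r in rows) / len(background)
        return cache[coalition]

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


def main():
    rng = random.Random(0)
    # Synthetic rows: [acute care visits, oral steroid fills, specialist visits]
    frequent = [[rng.randint(3, 5), rng.randint(2, 4), rng.randint(0, 1)] for _ in range(15)]
    infrequent = [[rng.randint(0, 1), rng.randint(0, 1), rng.randint(2, 3)] for _ in range(15)]
    patients = frequent + infrequent
    # Cluster patients on their attribution vectors, not on the raw features.
    shap_rows = [shapley_values(toy_risk, p, patients) for p in patients]
    labels = dbscan(shap_rows, eps=0.1, min_pts=3)
    for c in sorted(set(labels)):
        risks = [toy_risk(p) for p, lab in zip(patients, labels) if lab == c]
        name = "noise" if c == -1 else f"cluster {c}"
        print(f"{name}: n={len(risks)}, mean predicted risk={sum(risks) / len(risks):.2f}")


if __name__ == "__main__":
    main()
```

Clustering on attribution vectors rather than raw features groups patients by why the model scores them as it does. Reporting the mean predicted risk per cluster then makes it possible to see subgroups with low predicted likelihood, which is the kind of between-cluster variability the Results describe.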