Andres Quintero, Javier Lopez-Molina, Merina Su, Patrick Long, Nicola Boulter, Cindy Weber, Ralica Dimitrova
BMJ Health & Care Informatics, Volume 32, Issue 1. Published 7 July 2025. DOI: 10.1136/bmjhci-2024-101282
Identifying and characterising asthma subgroups at high risk of severe exacerbations using machine learning and longitudinal real-world data.
Objectives: To identify and characterise distinct subgroups of patients with asthma who experience severe acute exacerbations (AEs), using a multistep clustering methodology that combines supervised and unsupervised machine learning.
Methods: This cohort study used anonymised, all-payer US medical and prescription claims data from October 2015 to May 2022. First, gradient-boosted decision trees were trained to predict AE in 4 132 973 patients with asthma, of whom 86 735 experienced AE. The trained model was then applied to a holdout set of 86 434 patients with asthma who experienced AE to derive SHapley Additive exPlanations (SHAP) values for each patient. These SHAP values underwent non-linear dimensionality reduction followed by density-based clustering to identify distinct subgroups among these patients, and the resulting subgroups were described in terms of key clinical and demographic characteristics.
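The SHAP step can be illustrated on a toy model. The abstract does not say how SHAP values were computed; for gradient-boosted trees this is typically done with a tree-specific algorithm such as TreeSHAP rather than by brute force. The standard-library Python sketch below instead computes exact Shapley values by enumerating every feature coalition, replacing features "absent" from a coalition with a reference value. The function names `shapley_values` and `toy_log_odds`, the three features (prior emergency department visits, oral corticosteroid courses, current smoking) and all coefficients are hypothetical and are not the study's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at input `x`.

    A feature "absent" from a coalition is replaced by its `baseline` value.
    The loop visits all 2^(n-1) coalitions per feature, so it is only practical
    for a handful of features (unlike tree-specific SHAP algorithms).
    """
    n = len(x)

    def value(coalition):
        # Keep x's values for features in the coalition; use the baseline elsewhere.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_log_odds(z):
    # Hypothetical AE risk score (log-odds). Features: prior ED visits,
    # oral corticosteroid courses, current smoker (0/1). Not the study's model.
    ed_visits, ocs_courses, smoker = z
    return -3.0 + 1.2 * ed_visits + 0.8 * ocs_courses + 0.5 * smoker + 0.4 * ed_visits * smoker


patient = [2.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_log_odds, patient, reference)
print([round(v, 3) for v in phi])  # → [2.8, 0.8, 0.9]
# Efficiency: contributions sum to score(patient) - score(reference) = 4.5
print(round(sum(phi), 3))  # → 4.5
```

The interaction term is split evenly between the two features involved, and the contributions always sum to the difference between the patient's score and the reference score. A vector like this, one per patient, explains why the model scored that patient as it did; in the study it is these explanation vectors, not the raw features, that were reduced and clustered.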
Results: Clustering identified five distinct subgroups of patients with asthma experiencing AE, broadly differentiated by histories of acute care encounters, healthcare utilisation, AE treatments, coded asthma severity, specialist encounters, first-hand tobacco exposure, mood disorders and patient demographics. Notably, the predicted likelihood of AE varied considerably between clusters, and some subgroups consisted of patients whom the predictive model found difficult to identify and who would have been missed by predictive modelling alone.
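The clustering step and the per-cluster comparison of predicted risk can be sketched as follows. The abstract names neither the dimensionality-reduction method nor the specific density-based algorithm, so this standard-library Python sketch assumes plain DBSCAN applied to already-reduced two-dimensional coordinates. The helper names `dbscan` and `summarise_clusters`, the coordinates, the predicted risks, `eps`, `min_samples` and the 0.5 risk threshold are all hypothetical illustrative choices. Because every patient in the study's holdout set experienced an AE, the share of a cluster whose predicted risk falls below the threshold corresponds to the share of its patients that a threshold-based predictive model alone would have missed.

```python
from math import dist
from statistics import mean

NOISE = -1  # DBSCAN label for points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Plain DBSCAN; returns one integer label per point (NOISE for outliers)."""
    labels = [None] * len(points)
    cluster_id = -1

    def neighbours(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point: keep growing
    return labels


def summarise_clusters(labels, predicted_risk, threshold):
    """Per cluster: size, mean predicted risk and share of patients below threshold."""
    summary = {}
    for c in sorted(set(labels)):
        risks = [r for lab, r in zip(labels, predicted_risk) if lab == c]
        summary[c] = {
            "n": len(risks),
            "mean_risk": mean(risks),
            "share_below_threshold": sum(r < threshold for r in risks) / len(risks),
        }
    return summary


# Hypothetical 2-D coordinates (as if produced by a non-linear reducer applied to
# SHAP vectors) and hypothetical model-predicted AE probabilities for 9 patients.
embedding = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense group A
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # dense group B
    (10.0, 0.0),                                     # isolated patient
]
risk = [0.72, 0.65, 0.80, 0.70, 0.08, 0.12, 0.05, 0.10, 0.30]

labels = dbscan(embedding, eps=0.2, min_samples=3)
print(labels)  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
for c, stats in summarise_clusters(labels, risk, threshold=0.5).items():
    print(c, stats)
```

Points labelled -1 are DBSCAN noise. In this toy example the second cluster has uniformly low predicted risk even though, by construction of the holdout set, all of its patients had an AE; this mirrors the kind of subgroup the abstract describes as challenging for the predictive model. In practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would replace the hand-written loop.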
Discussion: By identifying distinct subgroups among patients with asthma experiencing AE, this study highlights the heterogeneity within this population and emphasises the need for more personalised management of AE.
Conclusion: Applying predictive modelling and clustering to real-world data can help identify discrete patient phenotypes and provide an important source of information for informing risk assessment and mitigation efforts.