Identification of clinically meaningful, overlapping obstructive respiratory disease subtypes via data-driven approaches in a primary care population
Maria Pikoula, Jennifer K Quint, Constantinos Kallis, Albert Henry, Spiros Denaxas
BMC Pulmonary Medicine 25(1):487. Journal Article. Published 2025-10-24. DOI: https://doi.org/10.1186/s12890-025-03953-x
Abstract
Background: Obstructive respiratory conditions, including asthma, bronchiectasis, and chronic obstructive pulmonary disease (COPD), are increasingly recognised as heterogeneous syndromes with significant overlap. Multiple disease pathways contribute to phenotypes that do not always align with textbook definitions, limiting the effectiveness of a one-size-fits-all approach. This study aims to identify, validate, and characterise clinically meaningful airway disease subtypes using electronic healthcare records (EHR) and unsupervised machine learning clustering techniques.
Methods: We applied k-means clustering to 626,651 patients with a diagnosis of asthma, bronchiectasis, or COPD, using linked national structured EHRs in England. Twenty-one clinical features, including risk factors and comorbidities, were analysed, with dimensionality reduction via principal component and multiple correspondence analyses. Associations between cluster membership and exacerbations, as well as respiratory and cardiovascular mortality, were assessed. Over 3,696,962 person-years of follow-up, 102,522 deaths were recorded. Cluster stability was evaluated after five years, and genome-wide association studies (GWAS) were conducted to explore genetic associations with cluster membership.
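As a rough illustration of the clustering step named in the Methods, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the authors' pipeline: the two-dimensional synthetic points, the choice of k=2, the seeds, and the function names `squared_distance` and `kmeans` are all assumptions made for the example, and the dimensionality reduction by principal component and multiple correspondence analyses is omitted.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids as k distinct points drawn from the data.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


# Synthetic data (not study data): two well-separated groups of 50 points.
data_rng = random.Random(42)
group_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(50)]
group_b = [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(50)]
points = group_a + group_b

centroids, labels = kmeans(points, k=2)
print("centroids:", centroids)
```

In a real analysis the inputs would be the reduced feature coordinates rather than raw synthetic points, and the number of clusters would be chosen by a separate model-selection step that the abstract does not describe.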
Results: Seven clusters were identified, each encompassing patients across traditional diagnostic labels. Distinct clinical patterns emerged as follows: (1) High-BMI female-predominant, (2) Older male-predominant with diabetes and cardiovascular disease, (3) Eosinophilic atopic, (4) Older non-comorbid, (5) Non-comorbid low-BMI, (6) Neutrophilic smoker, (7) Anxious/depressed female-predominant. The cluster with cardiovascular comorbidities showed the highest rates of hospital admissions for exacerbations. Neutrophilic cluster 6 is a potential novel subtype marked by persistent neutrophilia and poor outcomes. Cluster stability over five years ranged from 38% to 78%. GWAS revealed significant genetic loci in a cluster enriched for allergic disease and eosinophilia, suggesting shared genetic mechanisms.
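The abstract does not define how cluster stability was measured. One plausible reading, assumed here purely for illustration, is the share of each cluster's baseline members who are assigned to the same cluster again at follow-up. The sketch below computes that quantity; the function name `cluster_stability` and the ten patient labels are hypothetical and are not study data.

```python
from collections import Counter


def cluster_stability(baseline, follow_up):
    """Per-cluster share of baseline members with the same label at follow-up.

    Assumes both label lists refer to the same patients in the same order
    and that cluster labels are comparable across the two time points.
    """
    totals = Counter(baseline)
    stayed = Counter(b for b, f in zip(baseline, follow_up) if b == f)
    return {cluster: stayed[cluster] / totals[cluster] for cluster in totals}


# Hypothetical labels for ten patients (not study data).
baseline = [1, 1, 1, 1, 2, 2, 2, 2, 2, 3]
follow_up = [1, 1, 1, 2, 2, 2, 3, 3, 1, 3]

stability = cluster_stability(baseline, follow_up)
for cluster in sorted(stability):
    print(f"cluster {cluster}: {stability[cluster]:.0%} stable")
```

Under this definition, patients lost to follow-up would need to be excluded from both lists beforehand, which is a further assumption about the design.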
Conclusions: This study provides a data-driven dissection of the heterogeneity underlying obstructive airway diseases in a large, real-world population. Unsupervised machine learning applied to national-scale EHR data revealed distinct and partially stable subtypes that transcend conventional diagnostic boundaries. These findings highlight the complexity and overlap of airway disease phenotypes and demonstrate the value of clustering approaches for uncovering clinically and biologically meaningful subgroups. This work lays the foundation for further exploration into mechanisms and prognosis within and across airway disease phenotypes.
About the journal:
BMC Pulmonary Medicine is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of pulmonary and associated disorders, as well as related molecular genetics, pathophysiology, and epidemiology.