Identification of clinically meaningful, overlapping obstructive respiratory disease subtypes via data-driven approaches in a primary care population.

IF 2.8 | Zone 3 (Medicine) | JCR Q2 (Respiratory System)
Maria Pikoula, Jennifer K Quint, Constantinos Kallis, Albert Henry, Spiros Denaxas
BMC Pulmonary Medicine 25(1):487. Published 24 October 2025. doi:10.1186/s12890-025-03953-x

Abstract

Background: Obstructive respiratory conditions, including asthma, bronchiectasis, and chronic obstructive pulmonary disease (COPD), are increasingly recognised as heterogeneous syndromes with significant overlap. Multiple disease pathways contribute to phenotypes that do not always align with textbook definitions, limiting the effectiveness of a one-size-fits-all approach. This study aims to identify, validate, and characterise clinically meaningful airway disease subtypes using electronic healthcare records (EHR) and unsupervised machine learning clustering techniques.

Methods: We applied k-means clustering to 626,651 patients with a diagnosis of asthma, bronchiectasis, or COPD, using linked national structured EHRs in England. Twenty-one clinical features, including risk factors and comorbidities, were analysed after dimensionality reduction with principal component analysis and multiple correspondence analysis. Associations between cluster membership and exacerbations, as well as respiratory and cardiovascular mortality, were assessed. During 3,696,962 person-years of follow-up, 102,522 deaths were recorded. Cluster stability was evaluated after five years, and genome-wide association studies (GWAS) were conducted to explore genetic associations with cluster membership.
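The Methods paragraph names the pipeline (dimensionality reduction with principal component analysis and multiple correspondence analysis, then k-means) but not its settings. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: PCA is assumed to handle the continuous features and MCA the categorical ones; MCA is hand-rolled as correspondence analysis of the one-hot indicator matrix; the component counts (`n_pcs`, `n_mca`), the unweighted concatenation of the two blocks, and the helper names `mca_row_coordinates` and `cluster_patients` are hypothetical. Only k = 7 comes from the reported results, and the data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def mca_row_coordinates(categorical: np.ndarray, n_components: int) -> np.ndarray:
    """Hypothetical helper: MCA as correspondence analysis of the one-hot
    indicator matrix; returns principal row coordinates."""
    # Indicator matrix Z: one 0/1 column per level of each categorical variable.
    blocks = []
    for col in np.asarray(categorical).T:
        levels = np.unique(col)
        blocks.append((col[:, None] == levels[None, :]).astype(float))
    z = np.hstack(blocks)
    p = z / z.sum()          # correspondence matrix P
    r = p.sum(axis=1)        # row masses
    c = p.sum(axis=0)        # column masses
    expected = np.outer(r, c)
    # Standardised residuals S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    s = (p - expected) / np.sqrt(expected)
    u, sigma, _ = np.linalg.svd(s, full_matrices=False)
    # Principal row coordinates F = D_r^{-1/2} U Sigma, truncated to n_components.
    return u[:, :n_components] * sigma[:n_components] / np.sqrt(r)[:, None]


def cluster_patients(continuous, categorical, n_clusters=7, n_pcs=3, n_mca=3, seed=0):
    """Hypothetical helper: reduce each feature block, concatenate, run k-means.

    n_pcs, n_mca and the unweighted concatenation are assumptions.
    """
    # Standardise continuous features so PCA is not dominated by scale.
    scaled = StandardScaler().fit_transform(continuous)
    pcs = PCA(n_components=n_pcs, random_state=seed).fit_transform(scaled)
    mca = mca_row_coordinates(categorical, n_mca)
    embedding = np.hstack([pcs, mca])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(embedding)
    return labels, km


if __name__ == "__main__":
    # Synthetic stand-in data only: 21 features split arbitrarily
    # into 5 continuous and 16 binary columns.
    rng = np.random.default_rng(42)
    continuous = rng.normal(size=(2000, 5))
    categorical = rng.integers(0, 2, size=(2000, 16))
    labels, km = cluster_patients(continuous, categorical)
    print("patients per cluster:", np.bincount(labels, minlength=7))
```

`km.cluster_centers_` holds the centroids in the reduced space. Note that `mca_row_coordinates` only scores the rows it is given; placing new patients on a fitted MCA would need supplementary-row projection, which this sketch omits.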

Results: Seven clusters were identified, each encompassing patients across traditional diagnostic labels. Distinct clinical patterns emerged: (1) high-BMI, female-predominant; (2) older, male-predominant, with diabetes and cardiovascular disease; (3) eosinophilic atopic; (4) older, non-comorbid; (5) non-comorbid, low BMI; (6) neutrophilic smoker; (7) anxious/depressed, female-predominant. The cluster with cardiovascular comorbidities showed the highest rates of hospital admissions for exacerbations. Cluster 6 (neutrophilic smoker) is a potential novel subtype, marked by persistent neutrophilia and poor outcomes. Cluster stability over five years ranged from 38% to 78%. GWAS revealed significant genetic loci in a cluster enriched for allergic disease and eosinophilia, suggesting shared genetic mechanisms.
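The Results paragraph reports five-year cluster stability of 38% to 78% without stating how it was computed. One plausible reading, assumed here, is a per-cluster retention rate: among patients assigned to cluster k at baseline, the fraction assigned to cluster k again at five years. This sketch assumes follow-up features have already been mapped into the baseline feature space so that patients can be reassigned to the nearest baseline centroid; both helper names are hypothetical.

```python
import numpy as np


def per_cluster_stability(baseline, follow_up, n_clusters):
    """Hypothetical helper: fraction of each baseline cluster still in the same
    cluster at follow-up (NaN where a baseline cluster is empty)."""
    baseline = np.asarray(baseline)
    follow_up = np.asarray(follow_up)
    if baseline.shape != follow_up.shape:
        raise ValueError("labels must align patient-by-patient")
    stability = np.full(n_clusters, np.nan)
    for k in range(n_clusters):
        members = baseline == k
        if members.any():
            stability[k] = np.mean(follow_up[members] == k)
    return stability


def nearest_centroid(features, centroids):
    """Hypothetical helper: assign each row to the closest baseline centroid
    by squared Euclidean distance."""
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


if __name__ == "__main__":
    baseline = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
    follow_up = np.array([0, 1, 0, 1, 1, 2, 0, 2, 1])
    # cluster 0 retains 2 of 3, cluster 1 retains 2 of 2, cluster 2 retains 2 of 4
    print(per_cluster_stability(baseline, follow_up, 3))
```

Reporting retention per cluster, rather than a single overall agreement figure, matches the abstract's presentation of a range across clusters.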

Conclusions: This study provides a data-driven dissection of the heterogeneity underlying obstructive airway diseases in a large, real-world population. Unsupervised machine learning applied to national-scale EHR data revealed distinct and partially stable subtypes that transcend conventional diagnostic boundaries. These findings highlight the complexity and overlap of airway disease phenotypes and demonstrate the value of clustering approaches for uncovering clinically and biologically meaningful subgroups. This work lays the foundation for further exploration into mechanisms and prognosis within and across airway disease phenotypes.

Source journal: BMC Pulmonary Medicine (Respiratory System)
CiteScore: 4.40
Self-citation rate: 3.20%
Articles published: 423
Review time: 6-12 weeks
About the journal: BMC Pulmonary Medicine is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of pulmonary and associated disorders, as well as related molecular genetics, pathophysiology, and epidemiology.