A multi-biomarker machine learning approach for early prediction of interstitial lung disease in rheumatoid arthritis.
Authors: Jiaojiao Xu, Wei Zhang, Weili Bai, Nannan Gai, Jing Li, Yunqi Bao
Journal: BMC Pulmonary Medicine 25(1):394 (impact factor 2.8; JCR Q2, Respiratory System)
Published: 2025-08-14
DOI: https://doi.org/10.1186/s12890-025-03855-y
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12355783/pdf/
Citations: 0
Abstract
A multi-biomarker machine learning approach for early prediction of interstitial lung disease in rheumatoid arthritis.
Background: Interstitial lung disease (ILD) is a severe complication affecting 10-30% of rheumatoid arthritis (RA) patients. Current diagnostic methods typically detect ILD only after substantial lung damage has occurred. This delay emphasizes the need for early detection strategies. This study aims to develop and validate machine learning models for early RA-ILD prediction and identify key predictive biomarkers.
Methods: We conducted a cross-sectional study enrolling 149 RA patients (84 with ILD, 65 without ILD) between January 2020 and December 2023. We evaluated demographic characteristics, clinical parameters, and laboratory markers, including inflammatory indicators, hematological parameters, and specific biomarkers. We developed four machine learning (ML) models (XGBoost, Random Forest, Support Vector Machine, and Logistic Regression) and compared their ability to predict ILD.
Results: The XGBoost model demonstrated superior predictive performance (AUC = 0.891, 95% CI: 0.847-0.935). Feature importance analysis identified Krebs von den Lungen-6 (KL-6) as the strongest predictor (importance score = 0.285), followed by interleukin-6 (IL-6) and cytokeratin 19 fragment (CYFRA21-1). The ILD group exhibited significantly elevated levels of inflammatory markers and specific biomarkers, particularly KL-6 (826.4 ± 458.2 vs. 285.6 ± 124.8 U/ml, P < 0.001), alongside distinct patterns in hematological parameters.
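The two kinds of statistic reported above (a group comparison of KL-6 and an AUC with a 95% confidence interval) can be illustrated with a minimal Python sketch. It is not the authors' analysis code, and it rests on several assumptions: that the reported "±" values are standard deviations, that KL-6 was measured in all 84 ILD and 65 non-ILD patients, and that a Welch t-test is a reasonable stand-in for the unstated test behind the P value. The AUC part uses a small set of made-up labels and scores purely for illustration, and the percentile bootstrap is one common way to obtain an interval, not necessarily the method used in the study. All function names are hypothetical.

```python
import math
import random


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def auc(labels, scores):
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample has only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # KL-6 summary statistics as reported in the abstract (U/ml);
    # group sizes are ASSUMED to equal the enrolled 84 and 65 patients.
    t, df = welch_t(826.4, 458.2, 84, 285.6, 124.8, 65)
    print(f"Welch t = {t:.2f}, df = {df:.1f}")

    # Toy labels (1 = ILD) and model scores; NOT data from the study.
    toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    print("toy AUC:", auc(toy_labels, toy_scores))
    print("toy 95% CI:", bootstrap_auc_ci(toy_labels, toy_scores))
```

Under the stated assumptions, the KL-6 summary statistics give a Welch t of roughly 10.3 on about 98 degrees of freedom, which is in line with the reported P < 0.001. The pairwise AUC definition is used here because it makes the interpretation explicit: an AUC of 0.891 means a randomly chosen ILD patient receives a higher model score than a randomly chosen non-ILD patient about 89% of the time.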
Conclusion: Machine learning approaches, particularly XGBoost, demonstrate promising potential for early RA-ILD prediction. The integration of KL-6 and other identified biomarkers into clinical screening protocols may facilitate early detection and improved patient outcomes. These findings suggest that machine learning models could serve as valuable tools for risk stratification and early intervention in RA-ILD management, providing new approaches for individualized risk assessment in clinical practice.
About the journal:
BMC Pulmonary Medicine is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of pulmonary and associated disorders, as well as related molecular genetics, pathophysiology, and epidemiology.