Javier Muñoz MD, PhD, Rocío Ruíz-Cacho MD, Nerio José Fernández-Araujo MD, Alberto Candela MD, Lourdes Carmen Visedo MD, Javier Muñoz-Visedo Math, BsC
Systematic review and meta-analysis of artificial intelligence models for diagnosing and subphenotyping ARDS in adults
Heart & Lung, volume 75, pages 144-163. Published 2025-10-01. DOI: 10.1016/j.hrtlng.2025.09.017. https://www.sciencedirect.com/science/article/pii/S0147956325002067
Abstract
Background
Artificial intelligence (AI) has emerged as a promising tool to improve the diagnosis and characterization of ARDS, including the identification of subphenotypes.
Objectives
To evaluate the diagnostic performance and methodological quality of AI models for identifying ARDS and its subphenotypes in adults.
Methods
We conducted a systematic review and meta-analysis of 63 studies (n = 135,762) published between 2013 and 2024 in PubMed, Embase, and the Cochrane Library. Extracted outcomes included sensitivity, specificity, AUROC, and validation methods. Risk of bias was assessed with PROBAST, and AI-specific metrics (overfitting, generalization, interpretability, discrimination, calibration) were reported.
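As an illustration of the extracted outcomes, the following minimal sketch shows how sensitivity, specificity, and AUROC are conventionally computed for a single diagnostic model. The function names and all counts and scores are hypothetical and are not taken from any included study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true ARDS cases the model flags
    specificity = tn / (tn + fp)  # share of non-ARDS cases the model clears
    return sensitivity, specificity


def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and model scores, for illustration only.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=80, fp=20)
area = auroc(scores_pos=[0.9, 0.8, 0.4], scores_neg=[0.7, 0.3, 0.2])
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUROC={area:.3f}")
```

The pairwise AUROC loop is quadratic in the number of cases; it is written this way for readability rather than speed.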
Results
Pooled sensitivity was 0.89 (95 % CI 0.84–0.93), specificity 0.88 (95 % CI 0.83–0.92), and AUROC 0.90 (95 % CI 0.86–0.94), with high heterogeneity (I² > 85 %). Twenty-two studies (31 %) were rated high quality, with sensitivity 0.86 (95 % CI 0.82–0.89) and specificity 0.82 (95 % CI 0.78–0.85). Deep learning models (n = 14) achieved sensitivity 0.91, while machine learning models (n = 19) showed 0.87. Imaging-based models (n = 15) outperformed non-imaging approaches. COVID-19 studies (n = 9) reported sensitivity 0.90 with comparable AUROC and specificity. Only seven studies (18 %) investigated subphenotyping, identifying hyperinflammatory and hypoinflammatory profiles with potential therapeutic relevance. Calibration reporting was missing in 47 % and external validation in most (29/63).
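To make the pooled estimate and the I² statistic concrete, the sketch below pools study-level sensitivities on the logit scale with inverse-variance weights and derives I² from Cochran's Q. This is a simplified univariate fixed-effect illustration: the abstract does not state which pooling model the authors used, and diagnostic meta-analyses often rely on bivariate random-effects models instead. The function name and the study counts are hypothetical.

```python
import math


def pool_logit_sensitivity(studies):
    """Pool sensitivities from a list of (tp, fn) pairs.

    Returns (pooled_sensitivity, cochran_q, i_squared_percent).
    Assumes every count is non-zero; a continuity correction would be
    needed otherwise.
    """
    logits, weights = [], []
    for tp, fn in studies:
        logits.append(math.log(tp / fn))     # logit of tp / (tp + fn)
        variance = 1.0 / tp + 1.0 / fn       # approximate variance of the logit
        weights.append(1.0 / variance)       # inverse-variance weight

    total_weight = sum(weights)
    pooled_logit = sum(w * y for w, y in zip(weights, logits)) / total_weight

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled_logit) ** 2 for w, y in zip(weights, logits))
    df = len(studies) - 1

    # I^2: share of total variation attributed to between-study heterogeneity.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    pooled_sensitivity = 1.0 / (1.0 + math.exp(-pooled_logit))
    return pooled_sensitivity, q, i_squared


# Hypothetical (tp, fn) counts for three studies, for illustration only.
pooled, q, i2 = pool_logit_sensitivity([(90, 10), (50, 50), (70, 30)])
print(f"pooled sensitivity={pooled:.3f} Q={q:.2f} I2={i2:.1f}%")
```

An I² value above 85 %, as reported here, indicates that most of the observed variation between studies reflects genuine differences rather than sampling error, which is why the pooled figures should be read with caution.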
Conclusion
AI models for ARDS demonstrate promising diagnostic accuracy but are limited by poor calibration and scarce external validation. Subphenotyping remains exploratory but suggests opportunities for real-time patient stratification. Prospective validation and standardized reporting are essential for clinical adoption.
About the journal:
Heart & Lung: The Journal of Cardiopulmonary and Acute Care, the official publication of The American Association of Heart Failure Nurses, presents original, peer-reviewed articles on techniques, advances, investigations, and observations related to the care of patients with acute and critical illness and patients with chronic cardiac or pulmonary disorders.
The Journal's acute care articles focus on the care of hospitalized patients, including those in the critical and acute care settings. Because most patients who are hospitalized in acute and critical care settings have chronic conditions, we are also interested in the chronically critically ill, the care of patients with chronic cardiopulmonary disorders, their rehabilitation, and disease prevention. The Journal's heart failure articles focus on all aspects of the care of patients with this condition. Manuscripts that are relevant to populations across the human lifespan are welcome.