Performance of a point-of-care ultrasound platform for artificial intelligence-enabled assessment of pulmonary B-lines
Ashkan Labaf, Linda Åhman-Persson, Leo Silvén Husu, J Gustav Smith, Annika Ingvarsson, Anna Werther Evaldsson
Cardiovascular Ultrasound 23(1):3, published 2025-03-03. DOI: 10.1186/s12947-025-00338-2
Abstract
Background: The incorporation of artificial intelligence (AI) into point-of-care ultrasound (POCUS) platforms has increased rapidly. The number of B-lines on lung ultrasound (LUS) serves as a useful tool for assessing pulmonary congestion. Interpretation, however, requires experience, and AI automation has therefore been pursued. This study aimed to test the agreement between the AI software embedded in a major vendor's POCUS system and visual expert assessment.
Methods: This single-center prospective study included 55 patients hospitalized for various respiratory symptoms, predominantly acutely decompensated heart failure. A 12-zone protocol was used. Two LUS experts independently categorized B-line counts as 0, 1-2, 3-4, or ≥ 5. The intraclass correlation coefficient (ICC) was used to assess agreement.
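The abstract names the ICC but does not say which ICC form was used. The sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater, in the Shrout and Fleiss notation), computed from ANOVA mean squares. The 0 to 3 coding of the B-line categories and the function names `categorize` and `icc_2_1` are hypothetical, chosen only for illustration.

```python
import numpy as np


def categorize(count: int) -> int:
    """Map a raw B-line count to the ordinal categories 0, 1-2, 3-4, >=5.

    The coding 0..3 is a hypothetical choice for illustration.
    """
    if count < 0:
        raise ValueError("B-line count cannot be negative")
    if count == 0:
        return 0
    if count <= 2:
        return 1
    if count <= 4:
        return 2
    return 3


def icc_2_1(ratings) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has shape (n_subjects, k_raters); subjects could be zones
    or patients, raters could be the AI and a reviewer.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


# A rater that is always one category higher ranks subjects identically,
# yet absolute agreement is penalized by the systematic offset.
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 2))  # 0.67
```

The constant-offset example shows why an absolute-agreement ICC is sensitive to systematic overcounting: the ranking is identical, but the ICC drops to 0.67. A consistency-type ICC would ignore such an offset, so the choice of form matters when interpreting the reported values.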
Results: A total of 672 LUS zones were obtained, of which 584 (87%) were eligible for analysis. Compared with the expert reviewers, the AI significantly overcounted the number of B-lines per patient (23.5 vs. 2.8, p < 0.001). The AI also found a greater proportion of zones with > 5 B-lines than the reviewers did (38% vs. 4%, p < 0.001). The ICC between the AI and the reviewers was 0.28 for the total sum of B-lines and 0.37 for the zone-by-zone method. Inter-reviewer agreement was excellent, with ICCs of 0.92 and 0.91 for the two methods, respectively.
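The two levels of comparison can be sketched as follows, assuming zone-level records of the hypothetical form `(patient_id, zone, ai_value, reviewer_value)`, where `None` marks a zone that could not be analysed. The per-zone values are treated as plain numbers (raw counts or category scores); the abstract does not state how category values were summed per patient, so this aggregation is an assumption. The last line reproduces the eligibility share reported in the abstract.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical record layout: (patient_id, zone, ai_value, reviewer_value).
# None marks a zone that was obtained but not eligible for analysis.
Record = tuple[str, int, Optional[int], Optional[int]]


def eligible(records: list[Record]) -> list[Record]:
    """Keep only zones where both the AI and the reviewer produced a value."""
    return [r for r in records if r[2] is not None and r[3] is not None]


def zone_level_pairs(records: list[Record]) -> list[tuple[int, int]]:
    """Zone-by-zone method: each eligible zone is one (AI, reviewer) pair."""
    return [(ai, rev) for _pid, _zone, ai, rev in eligible(records)]


def patient_level_pairs(records: list[Record]) -> dict[str, tuple[int, int]]:
    """Total-sum method: sum eligible zones per patient, one pair per patient."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for pid, _zone, ai, rev in eligible(records):
        totals[pid][0] += ai
        totals[pid][1] += rev
    return {pid: (t[0], t[1]) for pid, t in totals.items()}


# Share of eligible zones reported in the abstract: 584 of 672.
print(f"{584 / 672:.0%}")  # 87%
```

Either list of pairs can then be passed, as an n-by-2 rating matrix, to an ICC routine; the zone-by-zone method has many more subjects than the per-patient total, which is one reason the two ICCs can differ.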
Conclusion: This study demonstrated excellent interrater reliability between expert B-line counts but poor agreement between the experts and the AI software embedded in a major vendor's system, primarily due to overcounting by the AI. Our findings indicate that further development is needed to increase the accuracy of AI tools in LUS.
Journal description:
Cardiovascular Ultrasound is an online journal that publishes peer-reviewed original research, authoritative reviews, case reports on challenging and/or unusual diagnostic aspects, and expert opinions on new techniques and technologies. We are particularly interested in articles that include relevant images or video files, which add a further dimension to published articles and enhance understanding.
As an open access journal, Cardiovascular Ultrasound ensures high visibility for authors and provides an up-to-date, freely available resource for the community. The journal welcomes discussion and provides a forum for opinion and debate on topics ranging from biology to engineering to clinical echocardiography, with both speed and versatility.