A quality assessment tool for focused abdominal sonography for trauma examinations using artificial intelligence

John Cull, Dustin Morrow, Caleb Manasco, Ashley Vaughan, John Eicken, Hudson Smith

Journal of Trauma and Acute Care Surgery, pages 111-116. Published 2025-01-01 (Epub 2024-12-14). DOI: https://doi.org/10.1097/TA.0000000000004425
Citations: 0
Abstract
Background: Current tools to review focused abdominal sonography for trauma (FAST) images for quality have poorly defined grading criteria or are developed to grade the skills of the sonographer and not the examination. The purpose of this study is to establish a grading system with substantial agreement among coders, thereby enabling the development of an automated assessment tool for FAST examinations using artificial intelligence (AI).
Methods: Five coders labeled a set of FAST clips. Each coder was responsible for a different subset of clips (10% of the clips were labeled in triplicate to evaluate intercoder reliability). The clips were labeled with a quality score from 1 (lowest quality) to 5 (highest quality); clips scored 3 or greater were considered passing. An AI model was trained to score the quality of the FAST examination. The clips were split into a training set, a validation set, and a test set. The predicted scores were rounded to the nearest quality level to distinguish passing from failing clips.
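The round-then-compare rule described above can be written as a minimal sketch; the function name and the clamping to the 1 to 5 scale are illustrative assumptions, not details from the paper.

```python
def score_to_pass_fail(predicted_score: float, passing_cutoff: int = 3) -> bool:
    """Round a continuous model output on the 1-5 quality scale to the
    nearest integer level, then compare it against the passing cutoff."""
    level = round(predicted_score)
    level = max(1, min(5, level))  # clamp to the valid 1-5 range
    return level >= passing_cutoff
```

For example, a clip predicted at 2.6 rounds to level 3 and passes, while 2.4 rounds to level 2 and fails.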
Results: A total of 1,514 qualified clips (1,399 passing and 115 failing clips) were evaluated in the final data set. This final data set had 94% agreement between pairs of coders on the pass/fail prediction, and the set had a Krippendorff α of 66%. The decision threshold can be tuned to achieve the desired tradeoff between precision and sensitivity. Without using the AI model, a reviewer would, on average, examine roughly 25 clips for every failing clip identified. In contrast, using our model with a decision threshold of 0.015, a reviewer would examine roughly five clips for every failing clip, a fivefold reduction in clips reviewed while still correctly identifying 85% of passing clips.
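The workload arithmetic behind that threshold can be sketched generically; this assumes the model emits a per-clip probability of failing, and all names and data below are hypothetical rather than taken from the study.

```python
def review_tradeoff(fail_probs, is_failing, threshold):
    """Flag clips whose predicted failure probability meets the threshold,
    and report reviewer workload: clips examined per true failing clip found."""
    flagged = [(p, y) for p, y in zip(fail_probs, is_failing) if p >= threshold]
    reviewed = len(flagged)
    caught = sum(1 for _, y in flagged if y)
    clips_per_fail = reviewed / caught if caught else float("inf")
    return reviewed, caught, clips_per_fail
```

Lowering the threshold flags more clips, catching more failing clips at the cost of more reviewing; raising it does the opposite. The 0.015 threshold reported above is specific to the authors' model and data set.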
Conclusion: Integration of AI holds significant promise in improving the accurate evaluation of FAST images while simultaneously alleviating the workload burden on expert physicians.
Level of evidence: Diagnostic Test/Criteria; Level II.
Journal description:
The Journal of Trauma and Acute Care Surgery® is designed to provide the scientific basis to optimize care of the severely injured and critically ill surgical patient. Thus, the Journal places a high priority on basic and translational research to fulfill this objective. Additionally, the Journal is enthusiastic to publish randomized prospective clinical studies to establish care predicated on a mechanistic foundation. Finally, the Journal is seeking systematic reviews, guidelines, and algorithms that incorporate the best evidence available.