A quality assessment tool for focused abdominal sonography for trauma examinations using artificial intelligence

John Cull, Dustin Morrow, Caleb Manasco, Ashley Vaughan, John Eicken, Hudson Smith

Journal of Trauma and Acute Care Surgery, pages 111-116. Published 2025-01-01 (Epub 2024-12-14). DOI: https://doi.org/10.1097/TA.0000000000004425
Citations: 0
Abstract
Background: Current tools to review focused abdominal sonography for trauma (FAST) images for quality have poorly defined grading criteria or are developed to grade the skills of the sonographer and not the examination. The purpose of this study is to establish a grading system with substantial agreement among coders, thereby enabling the development of an automated assessment tool for FAST examinations using artificial intelligence (AI).
Methods: Five coders labeled a set of FAST clips. Each coder was responsible for a different subset of clips (10% of the clips were labeled in triplicate to evaluate intercoder reliability). The clips were labeled with a quality score from 1 (lowest quality) to 5 (highest quality); clips scored 3 or greater were considered passing. An AI model was trained to score the quality of the FAST examination. The clips were split into a training set, a validation set, and a test set. The predicted scores were rounded to the nearest quality level to distinguish passing from failing clips.
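The round-then-compare rule described above can be written as a minimal sketch; the function name and the clamping to the 1 to 5 scale are illustrative assumptions, not details from the paper.

```python
def score_to_pass_fail(predicted_score: float, passing_cutoff: int = 3) -> bool:
    """Round a continuous model output on the 1-5 quality scale to the
    nearest integer level, then compare it against the passing cutoff."""
    level = round(predicted_score)
    level = max(1, min(5, level))  # clamp to the valid 1-5 range
    return level >= passing_cutoff
```

For example, a clip predicted at 2.6 rounds to level 3 and passes, while 2.4 rounds to level 2 and fails.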
Results: A total of 1,514 qualified clips (1,399 passing and 115 failing clips) were evaluated in the final data set. This final data set had 94% agreement between pairs of coders on the pass/fail prediction, and the set had a Krippendorff α of 66%. The decision threshold can be tuned to achieve the desired tradeoff between precision and sensitivity. Without using the AI model, a reviewer would, on average, examine roughly 25 clips for every failing clip identified. In contrast, using our model with a decision threshold of 0.015, a reviewer would examine roughly five clips for every failing clip, a fivefold reduction in clips reviewed while still correctly identifying 85% of passing clips.
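The workload arithmetic behind that threshold can be sketched generically; this assumes the model emits a per-clip probability of failing, and all names and data below are hypothetical rather than taken from the study.

```python
def review_tradeoff(fail_probs, is_failing, threshold):
    """Flag clips whose predicted failure probability meets the threshold,
    and report reviewer workload: clips examined per true failing clip found."""
    flagged = [(p, y) for p, y in zip(fail_probs, is_failing) if p >= threshold]
    reviewed = len(flagged)
    caught = sum(1 for _, y in flagged if y)
    clips_per_fail = reviewed / caught if caught else float("inf")
    return reviewed, caught, clips_per_fail
```

Lowering the threshold flags more clips, catching more failing clips at the cost of more reviewing; raising it does the opposite. The 0.015 threshold reported above is specific to the authors' model and data set.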
Conclusion: Integration of AI holds significant promise in improving the accurate evaluation of FAST images while simultaneously alleviating the workload burden on expert physicians.
Level of evidence: Diagnostic Test/Criteria; Level II.
Journal description:
The Journal of Trauma and Acute Care Surgery® is designed to provide the scientific basis to optimize care of the severely injured and critically ill surgical patient. Thus, the Journal places a high priority on basic and translational research to fulfill this objective. Additionally, the Journal is enthusiastic to publish randomized prospective clinical studies to establish care predicated on a mechanistic foundation. Finally, the Journal is seeking systematic reviews, guidelines, and algorithms that incorporate the best evidence available.