Evaluating a large language model's accuracy in chest X-ray interpretation for acute thoracic conditions

IF 2.7 · CAS Medicine, Region 3 · Q1 EMERGENCY MEDICINE
Adam M. Ostrovsky
American Journal of Emergency Medicine, Volume 93, Pages 99-102. Published 2025-03-27. DOI: 10.1016/j.ajem.2025.03.060
Citations: 0

Abstract


Background

The rapid advancement of artificial intelligence (AI) has great potential to impact healthcare. Chest X-rays are essential for diagnosing acute thoracic conditions in the emergency department (ED), but interpretation delays due to limited radiologist availability can impact clinical decision-making. AI models, including deep learning algorithms, have been explored for diagnostic support, but the potential of large language models (LLMs) in emergency radiology remains largely unexamined.

Methods

This study assessed ChatGPT's feasibility in interpreting chest X-rays for acute thoracic conditions commonly encountered in the ED. A subset of 1400 images from the NIH Chest X-ray dataset was analyzed, representing seven pathology categories: Atelectasis, Effusion, Emphysema, Pneumothorax, Pneumonia, Mass, and No Finding. ChatGPT 4.0, utilizing the “X-Ray Interpreter” add-on, was evaluated for its diagnostic performance across these categories.
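A balanced subset like the one described could be drawn from the public ChestX-ray14 label file roughly as follows. This is a sketch, not the study's actual procedure: the `Data_Entry_2017.csv` filename and `Finding Labels` column come from the public NIH dataset release, and the figure of 200 images per category (7 × 200 = 1400) is an assumption, since the paper does not state the per-category split.

```python
import pandas as pd

# The seven target categories from the study. Note: "No Finding" is the
# dataset's label for normal studies.
CATEGORIES = ["Atelectasis", "Effusion", "Emphysema", "Pneumothorax",
              "Pneumonia", "Mass", "No Finding"]

def sample_subset(label_csv: str, per_category: int = 200, seed: int = 0) -> pd.DataFrame:
    """Draw up to `per_category` single-label images for each target category.

    `per_category=200` is an assumption; the paper only states 1400 total.
    """
    df = pd.read_csv(label_csv)
    # Keep only images whose label is exactly one target category; the
    # dataset joins multi-label entries with "|", which isin() excludes.
    single = df[df["Finding Labels"].isin(CATEGORIES)]
    return (single.groupby("Finding Labels", group_keys=False)
                  .apply(lambda g: g.sample(n=min(per_category, len(g)),
                                            random_state=seed)))
```

The `min(per_category, len(g))` guard keeps the sketch from failing on rare single-label categories, at the cost of a slightly unbalanced subset.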

Results

ChatGPT demonstrated high performance in identifying normal chest X-rays, with a sensitivity of 98.9%, specificity of 93.9%, and accuracy of 94.7%. However, the model's performance varied across pathologies. The best results were observed in diagnosing pneumonia (sensitivity 76.2%, specificity 93.7%) and pneumothorax (sensitivity 77.4%, specificity 89.1%), while performance for atelectasis and emphysema was lower.
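For readers less familiar with these metrics, each per-pathology result reduces to a 2×2 confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical (chosen only so the pneumonia-like sensitivity and specificity fall near the reported 76.2% and 93.7%), not the study's data.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # true-positive rate
    specificity = tn / (tn + fp)                # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # overall agreement
    return sensitivity, specificity, accuracy

# Hypothetical counts for a "pneumonia vs. not pneumonia" split of 1400 images:
sens, spec, acc = binary_metrics(tp=160, fn=50, tn=1115, fp=75)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
# → sensitivity=76.2% specificity=93.7% accuracy=91.1%
```

Note that with imbalanced classes (here 210 positives vs. 1190 negatives), accuracy is dominated by the majority class, which is why the paper reports sensitivity and specificity separately.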

Conclusion

ChatGPT demonstrates potential as a supplementary tool for differentiating normal from abnormal chest X-rays, with promising results for certain pathologies like pneumonia. However, its diagnostic accuracy for more subtle conditions requires improvement. Further research integrating ChatGPT with specialized image recognition models could enhance its performance, offering new possibilities in medical imaging and education.
Source journal
CiteScore: 6.00
Self-citation rate: 5.60%
Articles per year: 730
Review time: 42 days
期刊介绍: A distinctive blend of practicality and scholarliness makes the American Journal of Emergency Medicine a key source for information on emergency medical care. Covering all activities concerned with emergency medicine, it is the journal to turn to for information to help increase the ability to understand, recognize and treat emergency conditions. Issues contain clinical articles, case reports, review articles, editorials, international notes, book reviews and more.