ChatGPT-4's Accuracy in Estimating Thyroid Nodule Features and Cancer Risk from Ultrasound Images.

IF 3.7 | Zone 3 (Medicine) | Q2 ENDOCRINOLOGY & METABOLISM
Esteban Cabezas, David Toro-Tobon, Thomas Johnson, Marco Álvarez, Javad R Azadi, Camilo Gonzalez-Velasquez, Naykky Singh Ospina, Oscar J Ponce, Megan E Branda, Juan P Brito
Journal: Endocrine Practice | Publication date: 2025-03-24 | Type: Journal Article
DOI: 10.1016/j.eprac.2025.03.008

Abstract

Objective: To evaluate the performance of GPT-4 and GPT-4o in accurately identifying features and categories from thyroid nodule ultrasound (TUS) images following the American College of Radiology Thyroid Imaging Reporting and Data System (TI-RADS).
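
For readers unfamiliar with how TI-RADS features relate to categories, here is a simplified sketch of the general ACR TI-RADS point system: each feature (composition, echogenicity, shape, margin, echogenic foci) contributes points, and the total determines the TR level. This is a hypothetical illustration written for this summary, not the study's method, prompt, or reference-standard procedure. The dictionaries and the function names `tirads_points` and `tirads_category` are assumptions, feature labels are abbreviated, and lexicon details such as the special handling of spongiform nodules are omitted.

```python
# Simplified, hypothetical sketch of ACR TI-RADS point scoring.
# Not the study's code; labels are abbreviated and some lexicon rules are omitted.

COMPOSITION = {
    "cystic": 0,                  # cystic or almost completely cystic
    "spongiform": 0,
    "mixed cystic and solid": 1,
    "solid": 2,                   # solid or almost completely solid
}
ECHOGENICITY = {
    "anechoic": 0,
    "hyperechoic": 1,
    "isoechoic": 1,
    "hypoechoic": 2,
    "very hypoechoic": 3,
}
SHAPE = {"wider-than-tall": 0, "taller-than-wide": 3}
MARGIN = {
    "smooth": 0,
    "ill-defined": 0,
    "lobulated": 2,
    "irregular": 2,
    "extra-thyroidal extension": 3,
}
ECHOGENIC_FOCI = {
    "none": 0,
    "large comet-tail artifacts": 0,
    "macrocalcifications": 1,
    "peripheral calcifications": 2,
    "punctate echogenic foci": 3,
}


def tirads_points(composition, echogenicity, shape, margin, foci):
    """Sum feature points. Echogenic foci are additive, so `foci` is an
    iterable; set() ensures each focus type is counted at most once."""
    return (
        COMPOSITION[composition]
        + ECHOGENICITY[echogenicity]
        + SHAPE[shape]
        + MARGIN[margin]
        + sum(ECHOGENIC_FOCI[f] for f in set(foci))
    )


def tirads_category(points):
    """Map a point total to a TR level."""
    if points >= 7:
        return "TR5"
    if points >= 4:
        return "TR4"
    if points == 3:
        return "TR3"
    if points == 2:
        return "TR2"
    # Assumption: the ACR chart lists TR1 as 0 points; this sketch
    # also folds a 1-point total into TR1.
    return "TR1"


if __name__ == "__main__":
    pts = tirads_points("solid", "hypoechoic", "wider-than-tall", "smooth",
                        ["punctate echogenic foci"])
    print(pts, tirads_category(pts))  # -> 7 TR5
```

Because the category is derived from the summed feature points, a model that misses a single feature (for example, punctate echogenic foci, worth 3 points here) can shift a nodule by one or more TR levels.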

Methods: This comparative validation study, conducted between October 2023 and May 2024, utilized 202 thyroid ultrasound (TUS) images sourced from three open-access databases. Both complete and cropped versions of each image were independently evaluated by expert radiologists to establish a reference standard for TI-RADS features and categories. GPT-4 and GPT-4o were prompted to analyze each image, and their generated TI-RADS outputs were compared to the reference standard.

Results: GPT-4 demonstrated high specificity but low sensitivity when assessing complete TUS images across most TI-RADS categories, resulting in mixed overall accuracy. For low-risk nodules (TR1), GPT-4 achieved 25.0% sensitivity, 99.5% specificity, and 93.6% accuracy. In contrast, in the higher-risk TR4 category, GPT-4 showed 75% sensitivity, 22.2% specificity, and 42.1% accuracy. While GPT-4 effectively identified features such as smooth margins (73% vs. 65% for the reference standard), it struggled to identify others, such as isoechoic echogenicity (5% vs. 46%) and echogenic foci (3% vs. 27%). The assessment of cropped images by both GPT-4 and GPT-4o followed similar patterns, with slight deviations indicating lower performance than GPT-4's assessment of complete images.
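
The Results pair high accuracy with low sensitivity for TR1. A minimal sketch of one-vs-rest metrics for a single TI-RADS category, computed from paired reference and model labels, shows how that happens when a category is rare. The function name `one_vs_rest_metrics` and the toy labels are hypothetical; they are not the study's data or analysis code.

```python
def one_vs_rest_metrics(reference, predicted, category):
    """Return (sensitivity, specificity, accuracy) for `category`
    treated as positive and every other label as negative."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("need at least one labeled image")
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        is_ref = ref == category
        is_pred = pred == category
        if is_ref and is_pred:
            tp += 1          # category correctly assigned
        elif is_pred:
            fp += 1          # category assigned, reference disagrees
        elif is_ref:
            fn += 1          # category missed
        else:
            tn += 1          # correctly not assigned
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(reference)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical toy data: 8 of 100 images are TR1 in the reference,
    # and the model labels only 2 of them TR1, with no false positives.
    reference = ["TR1"] * 8 + ["TR3"] * 92
    predicted = ["TR1"] * 2 + ["TR3"] * 6 + ["TR3"] * 92
    print(one_vs_rest_metrics(reference, predicted, "TR1"))  # -> (0.25, 1.0, 0.94)
```

With only 8 of 100 toy images in TR1, correctly rejecting the 92 others dominates accuracy even though the model finds just 2 of the 8 true TR1 nodules. The opposite pattern, where a model over-assigns a category, drives specificity down, which is the shape of the TR4 figures reported in the Results.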

Conclusion: While GPT-4 and GPT-4o models show potential for improving the efficiency of thyroid nodule triage, their performance remains suboptimal, particularly in higher-risk categories. Further refinement and validation of these models are necessary before clinical implementation.

Source journal: Endocrine Practice (ENDOCRINOLOGY & METABOLISM)
CiteScore: 7.60
Self-citation rate: 2.40%
Articles published: 546
Review time: 41 days
About the journal: Endocrine Practice (ISSN: 1530-891X), a peer-reviewed journal published twelve times a year, is the official journal of the American Association of Clinical Endocrinologists (AACE). Its primary mission is to enhance the health care of patients with endocrine diseases through the continuing education of practicing endocrinologists.