Optimizing Thyroid Nodule Management With Artificial Intelligence: Multicenter Retrospective Study on Reducing Unnecessary Fine Needle Aspirations
Authors: Jia-Hui Ni, Yun-Yun Liu, Chao Chen, Yi-Lei Shi, Xing Zhao, Xiao-Long Li, Bei-Bei Ye, Jing-Liang Hu, Li-Chao Mou, Li-Ping Sun, Hui-Jun Fu, Xiao-Xiang Zhu, Yi-Feng Zhang, Lehang Guo, Hui-Xiong Xu
Journal: JMIR Medical Informatics, volume 13, e71740 (published 2025-07-30)
DOI: 10.2196/71740 (https://doi.org/10.2196/71740)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12310072/pdf/
Publication type: Journal Article
Impact factor: 3.8 (JCR Q2, Medical Informatics)
Citation count: 0
Abstract
Background: Most artificial intelligence (AI) models for thyroid nodules are designed to screen for malignancy to guide further interventions; however, these models have not yet been fully implemented in clinical practice.
Objective: This study aimed to evaluate AI in real clinical settings for identifying potentially benign thyroid nodules initially deemed to be at risk for malignancy by radiologists, reducing unnecessary fine needle aspiration (FNA) and optimizing management.
Methods: We retrospectively collected a validation cohort of thyroid nodules that had undergone FNA. Following standard clinical practice, radiologists had initially assessed these nodules as "suspicious for malignancy" based on ultrasound features, which prompted the FNA procedures. Ultrasound images of these nodules were re-evaluated using a deep learning-based AI system, and its diagnostic performance was assessed in terms of correct identification of benign nodules and erroneous identification of malignant nodules as benign. Performance metrics such as sensitivity, specificity, and the area under the receiver operating characteristic curve were calculated. In addition, a separate comparison cohort was retrospectively assembled to compare the AI system's ability to correctly identify benign thyroid nodules with that of radiologists.
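As a rough illustration of how the metrics named in the Methods are usually computed, the sketch below uses invented toy scores. It assumes the benign class is treated as positive and that a higher score means "more likely benign"; the abstract states neither, and the function names and the 0.5 threshold are hypothetical rather than taken from the study.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity) for binary labels.

    labels: 1 = positive class, 0 = negative class.
    scores: higher means more likely positive.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win (rank-based definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

sens, spec = sensitivity_specificity(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={toy_auc:.3f}")
```

The pairwise AUC is quadratic in the number of cases, which is fine for a small illustration; a sorted-rank implementation would be preferable for cohorts of thousands of nodules.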
Results: The validation cohort comprised 4572 thyroid nodules (benign: n=3134, 68.5%; malignant: n=1438, 31.5%). AI correctly identified 2719 of the 3134 benign nodules (86.8%), reducing unnecessary FNAs from 68.5% (3134/4572) to 9.1% (415/4572) of the cohort. However, 123 malignant nodules (8.6% of malignant cases) were mistakenly identified as benign, most of them of low or intermediate suspicion. In the comparison cohort, AI correctly identified 81.4% (96/118) of benign nodules, outperforming junior and senior radiologists, who identified only 40% and 55%, respectively. The area under the curve (AUC) for the AI model was 0.88 (95% CI 0.85-0.91), higher than that of the junior radiologists (AUC=0.43, 95% CI 0.36-0.50; P=.002) and the senior radiologists (AUC=0.63, 95% CI 0.55-0.70; P=.003).
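The percentages in the Results follow directly from the reported counts. The short check below recomputes them; the variable names and the `pct` helper are illustrative choices, and only the counts themselves come from the abstract.

```python
# Counts as reported in the abstract (validation cohort).
total = 4572
benign = 3134
malignant = 1438
benign_identified = 2719   # benign nodules the AI correctly identified
malignant_missed = 123     # malignant nodules the AI labeled benign

# The two classes should account for the whole cohort.
assert benign + malignant == total


def pct(fraction):
    """Convert a fraction to a percentage rounded to one decimal place."""
    return round(100 * fraction, 1)


# Benign nodules the AI did not flag would still undergo an unnecessary FNA.
remaining_unnecessary = benign - benign_identified

benign_rate = pct(benign_identified / benign)
fna_before = pct(benign / total)
fna_after = pct(remaining_unnecessary / total)
miss_rate = pct(malignant_missed / malignant)
comparison_rate = pct(96 / 118)  # comparison cohort, as reported

print(f"benign correctly identified: {benign_rate}%")
print(f"unnecessary FNAs before AI:  {fna_before}%")
print(f"unnecessary FNAs after AI:   {fna_after}% ({remaining_unnecessary}/{total})")
print(f"malignant labeled benign:    {miss_rate}%")
print(f"comparison cohort (AI):      {comparison_rate}%")
```

Both "unnecessary FNA" percentages use the full cohort of 4572 nodules as the denominator, which is why the drop from 68.5% to 9.1% corresponds to 2719 avoided procedures.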
Conclusions: Compared with radiologists, AI can better serve as a "goalkeeper" in reducing unnecessary FNAs by identifying benign nodules that radiologists initially assessed as suspicious for malignancy. However, active surveillance is still necessary for all nodules that the AI identifies as benign, since a very small number of low-aggressiveness malignant nodules may be mistakenly identified as benign.
About the journal:
JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It emphasizes applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, and industry and health informatics professionals.
JMIR Med Inform is published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175). Its scope differs slightly from that of JMIR: it places more emphasis on applications for clinicians and health professionals than on consumers and citizens, who are the focus of JMIR. It also publishes even faster and accepts papers that are more technical or more formative than those published in JMIR.