The diagnostic value of artificial intelligence in differentiating follicular thyroid cancer from follicular thyroid adenoma: A meta-analysis.

IF 1.4; Zone 4 (Medicine); JCR Q2, MEDICINE, GENERAL & INTERNAL

Medicine Pub Date : 2025-10-03 DOI:10.1097/MD.0000000000044745

Di Wu, Chengfei Sun, Yilin Hou, Jiayue Sun, Xiyu Zhang, Yunfei Zhang

Medicine, volume 104, issue 40, e44745. Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12499825/pdf/


Abstract

Background: Follicular thyroid carcinoma (FTC) is the second most common thyroid malignancy, but it is difficult to distinguish from follicular adenoma before surgery. Artificial intelligence (AI) has emerged as an auxiliary diagnostic tool, yet published studies report variable performance. This meta-analysis evaluates the overall diagnostic accuracy of AI in differentiating FTC from benign lesions.

Methods: Literature searches were conducted independently in PubMed, Embase & Medline (via Embase.com), Web of Science, Cochrane Library, and Ovid English medical databases. The diagnostic accuracy of AI was compared against histopathology as the reference standard. Pooled sensitivity, specificity, diagnostic odds ratio, and area under the curve were calculated to assess AI accuracy. Meta-regression analyses were performed to investigate heterogeneity related to test set size, validation strategy, and machine learning model type.
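
The accuracy measures named in the Methods all derive from a per-study 2x2 table of AI calls against histopathology. The following is a minimal Python sketch of that arithmetic, not the authors' pooling procedure: the class and function names are hypothetical, the counts are made up purely for illustration, and pooling by summing cells is a deliberately naive stand-in for the statistical models a meta-analysis would normally use.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of AI predictions against the histopathology reference."""
    tp: int  # FTC called malignant by AI
    fp: int  # FTA called malignant by AI
    fn: int  # FTC called benign by AI
    tn: int  # FTA called benign by AI


def sensitivity(t: TwoByTwo) -> float:
    # Share of true FTC cases that the AI flags as malignant.
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    # Share of true FTA cases that the AI calls benign.
    return t.tn / (t.tn + t.fp)


def diagnostic_odds_ratio(t: TwoByTwo) -> float:
    # Odds of a positive call in FTC divided by odds of a positive call in FTA.
    return (t.tp * t.tn) / (t.fp * t.fn)


def pool_by_summing(tables: list[TwoByTwo]) -> TwoByTwo:
    # Naive pooling (an assumption of this sketch): add the cells across studies.
    return TwoByTwo(
        tp=sum(t.tp for t in tables),
        fp=sum(t.fp for t in tables),
        fn=sum(t.fn for t in tables),
        tn=sum(t.tn for t in tables),
    )


# Hypothetical counts, NOT taken from the included studies.
studies = [TwoByTwo(tp=40, fp=10, fn=10, tn=90),
           TwoByTwo(tp=30, fp=5, fn=20, tn=95)]
pooled = pool_by_summing(studies)

print(f"sensitivity = {sensitivity(pooled):.3f}")
print(f"specificity = {specificity(pooled):.3f}")
print(f"DOR         = {diagnostic_odds_ratio(pooled):.2f}")
```

Summing cells gives larger studies more weight and ignores between-study variation, which is why it should be read only as an illustration of what each measure means.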

Results: We analyzed 7 studies involving 3163 follicular thyroid neoplasms (1876 follicular thyroid adenomas [FTAs] and 1287 FTCs). The pooled sensitivity and specificity of AI for differentiating FTC from FTA were 0.73 (95% CI: 0.70-0.75) and 0.87 (95% CI: 0.86-0.89), respectively. The pooled positive and negative likelihood ratios were 6.19 (95% CI: 3.92-9.79) and 0.28 (95% CI: 0.17-0.46), respectively. The diagnostic odds ratio was 22.81 (95% CI: 10.17-51.16), and the area under the curve was 0.94. Meta-regression indicated no significant heterogeneity associated with validation strategy (P = .25). However, test set size (P = .02) and publication year (P = .04) were identified as potentially significant sources of heterogeneity. Subgroup analyses showed that studies with a test set of >1000 cases were more accurate than those with <1000 cases. Regarding validation strategy, studies using cross-validation performed better than those using holdout validation.
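
As a reading aid for the figures above, the likelihood ratios and diagnostic odds ratio can be related to sensitivity and specificity with the standard textbook identities. The sketch below applies them to the pooled point estimates reported in the Results; the function name is hypothetical, and the outputs are not expected to reproduce the reported 6.19, 0.28, and 22.81 exactly, presumably because each summary measure was pooled across studies separately.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, DOR) from a sensitivity/specificity pair."""
    lr_pos = sens / (1.0 - spec)      # how much a positive call raises the odds of FTC
    lr_neg = (1.0 - sens) / spec      # how much a negative call lowers the odds of FTC
    dor = lr_pos / lr_neg             # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# Pooled point estimates quoted in the abstract.
lr_pos, lr_neg, dor = likelihood_ratios(sens=0.73, spec=0.87)

print(f"LR+ = {lr_pos:.2f}")
print(f"LR- = {lr_neg:.2f}")
print(f"DOR = {dor:.2f}")
```

The identities give roughly 5.6, 0.31, and 18, the same order of magnitude as the reported pooled values, which is the consistency one would hope to see.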

Conclusion: Overall, AI demonstrates promising diagnostic utility in differentiating FTC from FTA. Studies employing larger test sets (>1000 cases) achieved higher accuracy than those with smaller test sets (<1000 cases). Furthermore, studies validated with cross-validation outperformed those using holdout (non-cross-validation) approaches.


Source journal

Medicine (Medicine: General & Internal)

CiteScore: 2.80

Self-citation rate: 0.00%

Articles published: 4342

Review time: >12 weeks

Journal description: Medicine is now a fully open access journal, providing authors with a distinctive new service offering continuous publication of original research across a broad spectrum of medical scientific disciplines and sub-specialties. As an open access title, Medicine will continue to provide authors with an established, trusted platform for the publication of their work. To ensure the ongoing quality of Medicine’s content, the peer-review process will only accept content that is scientifically, technically and ethically sound, and in compliance with standard reporting guidelines.