The diagnostic value of artificial intelligence in differentiating follicular thyroid cancer from follicular thyroid adenoma: A meta-analysis.
Authors: Di Wu, Chengfei Sun, Yilin Hou, Jiayue Sun, Xiyu Zhang, Yunfei Zhang
Journal: Medicine, 104(40), e44745. Published 2025-10-03.
DOI: https://doi.org/10.1097/MD.0000000000044745
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12499825/pdf/
Citation count: 0
Abstract
Background: Follicular thyroid carcinoma (FTC) is the second most common thyroid malignancy, but it is difficult to distinguish preoperatively from follicular adenoma. Artificial intelligence (AI) has emerged as an auxiliary diagnostic tool, yet published studies report variable performance. This meta-analysis evaluates the overall diagnostic accuracy of AI in differentiating FTC from benign lesions.
Methods: Literature searches were conducted independently in PubMed, Embase and Medline (via Embase.com), Web of Science, Cochrane Library, and Ovid English medical databases. The diagnostic accuracy of AI was compared against the reference standard of histopathology. Pooled sensitivity, specificity, diagnostic odds ratio, and area under the curve were calculated to assess AI accuracy. Meta-regression analyses were performed to investigate heterogeneity related to test set size, validation strategy, and machine learning model type.
Results: We analyzed 7 studies involving 3163 follicular thyroid neoplasms (1876 follicular thyroid adenomas [FTAs] and 1287 FTCs). The pooled sensitivity and specificity of AI for differentiating FTC from FTA were 0.73 (95% CI: 0.70-0.75) and 0.87 (95% CI: 0.86-0.89), respectively. The pooled positive and negative likelihood ratios were 6.19 (95% CI: 3.92-9.79) and 0.28 (95% CI: 0.17-0.46), respectively. The diagnostic odds ratio was 22.81 (95% CI: 10.17-51.16), and the area under the curve was 0.94. Meta-regression indicated no significant heterogeneity associated with validation strategy (P = .25). However, test set size (P = .02) and publication year (P = .04) were identified as potentially significant sources of heterogeneity. Subgroup analyses showed that studies with a test set of >1000 cases were more accurate than those with <1000 cases. Regarding validation strategy, studies using cross-validation performed better than those using holdout validation.
Conclusion: Overall, AI shows promising diagnostic utility in differentiating FTC from FTA. Studies with larger test sets (>1000 cases) achieved higher accuracy than those with smaller test sets (<1000 cases). Furthermore, studies that used cross-validation outperformed those that used holdout (non-cross-validation) approaches.
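For readers less familiar with the accuracy measures reported in the abstract, the following is a minimal sketch of how sensitivity, specificity, likelihood ratios, and the diagnostic odds ratio are defined for a single 2x2 table. The class `TwoByTwo`, the function `diagnostic_measures`, and the counts are hypothetical and chosen only for illustration; they are not data from any included study, and this is not the pooling procedure the authors used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one hypothetical study, with histopathology as the reference."""
    tp: int  # FTC correctly classified as FTC
    fn: int  # FTC misclassified as FTA
    fp: int  # FTA misclassified as FTC
    tn: int  # FTA correctly classified as FTA


def diagnostic_measures(t: TwoByTwo) -> dict[str, float]:
    """Standard single-table accuracy measures (assumes no zero cells)."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = lr_pos / lr_neg                      # equals (tp * tn) / (fp * fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=73, fn=27, fp=13, tn=87)
for name, value in diagnostic_measures(example).items():
    print(f"{name}: {value:.2f}")
```

Pooled estimates in a meta-analysis are typically obtained by combining each measure across studies under a statistical model, so the pooled values need not satisfy these single-table identities exactly.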
About the journal:
Medicine is now a fully open access journal, providing authors with a distinctive new service offering continuous publication of original research across a broad spectrum of medical scientific disciplines and sub-specialties.
As an open access title, Medicine will continue to provide authors with an established, trusted platform for the publication of their work. To ensure the ongoing quality of Medicine’s content, the peer-review process will only accept content that is scientifically, technically and ethically sound, and in compliance with standard reporting guidelines.