Diagnostic Performance of Computed Tomography-Based Artificial Intelligence for Early Recurrence of Cholangiocarcinoma: Systematic Review and Meta-Analysis.

IF 6 | Zone 2 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Medical Internet Research | Pub Date: 2025-09-18 | DOI: 10.2196/78306

Jie Chen, Jianxin Xi, Tianyu Chen, Lulu Yang, Kaijia Liu, Xiaobo Ding

Citations: 0

Abstract

Background: Despite artificial intelligence (AI) models demonstrating high predictive accuracy for early cholangiocarcinoma recurrence, their clinical application faces challenges, such as reproducibility, generalizability, hidden biases, and uncertain performance across diverse datasets and populations, raising concerns about their practical applicability.

Objective: This meta-analysis aims to systematically assess the diagnostic performance of AI models using computed tomography (CT) imaging to predict early recurrence of cholangiocarcinoma.

Methods: A systematic search was conducted in PubMed, Embase, and Web of Science for studies published up to May 2025. Studies were selected based on the Participants, Index test, Target condition, Reference standard, Outcomes, and Setting (PITROS) framework. Participants included patients diagnosed with cholangiocarcinoma (including intrahepatic and extrahepatic locations). The index test was AI techniques applied to CT imaging for early recurrence prediction (defined as within 1 year), while the target condition was early recurrence of cholangiocarcinoma (positive group: recurrence; negative group: no recurrence). The reference standard was pathological diagnosis or imaging follow-up confirming recurrence. Outcomes included sensitivity, specificity, diagnostic odds ratio (DOR), and area under the receiver operating characteristic curve (AUC), assessed in both internal and external validation cohorts. The setting comprised retrospective or prospective studies using hospital datasets. Methodological quality was assessed using an optimized version of the revised Quality Assessment of Diagnostic Accuracy Studies-2 tool. Heterogeneity was assessed using the I² statistic. Pooled sensitivity, specificity, DOR, and AUC were calculated using a bivariate random-effects model.
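The per-study quantities named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 2x2 counts are invented for the example, the 0.5 continuity correction is a common convention assumed here, and I² is computed from Cochran's Q on log DOR with a simple inverse-variance pooling rather than the bivariate random-effects model used in the review.

```python
import math

def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, DOR, and var(log DOR) for one 2x2 table."""
    # Assumed convention: add 0.5 to every cell if any cell is zero.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    sens = tp / (tp + fn)            # recurrences correctly flagged
    spec = tn / (tn + fp)            # non-recurrences correctly cleared
    dor = (tp * tn) / (fp * fn)      # diagnostic odds ratio
    var_log_dor = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return sens, spec, dor, var_log_dor

def i_squared(effects, variances):
    """I² (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# Hypothetical 2x2 tables (TP, FP, FN, TN); NOT data from the included studies.
tables = [(40, 8, 6, 46), (25, 5, 5, 30), (30, 12, 3, 35)]
log_dors, variances = [], []
for tp, fp, fn, tn in tables:
    sens, spec, dor, var = diagnostic_metrics(tp, fp, fn, tn)
    log_dors.append(math.log(dor))
    variances.append(var)
    print(f"sens={sens:.2f} spec={spec:.2f} DOR={dor:.1f}")
print(f"I² on log DOR: {i_squared(log_dors, variances):.1f}%")
```

A bivariate model additionally estimates the correlation between logit sensitivity and logit specificity across studies, which this univariate sketch ignores.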

Results: A total of 9 studies with 30 datasets involving 1537 patients were included. In internal validation cohorts, CT-based AI models showed a pooled sensitivity of 0.87 (95% CI 0.81-0.92), specificity of 0.85 (95% CI 0.79-0.89), DOR of 37.71 (95% CI 18.35-77.51), and AUC of 0.93 (95% CI 0.90-0.94). In external validation cohorts, pooled sensitivity was 0.87 (95% CI 0.81-0.91), specificity was 0.82 (95% CI 0.77-0.86), DOR was 30.81 (95% CI 18.79-50.52), and AUC was 0.85 (95% CI 0.82-0.88). The AUC was significantly lower in external validation cohorts compared to internal validation cohorts (P<.001).
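As a rough plausibility check on the reported figures, the sketch below recomputes a DOR from the pooled sensitivity and specificity and approximates the internal versus external AUC comparison with a z-test. Both steps rest on assumptions not stated in the abstract: a DOR implied by pooled point estimates need not equal the model-based pooled DOR, the standard errors are backed out of the 95% CIs under a normal approximation (the internal CI is visibly asymmetric), and the two cohort sets are treated as independent. The test the authors actually used is not given here.

```python
import math

def dor_from_sens_spec(sens, spec):
    """DOR implied by a sensitivity/specificity pair: LR+ divided by LR-."""
    return (sens / (1 - sens)) / ((1 - spec) / spec)

def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error from a symmetric 95% CI (assumed normal)."""
    return (upper - lower) / (2 * z)

def two_sided_p(z):
    """Two-sided P value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Pooled values as reported in the Results.
internal_dor = dor_from_sens_spec(0.87, 0.85)   # reported: 37.71
external_dor = dor_from_sens_spec(0.87, 0.82)   # reported: 30.81

se_int = se_from_ci(0.90, 0.94)                 # internal AUC 0.93
se_ext = se_from_ci(0.82, 0.88)                 # external AUC 0.85
z_stat = (0.93 - 0.85) / math.sqrt(se_int ** 2 + se_ext ** 2)
p_value = two_sided_p(z_stat)

print(f"implied DOR internal={internal_dor:.2f}, external={external_dor:.2f}")
print(f"z={z_stat:.2f}, P={p_value:.2g}")
```

Under these assumptions the implied DORs land close to the reported pooled values, and the approximate P value is consistent with the reported P<.001.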

Conclusions: Our results show that CT-based AI models predict early cholangiocarcinoma recurrence with high performance in internal validation sets and moderate performance in external validation sets. However, the high heterogeneity observed may impact the robustness of these results. Future research should focus on prospective studies and establishing standardized gold standards to further validate the clinical applicability and generalizability of AI models.

Source journal: Journal of Medical Internet Research (Medicine: Health Care)

CiteScore: 14.40

Self-citation rate: 5.40%

Articles published: 654

Review time: 1 month

About the journal: The Journal of Medical Internet Research (JMIR), founded in 1999, covers digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It ranks in the first quartile (Q1) by Impact Factor and is ranked #1 on Google Scholar within the "Medical Informatics" discipline.