Diagnostic Performance of Computed Tomography-Based Artificial Intelligence for Early Recurrence of Cholangiocarcinoma: Systematic Review and Meta-Analysis.
Authors: Jie Chen, Jianxin Xi, Tianyu Chen, Lulu Yang, Kaijia Liu, Xiaobo Ding
Journal: Journal of Medical Internet Research, e78306 (JCR Q1, Health Care Sciences & Services; impact factor 6.0)
Published: 2025-09-18
DOI: 10.2196/78306 (https://doi.org/10.2196/78306)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12491900/pdf/
Citations: 0
Abstract
Background: Artificial intelligence (AI) models have demonstrated high predictive accuracy for early cholangiocarcinoma recurrence, but their clinical application faces challenges, including limited reproducibility and generalizability, hidden biases, and uncertain performance across diverse datasets and populations. These challenges raise concerns about the models' practical applicability.
Objective: This meta-analysis aims to systematically assess the diagnostic performance of AI models using computed tomography (CT) imaging to predict early recurrence of cholangiocarcinoma.
Methods: A systematic search was conducted in PubMed, Embase, and Web of Science for studies published up to May 2025. Studies were selected based on the Participants, Index test, Target condition, Reference standard, Outcomes, and Setting (PITROS) framework. Participants included patients diagnosed with cholangiocarcinoma (including intrahepatic and extrahepatic locations). The index test was AI techniques applied to CT imaging for early recurrence prediction (defined as within 1 year), while the target condition was early recurrence of cholangiocarcinoma (positive group: recurrence; negative group: no recurrence). The reference standard was pathological diagnosis or imaging follow-up confirming recurrence. Outcomes included sensitivity, specificity, diagnostic odds ratio (DOR), and area under the receiver operating characteristic curve (AUC), assessed in both internal and external validation cohorts. The setting comprised retrospective or prospective studies using hospital datasets. Methodological quality was assessed using an optimized version of the revised Quality Assessment of Diagnostic Accuracy Studies-2 tool. Heterogeneity was assessed using the I² statistic. Pooled sensitivity, specificity, DOR, and AUC were calculated using a bivariate random-effects model.
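The I² statistic mentioned in the Methods can be illustrated with a minimal sketch. It assumes the common definition based on Cochran's Q, I² = max(0, (Q − df)/Q) × 100%, with inverse-variance weights. The function name `i_squared` and the input values are hypothetical and are not taken from the review, which does not state how its I² values were computed.

```python
def i_squared(effects, variances):
    """Return (Cochran's Q, I^2 in percent) for per-study effect estimates.

    effects:   per-study effect sizes (for example, log diagnostic odds ratios)
    variances: within-study variances of those effect sizes
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is the share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical log-DOR values and variances, for illustration only
log_dor = [2.0, 3.0, 4.0]
var = [0.25, 0.25, 0.25]
q, i2 = i_squared(log_dor, var)
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```

With these illustrative inputs, three quarters of the observed variation would be attributed to heterogeneity rather than chance; values of that size are conventionally read as substantial heterogeneity.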
Results: A total of 9 studies with 30 datasets involving 1537 patients were included. In internal validation cohorts, CT-based AI models showed a pooled sensitivity of 0.87 (95% CI 0.81-0.92), specificity of 0.85 (95% CI 0.79-0.89), DOR of 37.71 (95% CI 18.35-77.51), and AUC of 0.93 (95% CI 0.90-0.94). In external validation cohorts, pooled sensitivity was 0.87 (95% CI 0.81-0.91), specificity was 0.82 (95% CI 0.77-0.86), DOR was 30.81 (95% CI 18.79-50.52), and AUC was 0.85 (95% CI 0.82-0.88). The AUC was significantly lower in external validation cohorts than in internal validation cohorts (P<.001).
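As a rough consistency check on the pooled figures above, the diagnostic odds ratio can be recomputed from sensitivity and specificity using the standard identity DOR = LR+ / LR−. This is a sketch, not the authors' bivariate model: the function name `diagnostic_odds_ratio` is hypothetical, and the inputs are the rounded pooled values from the abstract, so small differences from the reported DORs (37.71 and 30.81) are expected.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Diagnostic odds ratio from sensitivity and specificity (both in (0, 1))."""
    positive_lr = sensitivity / (1.0 - specificity)   # LR+ = TPR / FPR
    negative_lr = (1.0 - sensitivity) / specificity   # LR- = FNR / TNR
    return positive_lr / negative_lr


# Rounded pooled estimates reported in the Results
internal = diagnostic_odds_ratio(0.87, 0.85)
external = diagnostic_odds_ratio(0.87, 0.82)
print(f"internal DOR ~ {internal:.1f}, external DOR ~ {external:.1f}")
```

The recomputed values come out at roughly 37.9 and 30.5, close to the reported pooled DORs. The remaining gap is plausibly due to rounding of the inputs and to the DOR being estimated within the model rather than from the rounded summary points.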
Conclusions: Our results show that CT-based AI models predict early cholangiocarcinoma recurrence with high performance in internal validation sets and moderate performance in external validation sets. However, the high heterogeneity observed may limit the robustness of these results. Future research should focus on prospective studies and on establishing standardized reference standards to further validate the clinical applicability and generalizability of AI models.
About the journal:
The Journal of Medical Internet Research (JMIR) is a highly respected publication in the field of health informatics and health services. Founded in 1999, JMIR has been a pioneer in the field for over two decades.
As a leader in the industry, the journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It is recognized as a top publication in these disciplines, ranking in the first quartile (Q1) by Impact Factor.
Notably, JMIR holds the prestigious position of being ranked #1 on Google Scholar within the "Medical Informatics" discipline.