Artificial intelligence applied to ultrasound diagnosis of pelvic gynecological tumors: a systematic review and meta-analysis

Axel Geysels, Giulia Garofalo, Stefan Timmerman, Lasai Barreñada, Bart De Moor, Dirk Timmerman, Wouter Froyman, Ben Van Calster

Gynecologic and Obstetric Investigation, published 2025-05-08, pp. 1-29. DOI: 10.1159/000545850
Abstract
Objective: To perform a systematic review of artificial intelligence (AI) studies on identifying and differentiating pelvic gynecological tumors on ultrasound scans.
Methods: Studies developing or validating AI models for diagnosing gynecological pelvic tumors on ultrasound scans were eligible for inclusion. We systematically searched PubMed, Embase, Web of Science, and Cochrane Central from database inception to April 30, 2024. To assess the quality of the included studies, we adapted the QUADAS-2 risk-of-bias tool to address the unique challenges of AI in medical imaging. Using multi-level random-effects models, we performed a meta-analysis to generate summary estimates of the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. To provide a reference point among the diagnostic support tools currently available to ultrasound examiners, we descriptively compared the pooled performance with that of the well-recognized ADNEX model on external validation. Subgroup analyses were performed to explore sources of heterogeneity.
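As a minimal illustration of the kind of pooling involved, the sketch below fits a single-level DerSimonian-Laird random-effects model to logit-transformed AUCs in Python. This is a simplification rather than the paper's actual analysis: the authors used multi-level models (for example, to handle multiple estimates per study), and all study values below are hypothetical.

```python
import numpy as np

def dersimonian_laird(estimates, variances):
    """Pool study-level estimates with a single-level DerSimonian-Laird random-effects model."""
    w = 1.0 / variances                        # inverse-variance (fixed-effect) weights
    mu_fe = np.sum(w * estimates) / np.sum(w)  # fixed-effect pooled mean
    q = np.sum(w * (estimates - mu_fe) ** 2)   # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)  # between-study variance estimate
    w_re = 1.0 / (variances + tau2)            # random-effects weights
    mu_re = np.sum(w_re * estimates) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return mu_re, se_re, tau2

# Hypothetical study-level AUCs and standard errors (illustrative only).
auc = np.array([0.88, 0.91, 0.84, 0.90])
se = np.array([0.020, 0.030, 0.025, 0.035])

# Pool on the logit scale so the confidence interval stays inside (0, 1).
logit = np.log(auc / (1 - auc))
logit_var = (se / (auc * (1 - auc))) ** 2      # delta-method variance on the logit scale

mu, pooled_se, tau2 = dersimonian_laird(logit, logit_var)
inv = lambda x: 1 / (1 + np.exp(-x))           # back-transform to the AUC scale
print(f"Pooled AUC {inv(mu):.3f} "
      f"(95% CI {inv(mu - 1.96 * pooled_se):.3f}-{inv(mu + 1.96 * pooled_se):.3f}), "
      f"tau^2 = {tau2:.4f}")
```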
Results: Of the 9151 records retrieved, 44 studies were eligible: 40 on ovarian, three on endometrial, and one on myometrial pathology. Overall, 95% were at high risk of bias, primarily due to inappropriate study inclusion criteria, the absence of a patient-level split between training and testing image sets, and the lack of a calibration assessment. For ovarian tumors, the summary AUC for AI models distinguishing benign from malignant tumors was 0.89 (95% CI: 0.85-0.92). In lower-risk studies (at least three low-risk domains), the summary AUC dropped to 0.87 (0.83-0.90), with deep learning models outperforming radiomics-based machine learning approaches in this subset. Only five studies included an external validation, and six evaluated calibration performance. In a recent systematic review of external validation studies, the ADNEX model had a pooled AUC of 0.93 (0.91-0.94) in studies at low risk of bias. Studies on endometrial and myometrial pathologies were reported individually.
Conclusion: Although AI models show promising discriminative performance for diagnosing gynecological tumors on ultrasound, most studies have methodological shortcomings that result in a high risk of bias. In addition, the ADNEX model appears to outperform most AI approaches for ovarian tumors. Future research should emphasize robust study designs (ideally large, multicenter, prospective cohorts that mirror real-world populations), along with external validation, proper calibration, and standardized reporting.
Registration: This study was pre-registered with Open Science Framework (OSF): https://doi.org/10.17605/osf.io/bhkst.
Journal Description:
This journal covers the most active and promising areas of current research in gynecology and obstetrics. Invited, well-referenced reviews by noted experts keep readers in touch with the general framework and direction of international study. Original papers report selected experimental and clinical investigations in all fields related to gynecology, obstetrics and reproduction. Short communications are published to allow immediate discussion of new data. The international and interdisciplinary character of this periodical provides an avenue to less accessible sources and to worldwide research for investigators and practitioners.