Ziyan Huang, Zhongying Deng, Jin Ye, Haoyu Wang, Yanzhou Su, Tianbin Li, Hui Sun, Junlong Cheng, Jianpin Chen, Junjun He, Yun Gu, Shaoting Zhang, Lixu Gu, Yu Qiao
A-Eval: A benchmark for cross-dataset and cross-modality evaluation of abdominal multi-organ segmentation
Medical Image Analysis, vol. 101, Article 103499 (published 2025-02-14). DOI: 10.1016/j.media.2025.103499
Abstract
Although deep learning has revolutionized abdominal multi-organ segmentation, its models often struggle to generalize because they are trained on small-scale datasets tied to specific sources and modalities. The recent emergence of large-scale datasets may mitigate this issue, but important questions remain open: Can models trained on these large datasets generalize well across different datasets and imaging modalities? Whether or not they do, how can their generalizability be further improved? To address these questions, we introduce A-Eval, a benchmark for the cross-dataset and cross-modality Evaluation ('Eval') of Abdominal ('A') multi-organ segmentation, which integrates seven datasets spanning CT and MRI. Our evaluations indicate that significant domain gaps persist despite larger data scales. Although training on more datasets improves generalization, model performance on unseen data remains inconsistent. Joint training across multiple datasets and modalities enhances generalization, though inconsistent annotations pose challenges. These findings highlight the need for diverse, well-curated training data covering various clinical scenarios and modalities in order to develop robust medical imaging models. The code and pre-trained models are available at https://github.com/uni-medical/A-Eval.
Journal description:
Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.