In Silico Digital Breast Tomosynthesis Dataset for the Comparative Analysis of Deep Learning Models in Tumor Segmentation.

Journal of Imaging Informatics in Medicine, published 2025-08-04. DOI: 10.1007/s10278-025-01626-z

Cristina Alfaro Vergara, Nicolás Araya Caro, Domingo Mery Quiroz, Claudia Prieto Vasquez


Abstract

The scarcity of publicly available digital breast tomosynthesis (DBT) datasets significantly limits the development of robust deep learning (DL) models for breast tumor segmentation. In this exploratory proof-of-concept study, we assess the viability of in silico-generated DBT data as a training source for tumor segmentation. A dataset of 230 two-dimensional (2D) regions of interest (ROIs), derived from FDA-cleared software and encompassing a range of breast densities and tumor complexities, was used to train 13 DL models based on the U-Net, FCN, DeepLabv3, and DeepLabv3+ architectures. Each model was either trained from scratch or fine-tuned from COCO-pretrained weights (ResNet50/101 backbones). Performance was evaluated using F1-score, intersection over union (IoU), precision, and recall. Among all models, U-Net trained from scratch and DeepLabv3+ fine-tuned with a ResNet50 backbone achieved the highest and most consistent results (F1-scores of 82.52% and 84.98%, and per-image IoUs of 78.49% and 83.77%, respectively). No statistically significant differences were found using the Wilcoxon signed-rank test with post hoc Bonferroni correction (p > 0.0042, the corrected significance level). To evaluate cross-domain generalization, the baseline U-Net model was retrained from scratch on a hybrid dataset combining in silico and real-world DBT ROIs, yielding promising results (F1-score of 79%). Despite the domain shift, these findings support the utility of in silico DBT as a complementary resource for training and benchmarking DL models, particularly in data-limited settings. This study provides foundational experimental evidence for integrating computationally generated in silico data into AI-based DBT tumor segmentation research workflows.
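The abstract reports pixel-wise F1-score, IoU, precision, and recall, averages IoU per image, and compares models with a Bonferroni-corrected threshold. The sketch below shows how these quantities are commonly computed for binary tumor masks; it is not the authors' evaluation code. The function names are hypothetical, masks are assumed to be flattened sequences of 0/1 pixels, and the value of 12 comparisons is an assumption (0.05 / 12 ≈ 0.0042 matches the reported threshold, but the abstract does not state the comparison count). The Wilcoxon signed-rank test itself is not reimplemented here; `scipy.stats.wilcoxon` is the usual choice for that step.

```python
"""Illustrative sketch: binary-mask segmentation metrics and a Bonferroni threshold.

Assumptions (not from the paper): masks are flattened 0/1 sequences,
and the helper names below are hypothetical.
"""
from typing import Dict, Iterable, List, Sequence, Tuple


def segmentation_metrics(pred: Iterable[int], truth: Iterable[int]) -> Dict[str, float]:
    """Compute precision, recall, F1 (Dice) and IoU for one pair of binary masks."""
    pred, truth = list(pred), list(truth)
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")

    # Confusion counts over tumor (1) vs. background (0) pixels.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    # Assumed convention: precision/recall are 0.0 when undefined,
    # while F1/IoU are 1.0 when both masks are empty (nothing to segment).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


def mean_per_image_iou(pairs: Sequence[Tuple[List[int], List[int]]]) -> float:
    """Average IoU computed separately for each (prediction, ground truth) image pair."""
    scores = [segmentation_metrics(p, t)["iou"] for p, t in pairs]
    return sum(scores) / len(scores)


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0]
    truth = [1, 0, 0, 1, 1, 0]
    print(segmentation_metrics(pred, truth))
    print(round(bonferroni_alpha(0.05, 12), 4))
```

For the toy masks above there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are each 2/3 and IoU is 0.5. Averaging IoU per image (rather than pooling all pixels) weights each ROI equally regardless of tumor size, which is one plausible reading of the "per-image IoU" reported above.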
