In Silico Digital Breast Tomosynthesis Dataset for the Comparative Analysis of Deep Learning Models in Tumor Segmentation
Authors: Cristina Alfaro Vergara, Nicolás Araya Caro, Domingo Mery Quiroz, Claudia Prieto Vasquez
Journal: Journal of Imaging Informatics in Medicine
Publication type: Journal Article
Published: 2025-08-04
DOI: https://doi.org/10.1007/s10278-025-01626-z
Citations: 0
Abstract
The scarcity of publicly available digital breast tomosynthesis (DBT) datasets significantly limits the development of robust deep learning (DL) models for breast tumor segmentation. In this exploratory proof-of-concept study, we assess the viability of in silico-generated DBT data as a training source for tumor segmentation. A dataset of 230 two-dimensional (2D) regions of interest (ROIs), derived from FDA-cleared software and encompassing a spectrum of breast densities and tumor complexities, was used to train 13 DL models, including U-Net, FCN, DeepLabv3, and DeepLabv3+ architectures. Each model was either trained from scratch or fine-tuned using COCO-pretrained weights (ResNet50/101 backbones). Performance was evaluated using F1-score, intersection over union (IoU), precision, and recall. Among all models, U-Net trained from scratch and DeepLabv3+ fine-tuned with a ResNet50 backbone achieved the highest and most consistent results (F1-scores of 82.52% and 84.98%, and per-image IoUs of 78.49% and 83.77%, respectively). No statistically significant differences were found using the Wilcoxon signed-rank test with post hoc Bonferroni correction (p > 0.0042). To evaluate generalization across domains, the baseline U-Net model was retrained from scratch on a hybrid dataset combining in silico and real-world DBT ROIs, yielding promising results (F1-score of 79%). Despite the domain shift, these findings support the utility of in silico DBT as a complementary resource for training and benchmarking DL models, particularly in data-limited environments. This study provides foundational experimental evidence for integrating computationally generated in silico data into AI-based DBT tumor segmentation research workflows.
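The four evaluation metrics named in the abstract can be illustrated with a minimal sketch for binary segmentation masks. This is not the authors' evaluation code: the function names, the nested-list mask representation, the choice to return 0.0 when a denominator is empty, and the reading of "per-image IoU" as the mean of IoU values computed image by image are all assumptions made for illustration.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    over two same-shaped binary masks (lists of rows of 0/1)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def segmentation_metrics(pred, truth):
    """Return precision, recall, F1-score, and IoU for one mask pair.

    Assumption: a metric with an empty denominator is reported as 0.0.
    """
    tp, fp, fn = confusion_counts(pred, truth)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


def per_image_iou(preds, truths):
    """Mean of the IoU values computed separately for each image
    (one possible reading of 'per-image IoU')."""
    scores = [segmentation_metrics(p, t)["iou"] for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


# Hypothetical 2x3 masks, not data from the study.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
metrics = segmentation_metrics(pred, truth)
```

For a single mask pair, F1 and IoU are linked by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU. That identity does not carry over to scores averaged across images, so the averaged F1 and IoU figures in the abstract need not satisfy it exactly.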
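The 0.0042 threshold quoted with the Bonferroni correction can be reproduced with a short sketch. The abstract states neither the family-wise significance level nor the number of comparisons; the values 0.05 and 12 below are assumptions, chosen because 0.05 / 12 rounds to 0.0042 (one consistent reading is a baseline model compared against the 12 others). The p-values are hypothetical placeholders standing in for the output of paired Wilcoxon signed-rank tests.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Assumed values: family-wise alpha of 0.05 and 12 pairwise comparisons.
threshold = bonferroni_threshold(0.05, 12)

# Hypothetical p-values, not results from the study.
example_p_values = [0.03, 0.20, 0.01, 0.45, 0.08, 0.60,
                    0.12, 0.05, 0.33, 0.009, 0.71, 0.15]
flags = significant_after_bonferroni(example_p_values)
```

With these placeholder values, several comparisons would pass an uncorrected 0.05 cut-off but none passes the corrected one, which is the pattern the abstract reports.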