Transformer-based multi-modal learning for breast cancer screening: Merging imaging and genetic data
Mingshuang Fang, Binxiong Xu
Journal of Radiation Research and Applied Sciences, Volume 18, Issue 3, Article 101586
DOI: 10.1016/j.jrras.2025.101586
Published: 2025-05-29
Citations: 0
Abstract
Objective
This study addresses the clinical need for more accurate breast cancer screening by developing a transformer-based, multi-modal BI-RADS classification framework that integrates mammographic radiomics, deep imaging features, and RNA-Seq-derived genetic biomarkers.
Materials and methods
Lesion auto-segmentation was performed using Swin-UNETR and nnU-Net on a dataset of 4265 patients collected from five medical centers. Radiomics and deep features were extracted using ResNet50 and Vision Transformer (ViT) architectures, and RNA-Seq genetic features were obtained via DNABERT and TabTransformer models. The BI-RADS distribution of the dataset was as follows: BI-RADS 1 (853), BI-RADS 2 (1066), BI-RADS 3 (853), BI-RADS 4 (853), and BI-RADS 5 (640) patients. Prior to classification, the reliability of extracted features was evaluated via Intraclass Correlation Coefficient (ICC) analysis, and dimensionality reduction was conducted using Principal Component Analysis (PCA), followed by feature selection with the Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination (RFE), and Analysis of Variance (ANOVA). The refined feature set was then classified using machine learning algorithms including XGBoost, CatBoost, SVM, and Random Forest. Model performance was assessed using accuracy, area under the curve (AUC), and recall, with five-fold cross-validation and an external test set used to confirm generalization.
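The selection-and-classification stage described above can be sketched as a single pipeline. The following is a minimal, illustrative scikit-learn version on synthetic data standing in for the pooled radiomics/deep/genetic feature matrix; the ICC pre-filtering is assumed to have happened upstream, an L1-penalized logistic regression stands in for LASSO-based selection, and a Random Forest replaces XGBoost/CatBoost so the sketch stays self-contained. All names, dimensions, and hyperparameters here are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the fused feature matrix: 600 patients,
# 200 pooled features, 5 classes mimicking BI-RADS 1-5.
X, y = make_classification(n_samples=600, n_features=200, n_informative=25,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),                 # features on a common scale
    ("pca", PCA(n_components=50, random_state=0)),  # dimensionality reduction
    ("lasso", SelectFromModel(                   # L1 (LASSO-style) selection
        LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000))),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# Five-fold cross-validated accuracy, mirroring the evaluation protocol.
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Placing scaling, PCA, and selection inside the `Pipeline` ensures each is refit only on the training folds, avoiding the information leakage that fitting them on the full dataset before cross-validation would cause.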
Results
Swin-UNETR demonstrated superior segmentation performance compared to nnU-Net (DSC = 0.94 versus 0.88). Feature-based classification leveraging radiomics and deep learning features attained a peak accuracy of 89.22 % when utilizing ViT in combination with Swin-UNETR. The integration of radiomics, deep, and genetic features further enhanced classification outcomes, with the LASSO-XGBoost model achieving 96.17 % accuracy, an AUC of 97.22 %, and a recall of 95.28 %. Moreover, the end-to-end deep learning approach also yielded strong results, with the ViT model (based on Swin-UNETR segmentation) attaining an accuracy of 92.68 % and an AUC of 94.81 %.
Conclusions
Multi-modal integration significantly outperformed unimodal approaches, demonstrating strong generalization and robustness.
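The segmentation comparison above is reported via the Dice similarity coefficient (DSC), defined for binary masks as 2|A∩B| / (|A| + |B|). A minimal NumPy sketch on toy masks (the masks and function name are illustrative, not from the paper):

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else 2.0 * intersection / denom

# Toy 4x4 masks: 4 predicted pixels, 4 true pixels, 3 overlapping.
pred  = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
truth = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]])
print(dice(pred, truth))  # → 0.75
```

A DSC of 0.94 versus 0.88 thus means Swin-UNETR's predicted lesion masks overlapped the reference masks substantially more than nnU-Net's did.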
Journal overview
Journal of Radiation Research and Applied Sciences provides a high-quality medium for the publication of substantial, original scientific and technological papers on the development and applications of nuclear and radiation science and isotopes in biology, medicine, drugs, biochemistry, microbiology, agriculture, entomology, food technology, chemistry, physics, solid-state science, engineering, and environmental and applied sciences.
Journal of Radiation Research and Applied Sciences provides a high quality medium for the publication of substantial, original and scientific and technological papers on the development and applications of nuclear, radiation and isotopes in biology, medicine, drugs, biochemistry, microbiology, agriculture, entomology, food technology, chemistry, physics, solid states, engineering, environmental and applied sciences.