Transformer-based multi-modal learning for breast cancer screening: Merging imaging and genetic data
Mingshuang Fang, Binxiong Xu
Journal of Radiation Research and Applied Sciences, Volume 18, Issue 3, Article 101586
DOI: 10.1016/j.jrras.2025.101586
Published: 2025-05-29
Citations: 0
Abstract
Objective
This study addresses the clinical need for more accurate breast cancer screening by developing a transformer-based, multi-modal BI-RADS classification framework that integrates mammographic radiomics, deep imaging features, and RNA-Seq-derived genetic biomarkers.
Materials and methods
Lesion auto-segmentation was performed using Swin-UNETR and nnU-Net on a dataset of 4265 patients collected from five medical centers. Radiomics and deep features were extracted using ResNet50 and Vision Transformer (ViT) architectures, and RNA-Seq genetic features were obtained via DNABERT and TabTransformer models. The BI-RADS distribution of the dataset was as follows: BI-RADS 1 (853), BI-RADS 2 (1066), BI-RADS 3 (853), BI-RADS 4 (853), and BI-RADS 5 (640) patients. Prior to classification, the reliability of extracted features was evaluated via Intraclass Correlation Coefficient (ICC) analysis, and dimensionality reduction was conducted using Principal Component Analysis (PCA), followed by feature selection with the Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination (RFE), and Analysis of Variance (ANOVA). The refined feature set was then classified using machine learning algorithms including XGBoost, CatBoost, SVM, and Random Forest. Model performance was assessed using accuracy, area under the curve (AUC), and recall, with five-fold cross-validation and an external test set used to confirm generalization.
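The selection-and-classification stage described above can be sketched as a single pipeline. The following is a minimal, illustrative scikit-learn version on synthetic data standing in for the pooled radiomics/deep/genetic feature matrix; the ICC pre-filtering is assumed to have happened upstream, an L1-penalized logistic regression stands in for LASSO-based selection, and a Random Forest replaces XGBoost/CatBoost so the sketch stays self-contained. All names, dimensions, and hyperparameters here are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the fused feature matrix: 600 patients,
# 200 pooled features, 5 classes mimicking BI-RADS 1-5.
X, y = make_classification(n_samples=600, n_features=200, n_informative=25,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),                 # features on a common scale
    ("pca", PCA(n_components=50, random_state=0)),  # dimensionality reduction
    ("lasso", SelectFromModel(                   # L1 (LASSO-style) selection
        LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000))),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# Five-fold cross-validated accuracy, mirroring the evaluation protocol.
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Placing scaling, PCA, and selection inside the `Pipeline` ensures each is refit only on the training folds, avoiding the information leakage that fitting them on the full dataset before cross-validation would cause.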
Results
Swin-UNETR demonstrated superior segmentation performance compared to nnU-Net (DSC = 0.94 versus 0.88). Feature-based classification leveraging radiomics and deep learning features attained a peak accuracy of 89.22 % when utilizing ViT in combination with Swin-UNETR. The integration of radiomics, deep, and genetic features further enhanced classification outcomes, with the LASSO-XGBoost model achieving 96.17 % accuracy, an AUC of 97.22 %, and a recall of 95.28 %. Moreover, the end-to-end deep learning approach also yielded strong results, with the ViT model (based on Swin-UNETR segmentation) attaining an accuracy of 92.68 % and an AUC of 94.81 %.
Conclusions
Multi-modal integration significantly outperformed unimodal approaches, demonstrating strong generalization and robustness.
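The segmentation comparison above is reported via the Dice similarity coefficient (DSC), defined for binary masks as 2|A∩B| / (|A| + |B|). A minimal NumPy sketch on toy masks (the masks and function name are illustrative, not from the paper):

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else 2.0 * intersection / denom

# Toy 4x4 masks: 4 predicted pixels, 4 true pixels, 3 overlapping.
pred  = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
truth = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]])
print(dice(pred, truth))  # → 0.75
```

A DSC of 0.94 versus 0.88 thus means Swin-UNETR's predicted lesion masks overlapped the reference masks substantially more than nnU-Net's did.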
Journal overview
Journal of Radiation Research and Applied Sciences provides a high-quality medium for the publication of substantial, original scientific and technological papers on the development and applications of nuclear and radiation science and isotopes in biology, medicine, drugs, biochemistry, microbiology, agriculture, entomology, food technology, chemistry, physics, solid-state science, engineering, and environmental and applied sciences.
Journal of Radiation Research and Applied Sciences provides a high quality medium for the publication of substantial, original and scientific and technological papers on the development and applications of nuclear, radiation and isotopes in biology, medicine, drugs, biochemistry, microbiology, agriculture, entomology, food technology, chemistry, physics, solid states, engineering, environmental and applied sciences.