Transformer-based multi-modal learning for breast cancer screening: Merging imaging and genetic data

IF 1.7 · CAS Region 4 (Multidisciplinary) · JCR Q2 (Multidisciplinary Sciences)
Mingshuang Fang, Binxiong Xu
DOI: 10.1016/j.jrras.2025.101586
Journal of Radiation Research and Applied Sciences, Volume 18, Issue 3, Article 101586. Published 2025-05-29.
Citations: 0

Abstract

Objective

This study addresses the clinical need for more accurate breast cancer screening by developing a transformer-based, multi-modal BI-RADS classification framework that integrates mammographic radiomics, deep imaging features, and RNA-Seq-derived genetic biomarkers.

Materials and methods

Lesion auto-segmentation was performed using Swin-UNETR and nnU-Net on a dataset of 4265 patients collected from five medical centers. Radiomics and deep features were extracted using ResNet50 and Vision Transformer (ViT) architectures, and RNA-Seq genetic features were obtained via DNABERT and TabTransformer models. The dataset included BI-RADS distributions as follows: BI-RADS 1 (853), BI-RADS 2 (1066), BI-RADS 3 (853), BI-RADS 4 (853), and BI-RADS 5 (640) patients. Prior to classification, the reliability of extracted features was evaluated via Intraclass Correlation Coefficient (ICC) analysis, and dimensionality reduction was conducted using Principal Component Analysis (PCA), followed by feature selection methods including Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination (RFE), and Analysis of Variance (ANOVA). The refined feature set was subsequently classified using machine learning algorithms such as XGBoost, CatBoost, SVM, and Random Forest. Model performance was assessed using metrics including accuracy, area under the curve (AUC), and recall, with five-fold cross-validation and an external test set utilized to confirm generalization.
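The selection-and-classification stage described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: scikit-learn's GradientBoostingClassifier stands in for XGBoost, and an L1-penalized logistic regression implements the LASSO-style feature selection; all parameter values are assumptions for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the fused radiomics/deep/genetic feature matrix.
X, y = make_classification(n_samples=300, n_features=200,
                           n_informative=20, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # Dimensionality reduction with PCA, as in the paper.
    ("pca", PCA(n_components=50)),
    # LASSO-style selection via an L1-penalized linear model.
    ("lasso", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5))),
    # Gradient boosting as a stand-in for XGBoost.
    ("clf", GradientBoostingClassifier(random_state=0)),
])

# Five-fold cross-validation, mirroring the evaluation protocol.
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Wrapping every step in a single Pipeline ensures the scaler, PCA, and LASSO selector are refit on each training fold, avoiding information leakage into the held-out fold.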

Results

Swin-UNETR demonstrated superior segmentation performance compared to nnU-Net (DSC = 0.94 versus 0.88). Feature-based classification leveraging radiomics and deep learning features attained a peak accuracy of 89.22% when utilizing ViT in combination with Swin-UNETR. The integration of radiomics, deep, and genetic features further enhanced classification outcomes, with the LASSO-XGBoost model achieving 96.17% accuracy, an AUC of 97.22%, and a recall of 95.28%. The end-to-end deep learning approach likewise yielded strong results, with the ViT model (based on Swin-UNETR segmentation) attaining an accuracy of 92.68% and an AUC of 94.81%.
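For reference, the Dice similarity coefficient (DSC) used to compare the segmenters is 2|A∩B| / (|A| + |B|) for binary masks A and B. A minimal NumPy sketch (illustrative only, not the authors' implementation):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient (DSC) between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    # Convention: two empty masks are a perfect match.
    return 2.0 * inter / denom if denom else 1.0

pred = np.array([[1, 1, 0], [0, 1, 0]])  # hypothetical predicted mask
gt   = np.array([[1, 0, 0], [0, 1, 1]])  # hypothetical ground truth
print(round(dice(pred, gt), 3))  # 2*2 / (3+3) = 0.667
```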

Conclusions

Multi-modal integration significantly outperformed unimodal approaches, demonstrating strong generalization and robustness.
Source journal
Self-citation rate: 5.90%
Annual publications: 130
Review time: 16 weeks
Aims and scope: Journal of Radiation Research and Applied Sciences provides a high-quality medium for the publication of substantial, original scientific and technological papers on the development and applications of nuclear, radiation and isotope techniques in biology, medicine, drugs, biochemistry, microbiology, agriculture, entomology, food technology, chemistry, physics, solid state, engineering, environmental and applied sciences.