{"title":"基于自监督学习和Swin Transformer与卷积神经网络混合深度模型的乳房x光筛查增强乳腺癌检测。","authors":"Han Chen, Anne L Martel","doi":"10.1117/1.JMI.12.S2.S22007","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>The scarcity of high-quality curated labeled medical training data remains one of the major limitations in applying artificial intelligence systems to breast cancer diagnosis. Deep models for mammogram analysis and mass (or micro-calcification) detection require training with a large volume of labeled images, which are often expensive and time-consuming to collect. To reduce this challenge, we proposed a method that leverages self-supervised learning (SSL) and a deep hybrid model, named HybMNet, which combines local self-attention and fine-grained feature extraction to enhance breast cancer detection on screening mammograms.</p><p><strong>Approach: </strong>Our method employs a two-stage learning process: (1) SSL pretraining: We utilize Efficient Self-Supervised Vision Transformers, an SSL technique, to pretrain a Swin Transformer (Swin-T) using a limited set of mammograms. The pretrained Swin-T then serves as the backbone for the downstream task. (2) Downstream training: The proposed HybMNet combines the Swin-T backbone with a convolutional neural network (CNN)-based network and a fusion strategy. The Swin-T employs local self-attention to identify informative patch regions from the high-resolution mammogram, whereas the CNN-based network extracts fine-grained local features from the selected patches. A fusion module then integrates global and local information from both networks to generate robust predictions. The HybMNet is trained end-to-end, with the loss function combining the outputs of the Swin-T and CNN modules to optimize feature extraction and classification performance.</p><p><strong>Results: </strong>The proposed method was evaluated for its ability to detect breast cancer by distinguishing between benign (normal) and malignant mammograms. 
Leveraging SSL pretraining and the HybMNet model, it achieved an area under the ROC curve of 0.864 (95% CI: 0.852, 0.875) on the Chinese Mammogram Database (CMMD) dataset and 0.889 (95% CI: 0.875, 0.903) on the INbreast dataset, highlighting its effectiveness.</p><p><strong>Conclusions: </strong>The quantitative results highlight the effectiveness of our proposed HybMNet and the SSL pretraining approach. In addition, visualizations of the selected region of interest patches show the model's potential for weakly supervised detection of microcalcifications, despite being trained using only image-level labels.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 Suppl 2","pages":"S22007"},"PeriodicalIF":1.9000,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12076021/pdf/","citationCount":"0","resultStr":"{\"title\":\"Enhancing breast cancer detection on screening mammogram using self-supervised learning and a hybrid deep model of Swin Transformer and convolutional neural networks.\",\"authors\":\"Han Chen, Anne L Martel\",\"doi\":\"10.1117/1.JMI.12.S2.S22007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>The scarcity of high-quality curated labeled medical training data remains one of the major limitations in applying artificial intelligence systems to breast cancer diagnosis. Deep models for mammogram analysis and mass (or micro-calcification) detection require training with a large volume of labeled images, which are often expensive and time-consuming to collect. 
To reduce this challenge, we proposed a method that leverages self-supervised learning (SSL) and a deep hybrid model, named HybMNet, which combines local self-attention and fine-grained feature extraction to enhance breast cancer detection on screening mammograms.</p><p><strong>Approach: </strong>Our method employs a two-stage learning process: (1) SSL pretraining: We utilize Efficient Self-Supervised Vision Transformers, an SSL technique, to pretrain a Swin Transformer (Swin-T) using a limited set of mammograms. The pretrained Swin-T then serves as the backbone for the downstream task. (2) Downstream training: The proposed HybMNet combines the Swin-T backbone with a convolutional neural network (CNN)-based network and a fusion strategy. The Swin-T employs local self-attention to identify informative patch regions from the high-resolution mammogram, whereas the CNN-based network extracts fine-grained local features from the selected patches. A fusion module then integrates global and local information from both networks to generate robust predictions. The HybMNet is trained end-to-end, with the loss function combining the outputs of the Swin-T and CNN modules to optimize feature extraction and classification performance.</p><p><strong>Results: </strong>The proposed method was evaluated for its ability to detect breast cancer by distinguishing between benign (normal) and malignant mammograms. Leveraging SSL pretraining and the HybMNet model, it achieved an area under the ROC curve of 0.864 (95% CI: 0.852, 0.875) on the Chinese Mammogram Database (CMMD) dataset and 0.889 (95% CI: 0.875, 0.903) on the INbreast dataset, highlighting its effectiveness.</p><p><strong>Conclusions: </strong>The quantitative results highlight the effectiveness of our proposed HybMNet and the SSL pretraining approach. 
In addition, visualizations of the selected region of interest patches show the model's potential for weakly supervised detection of microcalcifications, despite being trained using only image-level labels.</p>\",\"PeriodicalId\":47707,\"journal\":{\"name\":\"Journal of Medical Imaging\",\"volume\":\"12 Suppl 2\",\"pages\":\"S22007\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12076021/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Medical Imaging\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1117/1.JMI.12.S2.S22007\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/5/14 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q3\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Imaging","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1117/1.JMI.12.S2.S22007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/5/14 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Enhancing breast cancer detection on screening mammogram using self-supervised learning and a hybrid deep model of Swin Transformer and convolutional neural networks.
Purpose: The scarcity of high-quality curated labeled medical training data remains one of the major limitations in applying artificial intelligence systems to breast cancer diagnosis. Deep models for mammogram analysis and mass (or microcalcification) detection require training with a large volume of labeled images, which are often expensive and time-consuming to collect. To address this challenge, we propose a method that leverages self-supervised learning (SSL) and a deep hybrid model, named HybMNet, which combines local self-attention and fine-grained feature extraction to enhance breast cancer detection on screening mammograms.
Approach: Our method employs a two-stage learning process. (1) SSL pretraining: We use Efficient Self-Supervised Vision Transformers (EsViT), an SSL technique, to pretrain a Swin Transformer (Swin-T) on a limited set of mammograms; the pretrained Swin-T then serves as the backbone for the downstream task. (2) Downstream training: The proposed HybMNet combines the Swin-T backbone with a convolutional neural network (CNN)-based network through a fusion strategy. The Swin-T employs local self-attention to identify informative patch regions in the high-resolution mammogram, whereas the CNN-based network extracts fine-grained local features from the selected patches. A fusion module then integrates global and local information from both networks to generate robust predictions. HybMNet is trained end-to-end, with a loss function that combines the outputs of the Swin-T and CNN modules to optimize feature extraction and classification performance.
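The two-branch pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the top-k saliency-based patch selection rule, and the equal-weight combination of the three loss terms are all assumptions, and trivial stubs stand in for the Swin-T and CNN backbones.

```python
# Hedged sketch of a HybMNet-style two-branch pipeline: a global branch
# scores patches of the full image, a local branch looks at the selected
# patches, and a fused prediction is trained with a combined loss.
import math
import random

random.seed(0)

def swin_global_branch(image, grid=4):
    """Stub for the Swin-T branch: returns a global logit and a saliency
    score for each patch of a grid x grid tiling of the mammogram."""
    scores = [[random.random() for _ in range(grid)] for _ in range(grid)]
    global_logit = sum(sum(row) for row in scores) / grid ** 2
    return global_logit, scores

def select_top_k_patches(scores, k=3):
    """Pick the k most informative patch coordinates by saliency score."""
    flat = [(s, (i, j)) for i, row in enumerate(scores)
            for j, s in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    return [coord for _, coord in flat[:k]]

def cnn_local_branch(image, patches):
    """Stub for the CNN branch: a fine-grained logit computed from the
    selected high-resolution patches."""
    return sum(hash(p) % 100 for p in patches) / (100.0 * len(patches))

def fuse(global_logit, local_logit, w=0.5):
    """Late fusion of the global (Swin-T) and local (CNN) predictions."""
    return w * global_logit + (1 - w) * local_logit

def bce(logit, label):
    """Binary cross-entropy on a sigmoid-squashed logit."""
    p = 1.0 / (1.0 + math.exp(-logit))
    p = min(max(p, 1e-7), 1 - 1e-7)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def hybmnet_loss(image, label):
    """Combined loss over the Swin-T, CNN, and fused outputs, mirroring
    the end-to-end objective described above (equal weights assumed)."""
    g, scores = swin_global_branch(image)
    patches = select_top_k_patches(scores)
    l = cnn_local_branch(image, patches)
    f = fuse(g, l)
    return bce(g, label) + bce(l, label) + bce(f, label)
```

Because only the fused and per-branch logits enter the loss, gradients reach both backbones in one backward pass, which is what makes the end-to-end training described above possible.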
Results: The proposed method was evaluated for its ability to detect breast cancer by distinguishing between benign (normal) and malignant mammograms. Leveraging SSL pretraining and the HybMNet model, it achieved an area under the ROC curve of 0.864 (95% CI: 0.852, 0.875) on the Chinese Mammogram Database (CMMD) dataset and 0.889 (95% CI: 0.875, 0.903) on the INbreast dataset, highlighting its effectiveness.
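The area under the ROC curve and the 95% confidence intervals reported above can be computed as sketched below. This is a generic illustration of the metric, not the paper's evaluation code, and the toy inputs bear no relation to the CMMD or INbreast results; the percentile bootstrap is one common (assumed) way to obtain such intervals.

```python
# AUC via the Mann-Whitney U statistic, plus a percentile bootstrap CI.
import random

def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC over resampled cases;
    resamples that lack one of the two classes are skipped."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # need both classes to compute an AUC
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int(alpha / 2 * len(aucs))]
    hi = aucs[int((1 - alpha / 2) * len(aucs)) - 1]
    return lo, hi
```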
Conclusions: The quantitative results highlight the effectiveness of our proposed HybMNet and the SSL pretraining approach. In addition, visualizations of the selected region of interest patches show the model's potential for weakly supervised detection of microcalcifications, despite being trained using only image-level labels.
About the journal:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes:

- Imaging physics
- Tomographic reconstruction algorithms (such as those in CT and MRI)
- Image processing and deep learning
- Computer-aided diagnosis and quantitative image analysis
- Visualization and modeling
- Picture archiving and communication systems (PACS)
- Image perception and observer performance
- Technology assessment
- Ultrasonic imaging
- Image-guided procedures
- Digital pathology
- Biomedical applications of biomedical imaging

JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.