{"title":"基于自监督学习和Swin Transformer与卷积神经网络混合深度模型的乳房x光筛查增强乳腺癌检测。","authors":"Han Chen, Anne L Martel","doi":"10.1117/1.JMI.12.S2.S22007","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>The scarcity of high-quality curated labeled medical training data remains one of the major limitations in applying artificial intelligence systems to breast cancer diagnosis. Deep models for mammogram analysis and mass (or micro-calcification) detection require training with a large volume of labeled images, which are often expensive and time-consuming to collect. To reduce this challenge, we proposed a method that leverages self-supervised learning (SSL) and a deep hybrid model, named HybMNet, which combines local self-attention and fine-grained feature extraction to enhance breast cancer detection on screening mammograms.</p><p><strong>Approach: </strong>Our method employs a two-stage learning process: (1) SSL pretraining: We utilize Efficient Self-Supervised Vision Transformers, an SSL technique, to pretrain a Swin Transformer (Swin-T) using a limited set of mammograms. The pretrained Swin-T then serves as the backbone for the downstream task. (2) Downstream training: The proposed HybMNet combines the Swin-T backbone with a convolutional neural network (CNN)-based network and a fusion strategy. The Swin-T employs local self-attention to identify informative patch regions from the high-resolution mammogram, whereas the CNN-based network extracts fine-grained local features from the selected patches. A fusion module then integrates global and local information from both networks to generate robust predictions. The HybMNet is trained end-to-end, with the loss function combining the outputs of the Swin-T and CNN modules to optimize feature extraction and classification performance.</p><p><strong>Results: </strong>The proposed method was evaluated for its ability to detect breast cancer by distinguishing between benign (normal) and malignant mammograms. 
Leveraging SSL pretraining and the HybMNet model, it achieved an area under the ROC curve of 0.864 (95% CI: 0.852, 0.875) on the Chinese Mammogram Database (CMMD) dataset and 0.889 (95% CI: 0.875, 0.903) on the INbreast dataset, highlighting its effectiveness.</p><p><strong>Conclusions: </strong>The quantitative results highlight the effectiveness of our proposed HybMNet and the SSL pretraining approach. In addition, visualizations of the selected region of interest patches show the model's potential for weakly supervised detection of microcalcifications, despite being trained using only image-level labels.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 Suppl 2","pages":"S22007"},"PeriodicalIF":1.9000,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12076021/pdf/","citationCount":"0","resultStr":"{\"title\":\"Enhancing breast cancer detection on screening mammogram using self-supervised learning and a hybrid deep model of Swin Transformer and convolutional neural networks.\",\"authors\":\"Han Chen, Anne L Martel\",\"doi\":\"10.1117/1.JMI.12.S2.S22007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>The scarcity of high-quality curated labeled medical training data remains one of the major limitations in applying artificial intelligence systems to breast cancer diagnosis. Deep models for mammogram analysis and mass (or micro-calcification) detection require training with a large volume of labeled images, which are often expensive and time-consuming to collect. 
To reduce this challenge, we proposed a method that leverages self-supervised learning (SSL) and a deep hybrid model, named HybMNet, which combines local self-attention and fine-grained feature extraction to enhance breast cancer detection on screening mammograms.</p><p><strong>Approach: </strong>Our method employs a two-stage learning process: (1) SSL pretraining: We utilize Efficient Self-Supervised Vision Transformers, an SSL technique, to pretrain a Swin Transformer (Swin-T) using a limited set of mammograms. The pretrained Swin-T then serves as the backbone for the downstream task. (2) Downstream training: The proposed HybMNet combines the Swin-T backbone with a convolutional neural network (CNN)-based network and a fusion strategy. The Swin-T employs local self-attention to identify informative patch regions from the high-resolution mammogram, whereas the CNN-based network extracts fine-grained local features from the selected patches. A fusion module then integrates global and local information from both networks to generate robust predictions. The HybMNet is trained end-to-end, with the loss function combining the outputs of the Swin-T and CNN modules to optimize feature extraction and classification performance.</p><p><strong>Results: </strong>The proposed method was evaluated for its ability to detect breast cancer by distinguishing between benign (normal) and malignant mammograms. Leveraging SSL pretraining and the HybMNet model, it achieved an area under the ROC curve of 0.864 (95% CI: 0.852, 0.875) on the Chinese Mammogram Database (CMMD) dataset and 0.889 (95% CI: 0.875, 0.903) on the INbreast dataset, highlighting its effectiveness.</p><p><strong>Conclusions: </strong>The quantitative results highlight the effectiveness of our proposed HybMNet and the SSL pretraining approach. 
In addition, visualizations of the selected region of interest patches show the model's potential for weakly supervised detection of microcalcifications, despite being trained using only image-level labels.</p>\",\"PeriodicalId\":47707,\"journal\":{\"name\":\"Journal of Medical Imaging\",\"volume\":\"12 Suppl 2\",\"pages\":\"S22007\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12076021/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Medical Imaging\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1117/1.JMI.12.S2.S22007\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/5/14 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q3\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Imaging","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1117/1.JMI.12.S2.S22007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/5/14 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Enhancing breast cancer detection on screening mammogram using self-supervised learning and a hybrid deep model of Swin Transformer and convolutional neural networks.
Purpose: The scarcity of high-quality curated labeled medical training data remains one of the major limitations in applying artificial intelligence systems to breast cancer diagnosis. Deep models for mammogram analysis and mass (or microcalcification) detection require training with a large volume of labeled images, which are often expensive and time-consuming to collect. To address this challenge, we propose a method that leverages self-supervised learning (SSL) and a deep hybrid model, named HybMNet, which combines local self-attention and fine-grained feature extraction to enhance breast cancer detection on screening mammograms.
Approach: Our method employs a two-stage learning process. (1) SSL pretraining: We use Efficient Self-Supervised Vision Transformers (EsViT), an SSL technique, to pretrain a Swin Transformer (Swin-T) on a limited set of mammograms; the pretrained Swin-T then serves as the backbone for the downstream task. (2) Downstream training: The proposed HybMNet combines the Swin-T backbone with a convolutional neural network (CNN)-based network through a fusion strategy. The Swin-T employs local self-attention to identify informative patch regions in the high-resolution mammogram, whereas the CNN-based network extracts fine-grained local features from the selected patches. A fusion module then integrates global and local information from both networks to generate robust predictions. HybMNet is trained end-to-end, with a loss function that combines the outputs of the Swin-T and CNN modules to optimize feature extraction and classification performance.
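The two-branch pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the top-k saliency-based patch selection rule, and the equal-weight combination of the three loss terms are all assumptions, and trivial stubs stand in for the Swin-T and CNN backbones.

```python
# Hedged sketch of a HybMNet-style two-branch pipeline: a global branch
# scores patches of the full image, a local branch looks at the selected
# patches, and a fused prediction is trained with a combined loss.
import math
import random

random.seed(0)

def swin_global_branch(image, grid=4):
    """Stub for the Swin-T branch: returns a global logit and a saliency
    score for each patch of a grid x grid tiling of the mammogram."""
    scores = [[random.random() for _ in range(grid)] for _ in range(grid)]
    global_logit = sum(sum(row) for row in scores) / grid ** 2
    return global_logit, scores

def select_top_k_patches(scores, k=3):
    """Pick the k most informative patch coordinates by saliency score."""
    flat = [(s, (i, j)) for i, row in enumerate(scores)
            for j, s in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    return [coord for _, coord in flat[:k]]

def cnn_local_branch(image, patches):
    """Stub for the CNN branch: a fine-grained logit computed from the
    selected high-resolution patches."""
    return sum(hash(p) % 100 for p in patches) / (100.0 * len(patches))

def fuse(global_logit, local_logit, w=0.5):
    """Late fusion of the global (Swin-T) and local (CNN) predictions."""
    return w * global_logit + (1 - w) * local_logit

def bce(logit, label):
    """Binary cross-entropy on a sigmoid-squashed logit."""
    p = 1.0 / (1.0 + math.exp(-logit))
    p = min(max(p, 1e-7), 1 - 1e-7)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def hybmnet_loss(image, label):
    """Combined loss over the Swin-T, CNN, and fused outputs, mirroring
    the end-to-end objective described above (equal weights assumed)."""
    g, scores = swin_global_branch(image)
    patches = select_top_k_patches(scores)
    l = cnn_local_branch(image, patches)
    f = fuse(g, l)
    return bce(g, label) + bce(l, label) + bce(f, label)
```

Because only the fused and per-branch logits enter the loss, gradients reach both backbones in one backward pass, which is what makes the end-to-end training described above possible.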
Results: The proposed method was evaluated for its ability to detect breast cancer by distinguishing between benign (normal) and malignant mammograms. Leveraging SSL pretraining and the HybMNet model, it achieved an area under the ROC curve of 0.864 (95% CI: 0.852, 0.875) on the Chinese Mammogram Database (CMMD) dataset and 0.889 (95% CI: 0.875, 0.903) on the INbreast dataset, highlighting its effectiveness.
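The area under the ROC curve and the 95% confidence intervals reported above can be computed as sketched below. This is a generic illustration of the metric, not the paper's evaluation code, and the toy inputs bear no relation to the CMMD or INbreast results; the percentile bootstrap is one common (assumed) way to obtain such intervals.

```python
# AUC via the Mann-Whitney U statistic, plus a percentile bootstrap CI.
import random

def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC over resampled cases;
    resamples that lack one of the two classes are skipped."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # need both classes to compute an AUC
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int(alpha / 2 * len(aucs))]
    hi = aucs[int((1 - alpha / 2) * len(aucs)) - 1]
    return lo, hi
```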
Conclusions: The quantitative results highlight the effectiveness of our proposed HybMNet and the SSL pretraining approach. In addition, visualizations of the selected region of interest patches show the model's potential for weakly supervised detection of microcalcifications, despite being trained using only image-level labels.
About the journal:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes:

- Imaging physics
- Tomographic reconstruction algorithms (such as those in CT and MRI)
- Image processing and deep learning
- Computer-aided diagnosis and quantitative image analysis
- Visualization and modeling
- Picture archiving and communication systems (PACS)
- Image perception and observer performance
- Technology assessment
- Ultrasonic imaging
- Image-guided procedures
- Digital pathology
- Biomedical applications of biomedical imaging

JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.