Mammography classification with multi-view deep learning techniques: Investigating graph and transformer-based architectures
Medical Image Analysis, published 2024-09-02. DOI: 10.1016/j.media.2024.103320
https://www.sciencedirect.com/science/article/pii/S1361841524002457
Citations: 0
Abstract
The potential and promise of deep learning systems to provide an independent assessment and relieve radiologists' burden in screening mammography have been recognized in several studies. However, low cancer prevalence, the need to process high-resolution images, and the need to combine information from multiple views and scales still pose technical challenges. Multi-view architectures that combine information from the four mammographic views to produce an exam-level classification score are a promising approach to the automated processing of screening mammography. However, training such architectures from exam-level labels, without relying on pixel-level supervision, requires very large datasets and may result in suboptimal accuracy. Emerging architectures such as Vision Transformers (ViT) and graph-based architectures can potentially integrate ipsilateral and contralateral breast views better than traditional convolutional neural networks, thanks to their stronger ability to model long-range dependencies. In this paper, we extensively evaluate novel transformer-based and graph-based architectures against state-of-the-art multi-view convolutional neural networks in terms of both performance and interpretability, with all models trained in a weakly supervised setting on a medium-scale dataset. Extensive experiments on the CSAW dataset suggest that, while transformer-based architectures outperform the others, different inductive biases lead to complementary strengths and weaknesses, as each architecture is sensitive to different signs and mammographic features. Hence, an ensemble of different architectures should be preferred over a winner-takes-all approach to achieve more accurate and robust results.
Overall, the findings highlight the potential of a wide range of multi-view architectures for breast cancer classification, even in datasets of relatively modest size, although the detection of small lesions remains challenging without pixel-wise supervision or ad-hoc networks.
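To make the multi-view and ensemble ideas above concrete, here is a minimal pure-Python sketch, not the paper's architecture. It assumes each of the four standard screening views (left and right craniocaudal, CC, and mediolateral oblique, MLO) has already been encoded into a feature vector by some backbone, which is not shown. A single-head self-attention step without learned projections (a simplifying assumption) lets every view attend to its ipsilateral and contralateral counterparts. The fused vectors are then mean-pooled, and a hypothetical linear head (`head_weights`, `bias`) maps them to an exam-level probability. `ensemble` averages the probabilities produced by several architectures (soft voting), one simple way to realize the ensemble the abstract recommends. All function and parameter names are illustrative.

```python
import math

# The four standard screening views (hypothetical string labels).
VIEWS = ["L-CC", "L-MLO", "R-CC", "R-MLO"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(embeddings):
    """Single-head, projection-free scaled dot-product self-attention.

    Each view embedding attends to all four views (ipsilateral and
    contralateral alike); real ViT layers would add learned Q/K/V projections.
    """
    d = len(embeddings[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in embeddings:
        weights = softmax([dot(q, k) * scale for k in embeddings])
        fused.append([sum(w * v[i] for w, v in zip(weights, embeddings)) for i in range(d)])
    return fused


def exam_score(view_embeddings, head_weights, bias=0.0):
    """Fuse per-view embeddings and return an exam-level malignancy probability."""
    embeddings = [view_embeddings[v] for v in VIEWS]
    fused = self_attention(embeddings)
    d = len(fused[0])
    pooled = [sum(f[i] for f in fused) / len(fused) for i in range(d)]  # mean over views
    logit = dot(pooled, head_weights) + bias  # hypothetical linear classification head
    return 1.0 / (1.0 + math.exp(-logit))


def ensemble(scores):
    """Soft voting: average exam-level probabilities from several architectures."""
    return sum(scores) / len(scores)
```

Because the attention weights are computed across all four views, a suspicious finding visible in both views of one breast, or an asymmetry between the two breasts, can influence every fused vector. A graph-based model would instead restrict or weight these interactions through explicit ipsilateral and contralateral edges.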
About the journal:
Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.