A Transformer utilizing bidirectional cross-attention for multi-modal classification of Age-Related Macular Degeneration

IF 4.9 | Zone 2 (Medicine) | JCR Q1: Engineering, Biomedical

Biomedical Signal Processing and Control | Pub Date: 2025-05-02 | DOI: 10.1016/j.bspc.2025.107887

Jianfeng Li, Zongda Wang, Yuanqiong Chen, Chengzhang Zhu, Mingqiang Xiong, Harrison Xiao Bai

Volume 109, Article 107887
https://www.sciencedirect.com/science/article/pii/S1746809425003982

Citations: 0

Abstract


Age-related Macular Degeneration (AMD) ranks among the leading causes of blindness globally, especially among people over 50. Color fundus photographs (CFP) and optical coherence tomography (OCT) B-scan images are both widely used in the diagnostic phase of AMD. However, most existing multimodal approaches are based on traditional convolutional neural networks (CNN), which usually have limited local receptive fields when processing cross-modal information. Therefore, we propose a model that integrates CNN and transformer architectures for the diagnosis of AMD. Specifically, we first extract features through a CNN to learn local representations of the images, and then employ bidirectional cross-attention blocks with intramodal and intermodal attention. This allows the model to learn global representations from all input modalities, enabling it to capture long-range dependencies and enhance multimodal feature fusion through its global modeling capabilities. Moreover, we apply data augmentation for effective training. Our data augmentation approach leverages class activation mapping (CAM) as a conditional input to guide a GAN-based network in synthesizing high-resolution CFP and OCT images. Extensive experiments were carried out on a publicly available AMD dataset to assess the effectiveness of our model. Our method achieves an F1-score of 0.897 and an accuracy of 84.3% on the test set. The results indicate that our proposed approach significantly outperforms multiple baselines for multimodal AMD diagnosis.
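The abstract does not give architectural details, so the following is only a minimal NumPy sketch of what one bidirectional cross-attention block with intramodal and intermodal attention could look like. The single attention head, the residual connections, the token counts, the feature width, the random weights, the mean-pool fusion, and all function and variable names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys_values, w_q, w_k, w_v):
    # Single-head scaled dot-product attention.
    # queries: (n_q, dim), keys_values: (n_kv, dim).
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights


def init_params(dim, rng):
    # Hypothetical parameter layout: one (w_q, w_k, w_v) triple per attention path.
    names = ["cfp_self", "oct_self", "cfp_from_oct", "oct_from_cfp"]
    return {
        name: tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
        for name in names
    }


def bidirectional_cross_attention(cfp_tokens, oct_tokens, params):
    # Intramodal step: each modality attends to its own tokens (assumed residual).
    cfp_self, _ = attention(cfp_tokens, cfp_tokens, *params["cfp_self"])
    oct_self, _ = attention(oct_tokens, oct_tokens, *params["oct_self"])
    cfp_h = cfp_tokens + cfp_self
    oct_h = oct_tokens + oct_self

    # Intermodal step, in both directions: CFP queries OCT and OCT queries CFP.
    cfp_cross, cfp_to_oct = attention(cfp_h, oct_h, *params["cfp_from_oct"])
    oct_cross, oct_to_cfp = attention(oct_h, cfp_h, *params["oct_from_cfp"])
    return cfp_h + cfp_cross, oct_h + oct_cross, cfp_to_oct, oct_to_cfp


rng = np.random.default_rng(0)
dim = 16  # assumed feature width of the CNN outputs
cfp_tokens = rng.normal(size=(49, dim))  # stand-in for a flattened 7x7 CFP feature map
oct_tokens = rng.normal(size=(36, dim))  # stand-in for a flattened 6x6 OCT feature map
params = init_params(dim, rng)

cfp_out, oct_out, cfp_to_oct, oct_to_cfp = bidirectional_cross_attention(
    cfp_tokens, oct_tokens, params
)

# Assumed fusion: mean-pool each modality and concatenate for a classifier head.
fused = np.concatenate([cfp_out.mean(axis=0), oct_out.mean(axis=0)])
```

Because each CFP token attends over every OCT token (and vice versa), the attention maps have shapes (49, 36) and (36, 49) here, which is the sense in which such a block mixes information globally across modalities rather than within a local receptive field.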


Source journal

Biomedical Signal Processing and Control (Engineering & Technology: Biomedical Engineering)

CiteScore: 9.80
Self-citation rate: 13.70%
Articles published: 822
Review time: 4 months

About the journal: Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with the practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management. Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.