Jianfeng Li, Zongda Wang, Yuanqiong Chen, Chengzhang Zhu, Mingqiang Xiong, Harrison Xiao Bai
Biomedical Signal Processing and Control, vol. 109, Article 107887. Published 2025-05-02. DOI: 10.1016/j.bspc.2025.107887. URL: https://www.sciencedirect.com/science/article/pii/S1746809425003982
A Transformer utilizing bidirectional cross-attention for multi-modal classification of Age-Related Macular Degeneration
Age-related Macular Degeneration (AMD) ranks among the leading causes of blindness globally, especially among people over 50. Color fundus photographs (CFP) and optical coherence tomography (OCT) B-scan images are both widely used in the diagnostic phase of AMD. However, most existing multimodal approaches are based on traditional convolutional neural networks (CNNs), whose limited local receptive fields constrain how they process cross-modal information. Therefore, we propose a model that integrates CNN and transformer architectures for the diagnosis of AMD. Specifically, we first extract features with a CNN to learn local representations of the images, and then employ bidirectional cross-attention blocks with intramodal and intermodal attention. This allows the model to learn global representations from all input modalities, so that it can capture long-range dependencies and improve multimodal feature fusion. Moreover, we apply data augmentation for effective training. Our data augmentation approach uses class activation mapping (CAM) as a conditional input to guide a GAN-based network in synthesizing high-resolution CFP and OCT images. Extensive experiments were carried out on a publicly available AMD dataset to assess the effectiveness of our model. Our method achieves an F1-score of 0.897 and an accuracy of 84.3% on the test set. The results indicate that our proposed approach significantly outperforms multiple baselines for multimodal AMD diagnosis.
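The abstract does not give the layer details, so the following is only a minimal sketch of the general idea behind bidirectional cross-attention, not the authors' implementation. It assumes each modality has already been turned into a sequence of feature tokens (for example, flattened CNN feature maps) and uses those tokens directly as queries, keys and values. The function names, token counts and feature dimension are hypothetical; learned projections, multiple heads, residual connections and normalization are left out.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)


def bidirectional_cross_attention(cfp_tokens, oct_tokens):
    """Hypothetical fusion step in which each modality queries the other.

    Assumption: tokens are used directly as Q, K and V (no learned weights).
    """
    cfp_attends_oct = attention(cfp_tokens, oct_tokens, oct_tokens)
    oct_attends_cfp = attention(oct_tokens, cfp_tokens, cfp_tokens)
    return cfp_attends_oct, oct_attends_cfp


random.seed(0)
dim = 8
# Random stand-ins for CNN features: 4 CFP tokens and 6 OCT tokens.
cfp_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
oct_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]

cfp_out, oct_out = bidirectional_cross_attention(cfp_tokens, oct_tokens)
print(len(cfp_out), len(cfp_out[0]))  # one fused vector per CFP token
print(len(oct_out), len(oct_out[0]))  # one fused vector per OCT token
```

In this sketch, intermodal attention is the pair of calls inside `bidirectional_cross_attention`, while intramodal attention would correspond to calling `attention(x, x, x)` on a single modality. Because every output row is a weighted average over all tokens of the other modality, each token can draw on information from any spatial position, which is the global modeling property the abstract contrasts with local CNN receptive fields.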
About the journal:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.