Zhanpeng Fan, Xiaoming Liu, Ying Zhang, Jia Zhang
Title: Multi-scale CNN-Swin transformer network with boundary supervision for multiclass biomarker segmentation in retinal OCT images
Journal: Biocybernetics and Biomedical Engineering, volume 45, issue 3, pages 340-356
Publication date: 2025-05-20
Publication type: Journal Article
DOI: 10.1016/j.bbe.2025.05.002
URL: https://www.sciencedirect.com/science/article/pii/S0208521625000300
Impact factor: 6.6 (JCR Q1, ENGINEERING, BIOMEDICAL)
Citations: 0
Abstract
Retinal biomarker morphology is closely associated with a variety of chronic ophthalmic diseases, and biomarker localization and segmentation in optical coherence tomography (OCT) images play a key role in diagnosing retina-related diseases. Although deep-learning-based OCT biomarker segmentation has made great progress, several challenges remain. First, because of issues such as image noise or class imbalance, some retinal biomarkers interfere with the model's recognition of other biomarkers. Second, small biomarkers are prone to losing accuracy during downsampling. Third, most existing methods rely on convolutional neural networks (CNNs), whose locality makes it difficult to capture global context. Leveraging the strong modeling capability of the Swin Transformer, we propose MSCS-Net (Multi-scale CNN-Swin Network), a network for OCT biomarker segmentation that combines a CNN and a Swin Transformer in parallel as a dual-encoder structure. An additional edge detection path is included to enhance biomarker localization at the edges. In the Swin Transformer branch, considering the irregular distribution of most OCT biomarkers, we apply a new window partitioning scheme to capture features more efficiently. We also design a Feature Dimensionality Reduction Module to collect information about small-scale biomarkers extensively. To integrate information from the two scales, we design a Transformer Cross Fusion Module that finely fuses the global and local feature information from the two encoder branches. We validate the proposed approach on local and public datasets, and the experimental results demonstrate the effectiveness of the proposed framework.
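For readers unfamiliar with the windowing that the Swin Transformer branch builds on, the following is a minimal sketch of the standard (unmodified) Swin window partition and cyclic shift on a small 2-D grid. It is not the new partitioning scheme proposed in the paper, which the abstract does not specify. The function names `window_partition` and `cyclic_shift` and the toy 4 x 4 grid are illustrative assumptions.

```python
# Illustrative sketch only: standard Swin-style windowing on a plain 2-D grid.
# This is NOT MSCS-Net's modified partition; names and sizes are hypothetical.

def window_partition(feature_map, window_size):
    """Split an H x W grid (list of rows) into non-overlapping square windows.

    Windows are returned in row-major order; each window is a list of
    `window_size` rows holding `window_size` values each.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % window_size or width % window_size:
        raise ValueError("height and width must be divisible by window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([row[left:left + window_size]
                            for row in feature_map[top:top + window_size]])
    return windows


def cyclic_shift(feature_map, shift):
    """Roll the grid up and left by `shift` cells, wrapping around the border."""
    rolled_rows = feature_map[shift:] + feature_map[:shift]
    return [row[shift:] + row[:shift] for row in rolled_rows]


# Toy 4 x 4 grid whose cell values are their flattened indices 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular windows: attention would be computed inside each 2 x 2 block.
regular = window_partition(grid, 2)

# Shifted windows: the same partition applied after a cyclic shift, so that
# cells from neighbouring regular windows end up sharing a window.
shifted = window_partition(cyclic_shift(grid, 1), 2)
```

In the regular partition each window covers one fixed block of the grid, while the shifted partition mixes cells from adjacent blocks; alternating the two is how the standard Swin Transformer lets information cross window borders.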
About the journal:
Biocybernetics and Biomedical Engineering is a quarterly journal, founded in 1981, devoted to publishing the results of original, innovative and creative research in the field of biocybernetics and biomedical engineering, which applies mathematical, physical, chemical and engineering methods and technology to analyse physiological processes in living organisms and to develop methods, devices and systems used in biology and medicine, mainly in medical diagnosis, monitoring systems and therapy. The Journal's mission is to advance scientific discovery into new or improved standards of care, and to promote a wide-ranging exchange between science and its application to humans.