Jin Qian, Lei Tao, Changhao Gong, Jun Xu, Yuemei Luo
Classifying retinal diseases via pyramid vision graph convolutional network for optical coherence tomography images
Biomedical Optics Express, vol. 16, no. 6, pp. 2312-2326. Published 2025-05-13. DOI: 10.1364/BOE.558731 (https://doi.org/10.1364/BOE.558731)
Abstract
Deep neural networks are now widely used to classify retinal diseases in optical coherence tomography (OCT) images. However, conventional deep neural networks treat an image as a grid or a sequence, which limits their flexibility in capturing irregular and complex objects and leads to suboptimal performance in practical applications. To address this issue, we propose a visual neural network with a pyramid structure, called the pyramid vision graph convolutional network (PVGCN). The model strengthens the correlations between structures by dividing an image into multiple nodes and connecting each node to its nearest nodes. It consists of two core components: 1) the vision graph block and 2) the pyramid structure. The vision graph block, composed of a grapher block and a feed-forward network (FFN), divides the image into multiple regions, treats each region as a node, and applies graph convolution to the resulting graph representation of the image. Because a graph built on these nodes captures relationships between nodes without positional restrictions, it better represents the irregular structure of retinal tissue. The FFN module mitigates the over-smoothing that arises in the grapher stage, enabling more accurate classification. The pyramid structure decomposes OCT images into a series of sub-images at different scales and integrates the features from these scales into a comprehensive representation of the hierarchical structure of the retina. By integrating multi-scale features, this structure replaces the extraction of higher-dimensional features in a large model and significantly reduces the number of parameters. We conducted extensive experiments on two different datasets. The proposed PVGCN achieved accuracies of 0.9954 and 0.9787 on the two datasets, respectively, surpassing existing methods. In these experiments the model also demonstrated recognition capabilities comparable to those of human experts, effectively identifying retinal diseases in OCT images.
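The two components described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the patch size, the number of neighbors, the aggregation rule, or how scales are produced, so the choices here are assumptions: nearest neighbors are found in feature space, aggregation is max-relative (one common choice in vision graph networks), scales come from 2x2 average pooling, and all weights are random and untrained. All function names are hypothetical.

```python
import numpy as np

def image_to_nodes(image, patch):
    """Split an (H, W) image into non-overlapping patches; each flattened patch is one node."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    cropped = image[:gh * patch, :gw * patch].astype(float)
    blocks = cropped.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    return blocks.reshape(gh * gw, patch * patch)

def knn_graph(nodes, k):
    """Return, for every node, the indices of its k nearest nodes in feature space."""
    sq = (nodes ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * nodes @ nodes.T
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def graph_conv(nodes, neighbors, weight):
    """Max-relative aggregation (an assumed choice) followed by a linear map."""
    relative = nodes[neighbors] - nodes[:, None, :]  # (N, k, D)
    aggregated = relative.max(axis=1)                # (N, D)
    return np.concatenate([nodes, aggregated], axis=1) @ weight

def ffn(x, w1, w2):
    """Two-layer feed-forward network with ReLU and a residual connection."""
    return x + np.maximum(x @ w1, 0.0) @ w2

def downsample(image):
    """Halve the resolution with 2x2 average pooling (assumed scale operator)."""
    h2, w2 = image.shape[0] // 2, image.shape[1] // 2
    return image[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2).mean(axis=(1, 3))

def pyramid_features(image, patch=4, k=3, levels=3, dim=8, seed=0):
    """Run a graph block at each scale and concatenate the pooled features."""
    rng = np.random.default_rng(seed)  # random, untrained weights for illustration
    features = []
    for _ in range(levels):
        nodes = image_to_nodes(image, patch)
        d = nodes.shape[1]
        w_g = rng.normal(scale=0.1, size=(2 * d, dim))
        w1 = rng.normal(scale=0.1, size=(dim, 2 * dim))
        w2 = rng.normal(scale=0.1, size=(2 * dim, dim))
        x = ffn(graph_conv(nodes, knn_graph(nodes, k), w_g), w1, w2)
        features.append(x.mean(axis=0))  # pool node features at this scale
        image = downsample(image)
    return np.concatenate(features)

if __name__ == "__main__":
    demo = np.random.default_rng(1).random((32, 32))  # stand-in for an OCT B-scan
    print(pyramid_features(demo).shape)
```

In a trained model, the concatenated multi-scale vector would feed a classification head. The sketch only shows how nodes, neighbor links, and per-scale features fit together.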
About the journal:
The journal's scope encompasses fundamental research, technology development, biomedical studies and clinical applications. BOEx focuses on leading-edge topics in the field, including:
Tissue optics and spectroscopy
Novel microscopies
Optical coherence tomography
Diffuse and fluorescence tomography
Photoacoustic and multimodal imaging
Molecular imaging and therapies
Nanophotonic biosensing
Optical biophysics/photobiology
Microfluidic optical devices
Vision research.