Ibrahim Kocak, Sadık Etka Bayramoglu, Nihat Sayin, Lukman Thalib
{"title":"使用深度学习模型检测早产儿视网膜病变:评估视觉变压器和ResNet架构","authors":"Ibrahim Kocak, Sadık Etka Bayramoglu, Nihat Sayin, Lukman Thalib","doi":"10.1002/ima.70174","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>To evaluate the performance of Vision Transformer (ViT) and ResNet-50 in detecting Plus Disease (PD) on fundus color images and vascular segmented mask images of Retinopathy of Prematurity (ROP) patients. A dataset consisting of 1205 fundus color images of ROP patients was extracted from the registry of a leading Research Hospital in Istanbul. Using these fundus images, a second dataset of vascular segmented mask images was created with a U-net segmentation model. The performance of ViT and ResNet models in detecting Plus Disease was evaluated on both sets of images. External validation of the model performances was carried out using a public domain dataset. For fundus color images, ViT models performed better than ResNet in terms of accuracy (96.9% vs. 91.5%), precision (97.1% vs. 85.5%), and F1 score (96.9% vs. 92.2%). However, ResNet had a better recall rate (100% vs. 96.9%). For segmented images, all performance measures were better with ResNet than ViT: accuracy (91.5% vs. 82.7%), precision (85.5% vs. 82.9%), recall (100% vs. 92.3%), F1 scores (92.2% vs. 82.6%), and AUC (99.8% vs. 88.6%). The strong performance of the ViT on fundus color images highlights its potential as a promising model for PD detection. However, its higher computational cost suggests that further optimization will be needed in future research. ResNet-50, with its solid overall performance and perfect recall rate—ensuring no false negatives—appears to be an optimal choice for PD detection. Additionally, vascular segmentation did not provide any enhancement to the model performances.</p>\n </div>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 5","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2025-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Detection of Plus Disease in Retinopathy of Prematurity Using Deep Learning Models: Evaluating Vision Transformers and ResNet Architectures\",\"authors\":\"Ibrahim Kocak, Sadık Etka Bayramoglu, Nihat Sayin, Lukman Thalib\",\"doi\":\"10.1002/ima.70174\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>To evaluate the performance of Vision Transformer (ViT) and ResNet-50 in detecting Plus Disease (PD) on fundus color images and vascular segmented mask images of Retinopathy of Prematurity (ROP) patients. A dataset consisting of 1205 fundus color images of ROP patients was extracted from the registry of a leading Research Hospital in Istanbul. Using these fundus images, a second dataset of vascular segmented mask images was created with a U-net segmentation model. The performance of ViT and ResNet models in detecting Plus Disease was evaluated on both sets of images. External validation of the model performances was carried out using a public domain dataset. For fundus color images, ViT models performed better than ResNet in terms of accuracy (96.9% vs. 91.5%), precision (97.1% vs. 85.5%), and F1 score (96.9% vs. 92.2%). However, ResNet had a better recall rate (100% vs. 96.9%). For segmented images, all performance measures were better with ResNet than ViT: accuracy (91.5% vs. 82.7%), precision (85.5% vs. 82.9%), recall (100% vs. 92.3%), F1 scores (92.2% vs. 82.6%), and AUC (99.8% vs. 88.6%). 
The strong performance of the ViT on fundus color images highlights its potential as a promising model for PD detection. However, its higher computational cost suggests that further optimization will be needed in future research. ResNet-50, with its solid overall performance and perfect recall rate—ensuring no false negatives—appears to be an optimal choice for PD detection. Additionally, vascular segmentation did not provide any enhancement to the model performances.</p>\\n </div>\",\"PeriodicalId\":14027,\"journal\":{\"name\":\"International Journal of Imaging Systems and Technology\",\"volume\":\"35 5\",\"pages\":\"\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2025-08-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Imaging Systems and Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/ima.70174\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Imaging Systems and Technology","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/ima.70174","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
Detection of Plus Disease in Retinopathy of Prematurity Using Deep Learning Models: Evaluating Vision Transformers and ResNet Architectures
This study evaluates the performance of the Vision Transformer (ViT) and ResNet-50 in detecting Plus Disease (PD) on fundus color images and vascular segmentation mask images from patients with Retinopathy of Prematurity (ROP). A dataset of 1205 fundus color images of ROP patients was extracted from the registry of a leading research hospital in Istanbul. From these fundus images, a second dataset of vascular segmentation mask images was created with a U-Net segmentation model. The performance of the ViT and ResNet models in detecting Plus Disease was evaluated on both sets of images, and external validation was carried out on a public-domain dataset. On fundus color images, the ViT outperformed ResNet in accuracy (96.9% vs. 91.5%), precision (97.1% vs. 85.5%), and F1 score (96.9% vs. 92.2%), whereas ResNet achieved a higher recall (100% vs. 96.9%). On segmented images, ResNet was better than the ViT on every measure: accuracy (91.5% vs. 82.7%), precision (85.5% vs. 82.9%), recall (100% vs. 92.3%), F1 score (92.2% vs. 82.6%), and AUC (99.8% vs. 88.6%). The strong performance of the ViT on fundus color images highlights its potential for PD detection, although its higher computational cost suggests that further optimization will be needed in future research. ResNet-50, with its solid overall performance and perfect recall (no false negatives), appears to be an optimal choice for PD detection. Vascular segmentation did not improve the performance of either model.
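For readers who want a concrete sense of the evaluation setup described above, the following is a minimal, hypothetical sketch (not the authors' code) of how ImageNet-pretrained ViT and ResNet-50 backbones could be adapted for binary Plus Disease classification and scored with the reported metrics. The specific backbones (torchvision's vit_b_16 and resnet50), the single-logit head, and the 0.5 decision threshold are assumptions, not details taken from the paper.

```python
# Minimal sketch (assumptions, not the authors' implementation) of setting up
# ViT and ResNet-50 for binary Plus Disease classification and computing the
# metrics reported in the abstract: accuracy, precision, recall, F1, and AUC.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def build_classifier(arch: str) -> nn.Module:
    """Return an ImageNet-pretrained backbone with a single-logit PD head."""
    if arch == "vit":
        model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
        model.heads.head = nn.Linear(model.heads.head.in_features, 1)
    elif arch == "resnet":
        model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        model.fc = nn.Linear(model.fc.in_features, 1)
    else:
        raise ValueError(f"unknown architecture: {arch}")
    return model

@torch.no_grad()
def evaluate(model: nn.Module, loader, device: str = "cpu", threshold: float = 0.5):
    """Score a trained model on a DataLoader yielding (image_batch, label_batch)."""
    model.eval().to(device)
    probs, labels = [], []
    for images, targets in loader:
        logits = model(images.to(device)).squeeze(1)
        probs.extend(torch.sigmoid(logits).cpu().tolist())
        labels.extend(targets.tolist())
    preds = [int(p >= threshold) for p in probs]  # 1 = Plus Disease present
    return {
        "accuracy":  accuracy_score(labels, preds),
        "precision": precision_score(labels, preds),
        "recall":    recall_score(labels, preds),
        "f1":        f1_score(labels, preds),
        "auc":       roc_auc_score(labels, probs),  # AUC uses probabilities, not hard labels
    }
```

In a setup like this, both architectures share the same classification head and scoring code, so any performance gap, such as the ViT's higher precision on color fundus images versus ResNet's perfect recall, reflects the backbone rather than differences in the evaluation procedure.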
Journal Introduction:
The International Journal of Imaging Systems and Technology (IMA) is a forum for the exchange of ideas and results relevant to imaging systems, including imaging physics and informatics. The journal covers all imaging modalities in humans and animals.
IMA accepts technically sound and scientifically rigorous research in the interdisciplinary field of imaging, including relevant algorithmic research and hardware and software development, and their applications relevant to medical research. The journal provides a platform to publish original research in structural and functional imaging.
The journal is also open to imaging studies of the human body and of animals that describe novel diagnostic imaging and analysis methods. Technical, theoretical, and clinical research in both normal and clinical populations is encouraged. Submissions describing methods, software, databases, and replication studies, as well as negative results, are also considered.
The scope of the journal includes, but is not limited to, the following in the context of biomedical research:
Imaging and neuro-imaging modalities: structural MRI, functional MRI, PET, SPECT, CT, ultrasound, EEG, MEG, NIRS etc.;
Neuromodulation and brain stimulation techniques such as TMS and tDCS;
Software and hardware for imaging, especially related to human and animal health;
Image segmentation in normal and clinical populations;
Pattern analysis and classification using machine learning techniques;
Computational modeling and analysis;
Brain connectivity and connectomics;
Systems-level characterization of brain function;
Neural networks and neurorobotics;
Computer vision, based on human/animal physiology;
Brain-computer interface (BCI) technology;
Big data, databasing and data mining.