Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy

arXiv - EE - Image and Video Processing. Pub Date: 2024-09-11. DOI: arxiv-2409.07422

Somayeh Pakdelmoez, Saba Omidikia, Seyyed Ali Seyyedsalehi (Department of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran); Seyyede Zohreh Seyyedsalehi (Department of Biomedical Engineering, Faculty of Health, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran)



Abstract

Diabetic retinopathy (DR) is a consequence of diabetes mellitus characterized by vascular damage within the retinal tissue. Timely detection is paramount to mitigate the risk of vision loss. However, training robust grading models is hindered by a shortage of annotated data, particularly for severe cases. This paper proposes a framework for controllably generating high-fidelity and diverse DR fundus images, thereby improving classifier performance in DR grading and detection. We achieve comprehensive control over DR severity and visual features (optic disc, vessel structure, lesion areas) within generated images solely through a conditional StyleGAN, eliminating the need for feature masks or auxiliary networks. Specifically, leveraging the SeFa algorithm to identify meaningful semantics within the latent space, we manipulate the DR images generated conditionally on grades, further enhancing the dataset diversity. Additionally, we propose a novel, effective SeFa-based data augmentation strategy, helping the classifier focus on discriminative regions while ignoring redundant features. Using this approach, a ResNet50 model trained for DR detection achieves 98.09% accuracy, 99.44% specificity, 99.45% precision, and an F1-score of 98.09%. Moreover, incorporating synthetic images generated by conditional StyleGAN into ResNet50 training for DR grading yields 83.33% accuracy, a quadratic kappa score of 87.64%, 95.67% specificity, and 72.24% precision. Extensive experiments conducted on the APTOS 2019 dataset demonstrate the exceptional realism of the generated images and the superior performance of our classifier compared to recent studies.
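The SeFa step mentioned in the abstract can be illustrated with a minimal sketch. SeFa (closed-form factorization) takes the weight matrix A of the first learned transformation applied to the latent code and uses the eigenvectors of AᵀA with the largest eigenvalues as candidate semantic directions; an image is then edited by shifting its latent code along one of them. The code below is not the authors' implementation: the random matrix `A` stands in for real generator weights (in a StyleGAN these would come from the style-modulation layers), the function names `sefa_directions` and `edit_latent` are hypothetical, and the conditioning on DR grade is not modeled.

```python
import numpy as np


def sefa_directions(weight, k):
    """Return the top-k SeFa directions for a weight matrix A.

    weight: array of shape (out_dim, latent_dim).
    Returns (directions, eigenvalues), where directions has shape
    (k, latent_dim) and each row is a unit-norm eigenvector of A^T A,
    ordered from largest to smallest eigenvalue.
    """
    gram = weight.T @ weight                  # A^T A, symmetric
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    return eigvecs[:, order].T, eigvals[order]


def edit_latent(latent, direction, alpha):
    """Shift a latent code along a direction by step size alpha."""
    return latent + alpha * direction


# Assumption: a random matrix replaces real generator weights.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 16))

directions, strengths = sefa_directions(A, k=3)

# Sweep one latent code along the strongest direction.
w = rng.standard_normal(16)
edited = [edit_latent(w, directions[0], a) for a in (-3.0, 0.0, 3.0)]
```

In a real pipeline each code in `edited` would be passed through the generator, together with the chosen grade label, to produce variants of the same fundus image; the step sizes used here are arbitrary.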

