Latest articles in arXiv - EE - Image and Video Processing

Three-Dimensional, Multimodal Synchrotron Data for Machine Learning Applications
arXiv - EE - Image and Video Processing Pub Date: 2024-09-11 DOI: arxiv-2409.07322
Calum Green, Sharif Ahmed, Shashidhara Marathe, Liam Perera, Alberto Leonardi, Killian Gmyrek, Daniele Dini, James Le Houx
{"title":"Three-Dimensional, Multimodal Synchrotron Data for Machine Learning Applications","authors":"Calum Green, Sharif Ahmed, Shashidhara Marathe, Liam Perera, Alberto Leonardi, Killian Gmyrek, Daniele Dini, James Le Houx","doi":"arxiv-2409.07322","DOIUrl":"https://doi.org/arxiv-2409.07322","url":null,"abstract":"Machine learning techniques are being increasingly applied in medical and\u0000physical sciences across a variety of imaging modalities; however, an important\u0000issue when developing these tools is the availability of good quality training\u0000data. Here we present a unique, multimodal synchrotron dataset of a bespoke\u0000zinc-doped Zeolite 13X sample that can be used to develop advanced deep\u0000learning and data fusion pipelines. Multi-resolution micro X-ray computed\u0000tomography was performed on a zinc-doped Zeolite 13X fragment to characterise\u0000its pores and features, before spatially resolved X-ray diffraction computed\u0000tomography was carried out to characterise the homogeneous distribution of\u0000sodium and zinc phases. Zinc absorption was controlled to create a simple,\u0000spatially isolated, two-phase material. Both raw and processed data is\u0000available as a series of Zenodo entries. Altogether we present a spatially\u0000resolved, three-dimensional, multimodal, multi-resolution dataset that can be\u0000used for the development of machine learning techniques. Such techniques\u0000include development of super-resolution, multimodal data fusion, and 3D\u0000reconstruction algorithm development.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AC-IND: Sparse CT reconstruction based on attenuation coefficient estimation and implicit neural distribution
arXiv - EE - Image and Video Processing Pub Date: 2024-09-11 DOI: arxiv-2409.07171
Wangduo Xie, Richard Schoonhoven, Tristan van Leeuwen, Matthew B. Blaschko
{"title":"AC-IND: Sparse CT reconstruction based on attenuation coefficient estimation and implicit neural distribution","authors":"Wangduo Xie, Richard Schoonhoven, Tristan van Leeuwen, Matthew B. Blaschko","doi":"arxiv-2409.07171","DOIUrl":"https://doi.org/arxiv-2409.07171","url":null,"abstract":"Computed tomography (CT) reconstruction plays a crucial role in industrial\u0000nondestructive testing and medical diagnosis. Sparse view CT reconstruction\u0000aims to reconstruct high-quality CT images while only using a small number of\u0000projections, which helps to improve the detection speed of industrial assembly\u0000lines and is also meaningful for reducing radiation in medical scenarios.\u0000Sparse CT reconstruction methods based on implicit neural representations\u0000(INRs) have recently shown promising performance, but still produce artifacts\u0000because of the difficulty of obtaining useful prior information. In this work,\u0000we incorporate a powerful prior: the total number of material categories of\u0000objects. To utilize the prior, we design AC-IND, a self-supervised method based\u0000on Attenuation Coefficient Estimation and Implicit Neural Distribution.\u0000Specifically, our method first transforms the traditional INR from scalar\u0000mapping to probability distribution mapping. Then we design a compact\u0000attenuation coefficient estimator initialized with values from a rough\u0000reconstruction and fast segmentation. Finally, our algorithm finishes the CT\u0000reconstruction by jointly optimizing the estimator and the generated\u0000distribution. Through experiments, we find that our method not only outperforms\u0000the comparative methods in sparse CT reconstruction but also can automatically\u0000generate semantic segmentation maps.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Feature Importance in Pedestrian Intention Prediction: A Context-Aware Review
arXiv - EE - Image and Video Processing Pub Date: 2024-09-11 DOI: arxiv-2409.07645
Mohsen Azarmi, Mahdi Rezaei, He Wang, Ali Arabian
{"title":"Feature Importance in Pedestrian Intention Prediction: A Context-Aware Review","authors":"Mohsen Azarmi, Mahdi Rezaei, He Wang, Ali Arabian","doi":"arxiv-2409.07645","DOIUrl":"https://doi.org/arxiv-2409.07645","url":null,"abstract":"Recent advancements in predicting pedestrian crossing intentions for\u0000Autonomous Vehicles using Computer Vision and Deep Neural Networks are\u0000promising. However, the black-box nature of DNNs poses challenges in\u0000understanding how the model works and how input features contribute to final\u0000predictions. This lack of interpretability delimits the trust in model\u0000performance and hinders informed decisions on feature selection,\u0000representation, and model optimisation; thereby affecting the efficacy of\u0000future research in the field. To address this, we introduce Context-aware\u0000Permutation Feature Importance (CAPFI), a novel approach tailored for\u0000pedestrian intention prediction. CAPFI enables more interpretability and\u0000reliable assessments of feature importance by leveraging subdivided scenario\u0000contexts, mitigating the randomness of feature values through targeted\u0000shuffling. This aims to reduce variance and prevent biased estimations in\u0000importance scores during permutations. We divide the Pedestrian Intention\u0000Estimation (PIE) dataset into 16 comparable context sets, measure the baseline\u0000performance of five distinct neural network architectures for intention\u0000prediction in each context, and assess input feature importance using CAPFI. We\u0000observed nuanced differences among models across various contextual\u0000characteristics. The research reveals the critical role of pedestrian bounding\u0000boxes and ego-vehicle speed in predicting pedestrian intentions, and potential\u0000prediction biases due to the speed feature through cross-context permutation\u0000evaluation. We propose an alternative feature representation by considering\u0000proximity change rate for rendering dynamic pedestrian-vehicle locomotion,\u0000thereby enhancing the contributions of input features to intention prediction.\u0000These findings underscore the importance of contextual features and their\u0000diversity to develop accurate and robust intent-predictive models.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
NVRC: Neural Video Representation Compression
arXiv - EE - Image and Video Processing Pub Date: 2024-09-11 DOI: arxiv-2409.07414
Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull
{"title":"NVRC: Neural Video Representation Compression","authors":"Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull","doi":"arxiv-2409.07414","DOIUrl":"https://doi.org/arxiv-2409.07414","url":null,"abstract":"Recent advances in implicit neural representation (INR)-based video coding\u0000have demonstrated its potential to compete with both conventional and other\u0000learning-based approaches. With INR methods, a neural network is trained to\u0000overfit a video sequence, with its parameters compressed to obtain a compact\u0000representation of the video content. However, although promising results have\u0000been achieved, the best INR-based methods are still out-performed by the latest\u0000standard codecs, such as VVC VTM, partially due to the simple model compression\u0000techniques employed. In this paper, rather than focusing on representation\u0000architectures as in many existing works, we propose a novel INR-based video\u0000compression framework, Neural Video Representation Compression (NVRC),\u0000targeting compression of the representation. Based on the novel entropy coding\u0000and quantization models proposed, NVRC, for the first time, is able to optimize\u0000an INR-based video codec in a fully end-to-end manner. To further minimize the\u0000additional bitrate overhead introduced by the entropy models, we have also\u0000proposed a new model compression framework for coding all the network,\u0000quantization and entropy model parameters hierarchically. Our experiments show\u0000that NVRC outperforms many conventional and learning-based benchmark codecs,\u0000with a 24% average coding gain over VVC VTM (Random Access) on the UVG dataset,\u0000measured in PSNR. As far as we are aware, this is the first time an INR-based\u0000video codec achieving such performance. The implementation of NVRC will be\u0000released at www.github.com.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Fast Medical Shape Reconstruction via Meta-learned Implicit Neural Representations
arXiv - EE - Image and Video Processing Pub Date: 2024-09-11 DOI: arxiv-2409.07100
Gaia Romana De Paolis, Dimitrios Lenis, Johannes Novotny, Maria Wimmer, Astrid Berg, Theresa Neubauer, Philip Matthias Winter, David Major, Ariharasudhan Muthusami, Gerald Schröcker, Martin Mienkina, Katja Bühler
{"title":"Fast Medical Shape Reconstruction via Meta-learned Implicit Neural Representations","authors":"Gaia Romana De Paolis, Dimitrios Lenis, Johannes Novotny, Maria Wimmer, Astrid Berg, Theresa Neubauer, Philip Matthias Winter, David Major, Ariharasudhan Muthusami, Gerald Schröcker, Martin Mienkina, Katja Bühler","doi":"arxiv-2409.07100","DOIUrl":"https://doi.org/arxiv-2409.07100","url":null,"abstract":"Efficient and fast reconstruction of anatomical structures plays a crucial\u0000role in clinical practice. Minimizing retrieval and processing times not only\u0000potentially enhances swift response and decision-making in critical scenarios\u0000but also supports interactive surgical planning and navigation. Recent methods\u0000attempt to solve the medical shape reconstruction problem by utilizing implicit\u0000neural functions. However, their performance suffers in terms of generalization\u0000and computation time, a critical metric for real-time applications. To address\u0000these challenges, we propose to leverage meta-learning to improve the network\u0000parameters initialization, reducing inference time by an order of magnitude\u0000while maintaining high accuracy. We evaluate our approach on three public\u0000datasets covering different anatomical shapes and modalities, namely CT and\u0000MRI. Our experimental results show that our model can handle various input\u0000configurations, such as sparse slices with different orientations and spacings.\u0000Additionally, we demonstrate that our method exhibits strong transferable\u0000capabilities in generalizing to shape domains unobserved at training time.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dual channel CW nnU-Net for 3D PET-CT Lesion Segmentation in 2024 autoPET III Challenge
arXiv - EE - Image and Video Processing Pub Date: 2024-09-11 DOI: arxiv-2409.07144
Ching-Wei Wang, Ting-Sheng Su, Keng-Wei Liu
{"title":"Dual channel CW nnU-Net for 3D PET-CT Lesion Segmentation in 2024 autoPET III Challenge","authors":"Ching-Wei Wang, Ting-Sheng Su, Keng-Wei Liu","doi":"arxiv-2409.07144","DOIUrl":"https://doi.org/arxiv-2409.07144","url":null,"abstract":"PET/CT is extensively used in imaging malignant tumors because it highlights\u0000areas of increased glucose metabolism, indicative of cancerous activity.\u0000Accurate 3D lesion segmentation in PET/CT imaging is essential for effective\u0000oncological diagnostics and treatment planning. In this study, we developed an\u0000advanced 3D residual U-Net model for the Automated Lesion Segmentation in\u0000Whole-Body PET/CT - Multitracer Multicenter Generalization (autoPET III)\u0000Challenge, which will be held jointly with 2024 Medical Image Computing and\u0000Computer Assisted Intervention (MICCAI) conference at Marrakesh, Morocco.\u0000Proposed model incorporates a novel sample attention boosting technique to\u0000enhance segmentation performance by adjusting the contribution of challenging\u0000cases during training, improving generalization across FDG and PSMA tracers.\u0000The proposed model outperformed the challenge baseline model in the preliminary\u0000test set on the Grand Challenge platform, and our team is currently ranking in\u0000the 2nd place among 497 participants worldwide from 53 countries (accessed\u0000date: 2024/9/4), with Dice score of 0.8700, False Negative Volume of 19.3969\u0000and False Positive Volume of 1.0857.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement
arXiv - EE - Image and Video Processing Pub Date: 2024-09-11 DOI: arxiv-2409.07040
Xianmin Chen, Peiliang Huang, Xiaoxu Feng, Dingwen Zhang, Longfei Han, Junwei Han
{"title":"Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement","authors":"Xianmin Chen, Peiliang Huang, Xiaoxu Feng, Dingwen Zhang, Longfei Han, Junwei Han","doi":"arxiv-2409.07040","DOIUrl":"https://doi.org/arxiv-2409.07040","url":null,"abstract":"Low-light image enhancement, particularly in cross-domain tasks such as\u0000mapping from the raw domain to the sRGB domain, remains a significant\u0000challenge. Many deep learning-based methods have been developed to address this\u0000issue and have shown promising results in recent years. However, single-stage\u0000methods, which attempt to unify the complex mapping across both domains,\u0000leading to limited denoising performance. In contrast, two-stage approaches\u0000typically decompose a raw image with color filter arrays (CFA) into a\u0000four-channel RGGB format before feeding it into a neural network. However, this\u0000strategy overlooks the critical role of demosaicing within the Image Signal\u0000Processing (ISP) pipeline, leading to color distortions under varying lighting\u0000conditions, especially in low-light scenarios. To address these issues, we\u0000design a novel Mamba scanning mechanism, called RAWMamba, to effectively handle\u0000raw images with different CFAs. Furthermore, we present a Retinex Decomposition\u0000Module (RDM) grounded in Retinex prior, which decouples illumination from\u0000reflectance to facilitate more effective denoising and automatic non-linear\u0000exposure correction. By bridging demosaicing and denoising, better raw image\u0000enhancement is achieved. Experimental evaluations conducted on public datasets\u0000SID and MCR demonstrate that our proposed RAWMamba achieves state-of-the-art\u0000performance on cross-domain mapping.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DS-ViT: Dual-Stream Vision Transformer for Cross-Task Distillation in Alzheimer's Early Diagnosis
arXiv - EE - Image and Video Processing Pub Date: 2024-09-11 DOI: arxiv-2409.07584
Ke Chen, Yifeng Wang, Yufei Zhou, Haohan Wang
{"title":"DS-ViT: Dual-Stream Vision Transformer for Cross-Task Distillation in Alzheimer's Early Diagnosis","authors":"Ke Chen, Yifeng Wang, Yufei Zhou, Haohan Wang","doi":"arxiv-2409.07584","DOIUrl":"https://doi.org/arxiv-2409.07584","url":null,"abstract":"In the field of Alzheimer's disease diagnosis, segmentation and\u0000classification tasks are inherently interconnected. Sharing knowledge between\u0000models for these tasks can significantly improve training efficiency,\u0000particularly when training data is scarce. However, traditional knowledge\u0000distillation techniques often struggle to bridge the gap between segmentation\u0000and classification due to the distinct nature of tasks and different model\u0000architectures. To address this challenge, we propose a dual-stream pipeline\u0000that facilitates cross-task and cross-architecture knowledge sharing. Our\u0000approach introduces a dual-stream embedding module that unifies feature\u0000representations from segmentation and classification models, enabling\u0000dimensional integration of these features to guide the classification model. We\u0000validated our method on multiple 3D datasets for Alzheimer's disease diagnosis,\u0000demonstrating significant improvements in classification performance,\u0000especially on small datasets. Furthermore, we extended our pipeline with a\u0000residual temporal attention mechanism for early diagnosis, utilizing images\u0000taken before the atrophy of patients' brain mass. This advancement shows\u0000promise in enabling diagnosis approximately six months earlier in mild and\u0000asymptomatic stages, offering critical time for intervention.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BLS-GAN: A Deep Layer Separation Framework for Eliminating Bone Overlap in Conventional Radiographs
arXiv - EE - Image and Video Processing Pub Date: 2024-09-11 DOI: arxiv-2409.07304
Haolin Wang, Yafei Ou, Prasoon Ambalathankandy, Gen Ota, Pengyu Dai, Masayuki Ikebe, Kenji Suzuki, Tamotsu Kamishima
{"title":"BLS-GAN: A Deep Layer Separation Framework for Eliminating Bone Overlap in Conventional Radiographs","authors":"Haolin Wang, Yafei Ou, Prasoon Ambalathankandy, Gen Ota, Pengyu Dai, Masayuki Ikebe, Kenji Suzuki, Tamotsu Kamishima","doi":"arxiv-2409.07304","DOIUrl":"https://doi.org/arxiv-2409.07304","url":null,"abstract":"Conventional radiography is the widely used imaging technology in diagnosing,\u0000monitoring, and prognosticating musculoskeletal (MSK) diseases because of its\u0000easy availability, versatility, and cost-effectiveness. In conventional\u0000radiographs, bone overlaps are prevalent, and can impede the accurate\u0000assessment of bone characteristics by radiologists or algorithms, posing\u0000significant challenges to conventional and computer-aided diagnoses. This work\u0000initiated the study of a challenging scenario - bone layer separation in\u0000conventional radiographs, in which separate overlapped bone regions enable the\u0000independent assessment of the bone characteristics of each bone layer and lay\u0000the groundwork for MSK disease diagnosis and its automation. This work proposed\u0000a Bone Layer Separation GAN (BLS-GAN) framework that can produce high-quality\u0000bone layer images with reasonable bone characteristics and texture. This\u0000framework introduced a reconstructor based on conventional radiography imaging\u0000principles, which achieved efficient reconstruction and mitigates the recurrent\u0000calculations and training instability issues caused by soft tissue in the\u0000overlapped regions. Additionally, pre-training with synthetic images was\u0000implemented to enhance the stability of both the training process and the\u0000results. The generated images passed the visual Turing test, and improved\u0000performance in downstream tasks. This work affirms the feasibility of\u0000extracting bone layer images from conventional radiographs, which holds promise\u0000for leveraging bone layer separation technology to facilitate more\u0000comprehensive analytical research in MSK diagnosis, monitoring, and prognosis.\u0000Code and dataset will be made available.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy
arXiv - EE - Image and Video Processing Pub Date: 2024-09-11 DOI: arxiv-2409.07422
Somayeh Pakdelmoez, Saba Omidikia, and Seyyed Ali Seyyedsalehi (Department of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran); Seyyede Zohreh Seyyedsalehi (Department of Biomedical Engineering, Faculty of Health, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran)
{"title":"Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy","authors":"Somayeh PakdelmoezDepartment of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran, Saba OmidikiaDepartment of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran, Seyyed Ali SeyyedsalehiDepartment of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran, Seyyede Zohreh SeyyedsalehiDepartment of Biomedical Engineering, Faculty of Health, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran","doi":"arxiv-2409.07422","DOIUrl":"https://doi.org/arxiv-2409.07422","url":null,"abstract":"Diabetic retinopathy (DR) is a consequence of diabetes mellitus characterized\u0000by vascular damage within the retinal tissue. Timely detection is paramount to\u0000mitigate the risk of vision loss. However, training robust grading models is\u0000hindered by a shortage of annotated data, particularly for severe cases. This\u0000paper proposes a framework for controllably generating high-fidelity and\u0000diverse DR fundus images, thereby improving classifier performance in DR\u0000grading and detection. We achieve comprehensive control over DR severity and\u0000visual features (optic disc, vessel structure, lesion areas) within generated\u0000images solely through a conditional StyleGAN, eliminating the need for feature\u0000masks or auxiliary networks. Specifically, leveraging the SeFa algorithm to\u0000identify meaningful semantics within the latent space, we manipulate the DR\u0000images generated conditionally on grades, further enhancing the dataset\u0000diversity. Additionally, we propose a novel, effective SeFa-based data\u0000augmentation strategy, helping the classifier focus on discriminative regions\u0000while ignoring redundant features. Using this approach, a ResNet50 model\u0000trained for DR detection achieves 98.09% accuracy, 99.44% specificity, 99.45%\u0000precision, and an F1-score of 98.09%. Moreover, incorporating synthetic images\u0000generated by conditional StyleGAN into ResNet50 training for DR grading yields\u000083.33% accuracy, a quadratic kappa score of 87.64%, 95.67% specificity, and\u000072.24% precision. Extensive experiments conducted on the APTOS 2019 dataset\u0000demonstrate the exceptional realism of the generated images and the superior\u0000performance of our classifier compared to recent studies.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0