Feasibility of improving vocal fold pathology image classification with synthetic images generated by DDPM-based GenAI: a pilot study.

IF 2.2, Zone 3 (Medicine), JCR Q2 OTORHINOLARYNGOLOGY

European Archives of Oto-Rhino-Laryngology Pub Date : 2025-08-01 Epub Date: 2025-05-17 DOI:10.1007/s00405-025-09443-4

Iman Khazrak, Shahryar Zainaee, Mostafa M Rezaee, Mehran Ghasemi, Robert C Green

Pages: 4139-4153. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12399448/pdf/

Citations: 0

Abstract

Background: Voice disorders (VD) are often linked to vocal fold structural pathologies (VFSPs). Laryngeal imaging plays a vital role in assessing VFSPs and VD in clinical and research settings, but scarce and imbalanced datasets can limit the generalizability of findings. Denoising Diffusion Probabilistic Models (DDPMs), a subtype of Generative AI, have gained attention for their ability to generate high-quality, realistic synthetic images that can address these challenges.

Purpose: This study explores the feasibility of improving VFSP image classification by generating synthetic images using DDPMs.

Methods: A total of 404 laryngoscopic images depicting vocal folds (VF) with and without VFSPs were included. DDPMs were used to generate synthetic images to augment the original dataset. Two convolutional neural network architectures, VGG16 and ResNet50, were used for model training. The models were first trained only on the original dataset and then on the augmented datasets. Evaluation metrics were analyzed to assess model performance for both binary classification (with/without VFSPs) and multi-class classification (seven specific VFSPs).
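To make the DDPM step in the Methods more concrete, the sketch below shows the standard forward (noising) process that a DDPM is trained to invert: a clean image is blended with Gaussian noise according to a variance schedule. This is a minimal, generic illustration in pure Python, not the authors' implementation. The linear schedule, its endpoints, the 1000-step horizon, the function names, and the 16-pixel stand-in image are all assumptions chosen for illustration; the abstract does not report any of these settings.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (illustrative defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).

    x0 is a flat list of pixel values scaled to [-1, 1]; t is a 0-based step index.
    Returns the noised pixels and the noise that was added, which is the
    regression target for a noise-prediction network.
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * p + sigma * e for p, e in zip(x0, eps)]
    return xt, eps


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    # Hypothetical stand-in for a tiny 4x4 grayscale image.
    image = [rng.uniform(-1.0, 1.0) for _ in range(16)]
    for t in (0, 499, 999):
        xt, _ = forward_noise(image, t, alpha_bars, rng)
        # The remaining signal fraction shrinks as t grows.
        print(t, alpha_bars[t], xt[0])
```

Generation runs this process in reverse: starting from pure noise, a trained network repeatedly predicts and removes the noise to produce a new synthetic image that can be added to the training set.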

Results: Realistic, high-quality synthetic images were generated for dataset augmentation. The models initially failed to converge when trained only on the original dataset, but they converged and achieved low loss and high accuracy when trained on the augmented datasets. For both binary and multi-class classification, the best performance was obtained when the models were trained on an augmented dataset.
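The abstract names only loss and accuracy among its evaluation metrics. Because the Background points to imbalanced datasets, the sketch below illustrates why accuracy is often read alongside per-class recall in such settings. It is a generic illustration, not the study's evaluation code; the class labels and the toy predictions are hypothetical and do not come from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each class that appears in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in support.items()}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


if __name__ == "__main__":
    # Hypothetical imbalanced toy set: 8 "normal" images and 2 "polyp" images.
    y_true = ["normal"] * 8 + ["polyp"] * 2
    # A degenerate classifier that always predicts the majority class.
    y_pred = ["normal"] * 10
    print(accuracy(y_true, y_pred))
    print(per_class_recall(y_true, y_pred))
    print(balanced_accuracy(y_true, y_pred))
```

In this toy case the majority-class predictor scores well on plain accuracy while missing every minority-class image, which is the kind of failure that augmenting under-represented classes is meant to reduce.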

Conclusion: Generating realistic images of VFSPs using DDPMs is feasible, can enhance the classification of VFSPs by an AI model, and may support VD screening and diagnosis.

Source journal

European Archives of Oto-Rhino-Laryngology (Medicine: Otorhinolaryngology)

CiteScore: 5.30

Self-citation rate: 7.70%

Articles published: 537

Review time: 2-4 weeks

About the journal: Official Journal of the European Union of Medical Specialists – ORL Section and Board. Official Journal of the Confederation of European Oto-Rhino-Laryngology Head and Neck Surgery. "European Archives of Oto-Rhino-Laryngology" publishes original clinical reports and clinically relevant experimental studies, as well as short communications presenting new results of special interest. With peer review by a respected international editorial board and prompt English-language publication, the journal provides rapid dissemination of information by authors from around the world. This makes it the journal of choice for readers who want to stay informed about the state of the art in the basic sciences and in the diagnosis and management of diseases of the head and neck at an international level. European Archives of Oto-Rhino-Laryngology was founded in 1864 as "Archiv für Ohrenheilkunde" by A. von Tröltsch, A. Politzer and H. Schwartze.