Feasibility of improving vocal fold pathology image classification with synthetic images generated by DDPM-based GenAI: a pilot study.

IF 1.9 | Zone 3, Medicine | JCR Q2 | Otorhinolaryngology
Iman Khazrak, Shahryar Zainaee, Mostafa M Rezaee, Mehran Ghasemi, Robert C Green
Journal: European Archives of Oto-Rhino-Laryngology | Published: 2025-05-17 | Type: Journal Article | DOI: 10.1007/s00405-025-09443-4 (https://doi.org/10.1007/s00405-025-09443-4)
Citations: 0

Abstract

Feasibility of improving vocal fold pathology image classification with synthetic images generated by DDPM-based GenAI: a pilot study.

Background: Voice disorders (VD) are often linked to vocal fold structural pathologies (VFSP). Laryngeal imaging plays a vital role in assessing VFSPs and VD in clinical and research settings, but scarce and imbalanced datasets can limit the generalizability of findings. Denoising Diffusion Probabilistic Models (DDPMs), a subtype of Generative AI, have gained attention for their ability to generate high-quality, realistic synthetic images that can address these challenges.
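For readers unfamiliar with DDPMs, the sketch below illustrates the standard forward (noising) process that such models learn to reverse: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with ε drawn from a standard normal distribution. The abstract does not describe the noise schedule or implementation the authors used; the linear schedule, the step count, and the function names `alpha_bars` and `noise_sample` are assumptions chosen for illustration only.

```python
import math
import random


def alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The schedule values are a common default and are an assumption here,
    not something stated in the abstract.
    """
    bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        bars.append(prod)
    return bars


def noise_sample(x0, t, bars, rng):
    """Draw x_t from q(x_t | x_0) in closed form for a flat list of pixel values."""
    a = bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


bars = alpha_bars()
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]  # hypothetical stand-in for normalized pixel values
x_early = noise_sample(x0, 0, bars, rng)    # almost identical to x0
x_late = noise_sample(x0, 999, bars, rng)   # almost pure Gaussian noise
```

At the first step nearly all of the signal is retained, while at the final step the cumulative factor is close to zero, so the sample is essentially noise. A trained DDPM generates images by running this corruption in reverse, starting from noise.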

Purpose: This study explores the feasibility of improving VFSP image classification by generating synthetic images using DDPMs.

Methods: A total of 404 laryngoscopic images depicting vocal folds (VF) with and without VFSP were included. DDPMs were used to generate synthetic images to augment the original dataset. Two convolutional neural network architectures, VGG16 and ResNet50, were used for model training. The models were first trained only on the original dataset and then on the augmented datasets. Evaluation metrics were analyzed to assess model performance for both binary classification (with/without VFSPs) and multi-class classification (seven specific VFSPs).
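The abstract does not say how many synthetic images were generated per class. One common way to plan augmentation for an imbalanced dataset, assumed here purely for illustration, is to top up every class to the size of the largest one. The helper name `synthetic_quota`, the class names, and the counts below are hypothetical placeholders and are not taken from the study.

```python
from collections import Counter


def synthetic_quota(labels, target=None):
    """Return how many synthetic images each class needs to reach `target`.

    If `target` is None, the size of the largest class is used, so the
    augmented dataset ends up balanced.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(target - n, 0) for cls, n in counts.items()}


# Made-up label list; NOT the study's class distribution.
labels = ["normal"] * 5 + ["polyp"] * 3 + ["nodule"] * 1
quota = synthetic_quota(labels)
print(quota)  # {'normal': 0, 'polyp': 2, 'nodule': 4}
```

Under this assumption, a generative model would be asked for `quota[cls]` images of each class, and the classifier would then be trained on the union of original and synthetic images.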

Results: Realistic, high-quality synthetic images were generated for dataset augmentation. The models initially failed to converge when trained only on the original dataset, but they converged and achieved low loss and high accuracy when trained on the augmented datasets. The best performance for both binary and multi-class classification was obtained when the models were trained on an augmented dataset.

Conclusion: Generating realistic images of VFSP using DDPMs is feasible; it can enhance the classification of VFSPs by an AI model and may support VD screening and diagnosis.

Source journal
CiteScore: 5.30
Self-citation rate: 7.70%
Articles published: 537
Review time: 2-4 weeks
About the journal: Official Journal of the European Union of Medical Specialists – ORL Section and Board. Official Journal of the Confederation of European Oto-Rhino-Laryngology Head and Neck Surgery. "European Archives of Oto-Rhino-Laryngology" publishes original clinical reports and clinically relevant experimental studies, as well as short communications presenting new results of special interest. With peer review by a respected international editorial board and prompt English-language publication, the journal provides rapid dissemination of information by authors from around the world. This makes it the journal of choice for readers who want to stay informed about the state of the art in the basic sciences and in the diagnosis and management of diseases of the head and neck at an international level. European Archives of Oto-Rhino-Laryngology was founded in 1864 as "Archiv für Ohrenheilkunde" by A. von Tröltsch, A. Politzer and H. Schwartze.