MedIENet: medical image enhancement network based on conditional latent diffusion model.

IF 3.2 | CAS Zone 3 (Medicine) | JCR Q2 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

BMC Medical Imaging | Published: 2025-09-26 | DOI: 10.1186/s12880-025-01909-5

Weizhen Yuan, Yue Feng, Tiancai Wen, Guancong Luo, Jiexin Liang, Qianshuai Sun, Shufen Liang

Volume 25, Issue 1, Article 372. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12465763/pdf/

Abstract

Background: Deep learning requires large amounts of data, yet sufficient medical images are difficult to obtain because of patient-privacy concerns and high collection costs.

Methods: To address this issue, we propose the Medical Image Enhancement Network (MedIENet), a medical image enhancement network based on a conditional latent diffusion model. To meet the rigorous standards that medical imaging places on image generation, a multi-attention module is incorporated into the encoder of the denoising U-Net backbone. Additionally, Rotary Position Embedding (RoPE) is integrated into the self-attention module to capture positional information, and cross-attention is used to integrate class information into the diffusion process.
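The abstract does not specify how RoPE is configured inside MedIENet. The following is a minimal sketch of the standard RoPE mechanism in plain Python. It assumes 1D token positions (for example, flattened latent patches) and the common frequency base of 10000. The names `rope_rotate`, `dot`, and `rope_attention_scores` are hypothetical. A 2D variant for image latents would typically split the channels between row and column positions. Each pair of channels is rotated by an angle proportional to the token's position, so the query-key dot product depends only on the offset between the two positions.

```python
import math
from typing import List, Sequence


def rope_rotate(vec: Sequence[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate channel pairs (vec[2i], vec[2i+1]) by position * theta_i.

    theta_i = base ** (-2i / d) is the usual RoPE frequency schedule
    (assumed here; MedIENet's exact settings are not given in the abstract).
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE requires an even feature dimension")
    rotated: List[float] = []
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        # 2x2 rotation of the i-th channel pair
        rotated.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return rotated


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rope_attention_scores(queries, keys):
    """Unscaled, pre-softmax self-attention logits after applying RoPE.

    The token index is used as a hypothetical flattened 1D position.
    """
    q_rot = [rope_rotate(q, m) for m, q in enumerate(queries)]
    k_rot = [rope_rotate(k, n) for n, k in enumerate(keys)]
    return [[dot(q, k) for k in k_rot] for q in q_rot]


if __name__ == "__main__":
    q = [0.3, -1.2, 0.5, 0.8]
    k = [1.0, 0.4, -0.7, 0.2]
    # Same offset (m - n = 3) at two different absolute positions
    print(dot(rope_rotate(q, 5), rope_rotate(k, 2)))
    print(dot(rope_rotate(q, 12), rope_rotate(k, 9)))
```

The two printed scores agree up to floating-point error. Shifting both positions by the same amount does not change the logit. A rotation also preserves vector length, so RoPE encodes position without rescaling queries or keys.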

Results: MedIENet is evaluated on three datasets: Chest CT-Scan images, Chest X-Ray Images (Pneumonia), and a Tongue dataset. Compared with existing methods, MedIENet generates images with higher fidelity and greater diversity. For downstream classification with ResNet50, the area under the receiver operating characteristic curve (AUROC) obtained with real data alone is 0.76 on the Chest CT-Scan images dataset, 0.87 on the Chest X-Ray Images (Pneumonia) dataset, and 0.78 on the Tongue dataset. When mixed data consisting of real and generated images are used, the AUROC rises to 0.82, 0.94, and 0.82, respectively. These are absolute gains of 0.06, 0.07, and 0.04 (about 6, 7, and 4 percentage points).
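The reported improvements are differences between two AUROC values, so the natural reading is as absolute gains (percentage points) rather than relative ones. The short Python check below recomputes both readings from the numbers given in the abstract. The names `REPORTED_AUROC` and `auroc_gains` are illustrative only.

```python
from typing import Tuple

# AUROC values reported in the abstract (ResNet50):
# (real data only, real + generated data)
REPORTED_AUROC = {
    "Chest CT-Scan images": (0.76, 0.82),
    "Chest X-Ray Images (Pneumonia)": (0.87, 0.94),
    "Tongue": (0.78, 0.82),
}


def auroc_gains(real_only: float, mixed: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    delta = mixed - real_only
    return delta * 100.0, delta / real_only * 100.0


for name, (real_only, mixed) in REPORTED_AUROC.items():
    absolute_pp, relative_pct = auroc_gains(real_only, mixed)
    print(f"{name}: +{absolute_pp:.0f} pp absolute, +{relative_pct:.1f}% relative")
```

The relative gains come out somewhat larger (roughly 7.9%, 8.0%, and 5.1%). Comparisons with other work should therefore state which reading they use.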

Conclusion: These findings indicate that the images generated by MedIENet can enhance the performance of downstream classification tasks, providing an effective solution to the scarcity of medical image training data.

Source journal: BMC Medical Imaging (RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)

CiteScore: 4.60

Self-citation rate: 3.70%

Articles published: 198

Review time: 27 weeks

About the journal: BMC Medical Imaging is an open access journal publishing original peer-reviewed research articles in the development, evaluation, and use of imaging techniques and image processing tools to diagnose and manage disease.