{"title":"MedIENet: medical image enhancement network based on conditional latent diffusion model.","authors":"Weizhen Yuan, Yue Feng, Tiancai Wen, Guancong Luo, Jiexin Liang, Qianshuai Sun, Shufen Liang","doi":"10.1186/s12880-025-01909-5","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Deep learning necessitates a substantial amount of data, yet obtaining sufficient medical images is difficult due to concerns about patient privacy and high collection costs.</p><p><strong>Methods: </strong>To address this issue, we propose a conditional latent diffusion model-based medical image enhancement network, referred to as the Medical Image Enhancement Network (MedIENet). To meet the rigorous standards required for image generation in the medical imaging field, a multi-attention module is incorporated in the encoder of the denoising U-Net backbone. Additionally Rotary Position Embedding (RoPE) is integrated into the self-attention module to effectively capture positional information, while cross-attention is utilised to embed integrate class information into the diffusion process.</p><p><strong>Results: </strong>MedIENet is evaluated on three datasets: Chest CT-Scan images, Chest X-Ray Images (Pneumonia), and Tongue dataset. Compared to existing methods, MedIENet demonstrates superior performance in both fidelity and diversity of the generated images. Experimental results indicate that for downstream classification tasks using ResNet50, the Area Under the Receiver Operating Characteristic curve (AUROC) achieved with real data alone is 0.76 for the Chest CT-Scan images dataset, 0.87 for the Chest X-Ray Images (Pneumonia) dataset, and 0.78 for the Tongue Dataset. When using mixed data consisting of real data and generated data, the AUROC improves to 0.82, 0.94, and 0.82, respectively, reflecting increases of approximately 6%, 7%, and 4%.</p><p><strong>Conclusion: </strong>These findings indicate that the images generated by MedIENet can enhance the performance of downstream classification tasks, providing an effective solution to the scarcity of medical image training data.</p>","PeriodicalId":9020,"journal":{"name":"BMC Medical Imaging","volume":"25 1","pages":"372"},"PeriodicalIF":3.2000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12465763/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Medical Imaging","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12880-025-01909-5","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Deep learning necessitates a substantial amount of data, yet obtaining sufficient medical images is difficult due to concerns about patient privacy and high collection costs.
Methods: To address this issue, we propose a conditional latent diffusion model-based medical image enhancement network, referred to as the Medical Image Enhancement Network (MedIENet). To meet the rigorous standards required for image generation in the medical imaging field, a multi-attention module is incorporated in the encoder of the denoising U-Net backbone. Additionally Rotary Position Embedding (RoPE) is integrated into the self-attention module to effectively capture positional information, while cross-attention is utilised to embed integrate class information into the diffusion process.
Results: MedIENet is evaluated on three datasets: Chest CT-Scan images, Chest X-Ray Images (Pneumonia), and Tongue dataset. Compared to existing methods, MedIENet demonstrates superior performance in both fidelity and diversity of the generated images. Experimental results indicate that for downstream classification tasks using ResNet50, the Area Under the Receiver Operating Characteristic curve (AUROC) achieved with real data alone is 0.76 for the Chest CT-Scan images dataset, 0.87 for the Chest X-Ray Images (Pneumonia) dataset, and 0.78 for the Tongue Dataset. When using mixed data consisting of real data and generated data, the AUROC improves to 0.82, 0.94, and 0.82, respectively, reflecting increases of approximately 6%, 7%, and 4%.
Conclusion: These findings indicate that the images generated by MedIENet can enhance the performance of downstream classification tasks, providing an effective solution to the scarcity of medical image training data.
期刊介绍:
BMC Medical Imaging is an open access journal publishing original peer-reviewed research articles in the development, evaluation, and use of imaging techniques and image processing tools to diagnose and manage disease.