Yannuo Wen, Kathleen M Curran, Xinzhu Wang, Nuala A Healy, John J Healy
Title: Synthesizing breast cancer ultrasound images from healthy samples using latent diffusion models
DOI: 10.1117/1.JMI.13.2.024002
Journal: Journal of Medical Imaging, vol. 13, no. 2, p. 024002 (IF 1.7, JCR Q3, Radiology, Nuclear Medicine & Medical Imaging)
Published: 2026-03-01 (Epub 2026-03-19)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12999972/pdf/
Citations: 0
Abstract
Purpose: Breast ultrasound is widely used for cancer screening, but data scarcity and annotation challenges hinder deep learning adoption. Synthetic image generation offers a promising solution to enhance training datasets while preserving patient privacy. However, problems such as inadequate quality of synthesized images and the need for large amounts of data to train the synthesis models remain significant.
Approach: We propose a three-stage latent diffusion model (LDM) workflow, enhanced by Vision Transformers and fine-tuned with low-rank adaptation (LoRA), that synthesizes realistic malignant and benign breast ultrasound images directly from healthy samples while simultaneously generating accurate segmentation masks. Dividing the task into stages significantly reduces the complexity that any single synthesis model must handle. Applied to the BUSI dataset (133 healthy, 487 benign, and 210 malignant images), the method generates synthetic cases of each tumor type.
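The abstract names low-rank adaptation (LoRA) as the fine-tuning technique. A minimal sketch of the idea, using NumPy and entirely hypothetical dimensions (the paper does not specify its LoRA configuration): a frozen weight matrix W is augmented with a trainable low-rank residual B @ A, so only r·(d_in + d_out) parameters are updated instead of d_in·d_out.

```python
import numpy as np

# Hypothetical LoRA sketch: W is the frozen pretrained weight; only the
# small factors A and B are trained. Dimensions are illustrative, not
# taken from the paper.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8.0

W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero init

def lora_forward(x):
    """Forward pass with the scaled low-rank residual update added."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialised to zero, the adapted layer reproduces the frozen one,
# so fine-tuning starts exactly from the pretrained behaviour.
assert np.allclose(lora_forward(x), W @ x)

lora_params = A.size + B.size   # 512 here
full_params = W.size            # 4096 here
print(f"trainable params: {lora_params} vs full fine-tune: {full_params}")
```

The zero initialisation of B is the standard LoRA choice: it guarantees the adapted model is identical to the base model before training, which is why LoRA can be applied safely to a pretrained diffusion backbone.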
Results: A ResNet101 classifier could not reliably distinguish synthetic from real images (AUC = 0.563), indicating high visual plausibility. Quantitative metrics confirmed strong fidelity: Fréchet inception distance = 15.2 and inception score = 1.79, indicating low distributional divergence in feature space and high similarity to real data. When used for training a U-Net segmentation model, the augmented dataset improved the F1-score from 0.870 to 0.896, demonstrating substantial gains in diagnostic accuracy.
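For reference, the F1-score reported above is the harmonic mean of precision and recall over the predicted mask pixels. A self-contained sketch on toy binary masks (not the BUSI data or the paper's evaluation code):

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy flattened segmentation masks: 1 = tumor pixel, 0 = background.
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])
y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 1])
print(round(f1_score(y_true, y_pred), 3))  # → 0.8
```

On these toy masks precision and recall are both 4/5, giving F1 = 0.8; in the paper's setting the same statistic is computed over the U-Net's predicted tumor masks.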
Conclusions: These results show that the proposed three-stage LDM can generate high-quality, anatomically coherent breast cancer images from healthy controls, effectively alleviating data scarcity and enabling more robust training of medical AI systems without compromising clinical realism.
Journal introduction:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease, as well as in the understanding of normal anatomy and physiology. The scope of JMI includes: imaging physics; tomographic reconstruction algorithms (such as those in CT and MRI); image processing and deep learning; computer-aided diagnosis and quantitative image analysis; visualization and modeling; picture archiving and communication systems (PACS); image perception and observer performance; technology assessment; ultrasonic imaging; image-guided procedures; digital pathology; and biomedical applications of imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.