{"title":"Synthetic Breast Ultrasound Images: A Study to Overcome Medical Data Sharing Barriers.","authors":"JiaLe Xu, Qing Hua, XiaoHong Jia, YuHang Zheng, Qiao Hu, BaoYan Bai, Juan Miao, LiSha Zhu, MeiXiang Zhang, RuoLin Tao, YuHeng Li, Ting Luo, Jun Xie, XueBin Zheng, PengChen Gu, FengYuan Xing, Chuan He, YanYan Song, YiJie Dong, ShuJun Xia, JianQiao Zhou","doi":"10.34133/research.0532","DOIUrl":null,"url":null,"abstract":"<p><p>The vast potential of medical big data to enhance healthcare outcomes remains underutilized due to privacy concerns, which restrict cross-center data sharing and the construction of diverse, large-scale datasets. To address this challenge, we developed a deep generative model aimed at synthesizing medical data to overcome data sharing barriers, with a focus on breast ultrasound (US) image synthesis. Specifically, we introduce CoLDiT, a conditional latent diffusion model with a transformer backbone, to generate US images of breast lesions across various Breast Imaging Reporting and Data System (BI-RADS) categories. Using a training dataset of 9,705 US images from 5,243 patients across 202 hospitals with diverse US systems, CoLDiT generated breast US images without duplicating private information, as confirmed through nearest-neighbor analysis. Blinded reader studies further validated the realism of these images, with area under the receiver operating characteristic curve (AUC) scores ranging from 0.53 to 0.77. Additionally, synthetic breast US images effectively augmented the training dataset for BI-RADS classification, achieving performance comparable to that using an equal-sized training set comprising solely real images (<i>P</i> = 0.81 for AUC). Our findings suggest that synthetic data, such as CoLDiT-generated images, offer a viable, privacy-preserving solution to facilitate secure medical data sharing and advance the utilization of medical big data.</p>","PeriodicalId":21120,"journal":{"name":"Research","volume":"7 ","pages":"0532"},"PeriodicalIF":11.0000,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11612121/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Research","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.34133/research.0532","RegionNum":1,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"Multidisciplinary","Score":null,"Total":0}
引用次数: 0
Abstract
The vast potential of medical big data to enhance healthcare outcomes remains underutilized due to privacy concerns, which restrict cross-center data sharing and the construction of diverse, large-scale datasets. To address this challenge, we developed a deep generative model aimed at synthesizing medical data to overcome data sharing barriers, with a focus on breast ultrasound (US) image synthesis. Specifically, we introduce CoLDiT, a conditional latent diffusion model with a transformer backbone, to generate US images of breast lesions across various Breast Imaging Reporting and Data System (BI-RADS) categories. Using a training dataset of 9,705 US images from 5,243 patients across 202 hospitals with diverse US systems, CoLDiT generated breast US images without duplicating private information, as confirmed through nearest-neighbor analysis. Blinded reader studies further validated the realism of these images, with area under the receiver operating characteristic curve (AUC) scores ranging from 0.53 to 0.77. Additionally, synthetic breast US images effectively augmented the training dataset for BI-RADS classification, achieving performance comparable to that using an equal-sized training set comprising solely real images (P = 0.81 for AUC). Our findings suggest that synthetic data, such as CoLDiT-generated images, offer a viable, privacy-preserving solution to facilitate secure medical data sharing and advance the utilization of medical big data.
期刊介绍:
Research serves as a global platform for academic exchange, collaboration, and technological advancements. This journal welcomes high-quality research contributions from any domain, with open arms to authors from around the globe.
Comprising fundamental research in the life and physical sciences, Research also highlights significant findings and issues in engineering and applied science. The journal proudly features original research articles, reviews, perspectives, and editorials, fostering a diverse and dynamic scholarly environment.