Synthetic Breast Ultrasound Images: A Study to Overcome Medical Data Sharing Barriers.

IF 11 | Zone 1 (multidisciplinary journals) | JCR Q1 (Multidisciplinary)
Research Pub Date : 2024-12-03 eCollection Date: 2024-01-01 DOI:10.34133/research.0532
JiaLe Xu, Qing Hua, XiaoHong Jia, YuHang Zheng, Qiao Hu, BaoYan Bai, Juan Miao, LiSha Zhu, MeiXiang Zhang, RuoLin Tao, YuHeng Li, Ting Luo, Jun Xie, XueBin Zheng, PengChen Gu, FengYuan Xing, Chuan He, YanYan Song, YiJie Dong, ShuJun Xia, JianQiao Zhou

Abstract

The vast potential of medical big data to enhance healthcare outcomes remains underutilized due to privacy concerns, which restrict cross-center data sharing and the construction of diverse, large-scale datasets. To address this challenge, we developed a deep generative model aimed at synthesizing medical data to overcome data sharing barriers, with a focus on breast ultrasound (US) image synthesis. Specifically, we introduce CoLDiT, a conditional latent diffusion model with a transformer backbone, to generate US images of breast lesions across various Breast Imaging Reporting and Data System (BI-RADS) categories. Using a training dataset of 9,705 US images from 5,243 patients across 202 hospitals with diverse US systems, CoLDiT generated breast US images without duplicating private information, as confirmed through nearest-neighbor analysis. Blinded reader studies further validated the realism of these images, with area under the receiver operating characteristic curve (AUC) scores ranging from 0.53 to 0.77. Additionally, synthetic breast US images effectively augmented the training dataset for BI-RADS classification, achieving performance comparable to that using an equal-sized training set comprising solely real images (P = 0.81 for AUC). Our findings suggest that synthetic data, such as CoLDiT-generated images, offer a viable, privacy-preserving solution to facilitate secure medical data sharing and advance the utilization of medical big data.
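The nearest-neighbor analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the feature space, the distance metric, or any threshold, so everything below is an assumption made for illustration: images are taken to be already encoded as non-zero feature vectors, closeness is measured by cosine similarity, and the function names `nearest_real_neighbors` and `flag_possible_copies` and the 0.99 cutoff are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def nearest_real_neighbors(synthetic, real):
    """For each synthetic vector, find the most similar real vector.

    Assumes both arrays have shape (n, d) and contain no all-zero rows.
    Returns (indices into `real`, cosine similarities).
    """
    # L2-normalize the rows so that a dot product equals cosine similarity.
    s = synthetic / np.linalg.norm(synthetic, axis=1, keepdims=True)
    r = real / np.linalg.norm(real, axis=1, keepdims=True)
    sim = s @ r.T                      # shape (n_synthetic, n_real)
    idx = sim.argmax(axis=1)           # closest real image per synthetic image
    return idx, sim[np.arange(len(s)), idx]


def flag_possible_copies(synthetic, real, threshold=0.99):
    """List (synthetic index, real index, similarity) for near-duplicates.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    idx, sim = nearest_real_neighbors(synthetic, real)
    return [
        (i, int(j), float(v))
        for i, (j, v) in enumerate(zip(idx, sim))
        if v >= threshold
    ]


# Toy data standing in for image embeddings (random, not real ultrasound data).
rng = np.random.default_rng(0)
real = rng.normal(size=(50, 16))
synthetic = rng.normal(size=(5, 16))
synthetic[2] = real[7]                 # plant one exact copy to exercise the check

flags = flag_possible_copies(synthetic, real)
```

In this toy run only the planted copy should be flagged. In practice, the flagged pairs would be candidates for visual inspection rather than proof of memorization, and the choice of embedding largely determines what "near" means.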

Source journal: Research (Multidisciplinary)
CiteScore: 13.40
Self-citation rate: 3.60%
Articles published: 0
Review time: 14 weeks
About the journal: Research serves as a global platform for academic exchange, collaboration, and technological advancements. The journal welcomes high-quality research contributions from any domain and from authors around the globe. Comprising fundamental research in the life and physical sciences, Research also highlights significant findings and issues in engineering and applied science. It publishes original research articles, reviews, perspectives, and editorials.