Preserving privacy in healthcare: A systematic review of deep learning approaches for synthetic data generation

IF 4.9 | Zone 2, Medicine | JCR Q1, Computer Science, Interdisciplinary Applications
Yintong Liu, U. Rajendra Acharya, Jen Hong Tan
Computer Methods and Programs in Biomedicine, Volume 260, Article 108571. Journal article, published 2024-12-28.
DOI: 10.1016/j.cmpb.2024.108571
https://www.sciencedirect.com/science/article/pii/S0169260724005649
Citations: 0

Abstract

Background:

Data sharing in healthcare is vital for advancing research and personalized medicine. However, the process is hindered by privacy, ethical, and legal challenges associated with patient data. Synthetic data generation emerges as a promising solution, replicating statistical properties of real data while enhancing privacy protection.

Methods:

This systematic review examines deep learning techniques for synthetic data generation in healthcare, focusing on their ability to maintain data utility and enhance privacy. Studies from Scopus, Web of Science, PubMed, and IEEE databases published between 2019 and 2023 were analyzed. Key methods explored include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models. Evaluation metrics encompass data resemblance, utility, and privacy preservation, with special attention to privacy-enhancing methods like differential privacy and federated learning.
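As a rough illustration of how differential privacy is commonly combined with the training of deep generative models, the sketch below clips each per-example gradient and adds Gaussian noise to their sum, in the style of DP-SGD. Nothing here is taken from the reviewed studies: the function name `privatize_gradients`, the parameters `clip_norm` and `noise_multiplier`, and the plain-list gradient representation are assumptions made for illustration, and the sketch does not track the cumulative privacy budget.

```python
import math
import random


def privatize_gradients(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Return a noisy average of per-example gradients (DP-SGD-style sketch).

    per_example_grads: list of gradients, each a list of floats of equal length.
    clip_norm and noise_multiplier are hypothetical hyperparameters.
    """
    rng = rng or random.Random()
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        # Scale each gradient so that its L2 norm is at most clip_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    # Add Gaussian noise calibrated to the clipping bound, then average.
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]
```

Clipping bounds how much any single patient record can move the update, and the noise scale is tied to that bound; a larger `noise_multiplier` would give stronger privacy at the cost of utility, which is the trade-off the review evaluates.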

Results:

GANs and VAEs demonstrated robust capabilities in generating realistic synthetic data for tabular, signal, image, and multi-modal datasets. Privacy-preserving approaches such as differential privacy and adversarial training significantly reduced re-identification risks while maintaining data fidelity. However, challenges persist in preserving temporal correlations, reducing biases, and aligning with regulatory frameworks, particularly for longitudinal and high-dimensional data.
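One way re-identification risk is often approximated for tabular synthetic data is the distance from each synthetic record to its closest real record. The sketch below is a minimal version of that heuristic, assuming numeric rows that are already on comparable scales. The abstract does not say which privacy metrics the reviewed studies used, so the function names and the `threshold` parameter are hypothetical.

```python
import math


def distance_to_closest_record(synthetic_rows, real_rows):
    """For each synthetic row, return the Euclidean distance to its nearest real row."""
    distances = []
    for s in synthetic_rows:
        nearest = min(math.dist(s, r) for r in real_rows)
        distances.append(nearest)
    return distances


def share_too_close(synthetic_rows, real_rows, threshold):
    """Fraction of synthetic rows lying within `threshold` of some real row."""
    dcr = distance_to_closest_record(synthetic_rows, real_rows)
    return sum(d <= threshold for d in dcr) / len(dcr)
```

A synthetic row at distance zero is an exact copy of a real record, so a high share of very small distances would suggest the generator has memorized training data rather than learned its distribution.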

Conclusion:

Synthetic data generation holds significant potential for privacy-preserving data sharing in healthcare. Ongoing research is required to develop advanced algorithms and evaluation frameworks, ensuring synthetic data’s quality and privacy. Collaboration between technologists and policymakers is essential to create comprehensive guidelines, fostering secure and effective data sharing in healthcare.
Source journal: Computer Methods and Programs in Biomedicine (Engineering and Technology; Engineering: Biomedical)
CiteScore: 12.30
Self-citation rate: 6.60%
Articles published: 601
Review time: 135 days
Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.