The urgent need to accelerate synthetic data privacy frameworks for medical research

IF 23.8 · Zone 1 (Medicine) · JCR Q1 (Medical Informatics)
Anmol Arora MBBChir MA , Siegfried Karl Wagner PhD FRCOphth , Robin Carpenter BSc , Rajesh Jena MD , Pearse A Keane MD FRCOphth
Lancet Digital Health, 2025;7(2):e157-e160. Journal Article, published 1 February 2025. DOI: 10.1016/S2589-7500(24)00196-1
https://www.sciencedirect.com/science/article/pii/S2589750024001961
Citations: 0

Abstract

Synthetic data, generated through artificial intelligence technologies such as generative adversarial networks and latent diffusion models, maintain the aggregate patterns and relationships present in the real data on which those technologies were trained without exposing individual identities, thereby mitigating re-identification risks. This approach has been gaining traction in biomedical research because of its ability to preserve privacy and enable dataset sharing between organisations. Although the use of synthetic data has become widespread in other domains, such as finance and high-energy physics, its use in medical research raises novel issues. Using synthetic data to preserve the privacy of the data on which models are trained requires the synthetic data to have high fidelity to the original data, so as to preserve utility, while being sufficiently different from it to protect against adversarial or accidental re-identification. Standards for synthetic data generation, and consensus standards for its evaluation, need to be developed. As synthetic data applications expand, ongoing legal and ethical evaluations are crucial to ensure that synthetic data remain a secure and effective tool for advancing medical research without compromising individual privacy.
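One way to make the fidelity/privacy tension described above concrete is to score a synthetic dataset on both axes at once. The following is a minimal illustrative sketch, not the authors' method: it assumes purely numeric records, uses the largest difference in column means as a crude fidelity proxy, and uses distance to closest record (DCR), a common heuristic in synthetic-data evaluation, to flag synthetic rows that sit suspiciously close to a real patient. The function names, the toy records and the `threshold` value are all hypothetical. A very small DCR is a warning sign of memorisation, but a large one is not a formal privacy guarantee such as differential privacy.

```python
from collections.abc import Sequence
import math

# Hypothetical record type: one patient as a sequence of numeric features
# (ideally standardised so that no single feature dominates the distance).
Row = Sequence[float]


def euclidean(a: Row, b: Row) -> float:
    """Euclidean distance between two equal-length numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_closest_record(synthetic: list[Row], real: list[Row]) -> list[float]:
    """For each synthetic record, the distance to its nearest real record (DCR)."""
    return [min(euclidean(s, r) for r in real) for s in synthetic]


def column_means(rows: list[Row]) -> list[float]:
    """Mean of each feature column."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fidelity_gap(synthetic: list[Row], real: list[Row]) -> float:
    """Crude fidelity proxy: largest absolute gap between real and synthetic column means."""
    return max(abs(s - r) for s, r in zip(column_means(synthetic), column_means(real)))


def flag_near_copies(synthetic: list[Row], real: list[Row], threshold: float) -> list[int]:
    """Indices of synthetic records closer than `threshold` to some real record."""
    dcr = distance_to_closest_record(synthetic, real)
    return [i for i, d in enumerate(dcr) if d < threshold]


if __name__ == "__main__":
    # Invented toy values (age, systolic BP), left unscaled for readability; not patient data.
    real = [[50.0, 120.0], [63.0, 135.0], [71.0, 142.0]]
    synthetic = [[50.0, 120.0], [58.0, 130.0], [69.0, 140.5]]
    print(f"fidelity gap: {fidelity_gap(synthetic, real):.2f}")  # → fidelity gap: 2.33
    print(flag_near_copies(synthetic, real, threshold=1.0))  # → [0]: row 0 copies a real record
```

Here the synthetic set tracks the real means closely (high fidelity), yet row 0 is an exact copy of a real record and would be flagged. In practice the threshold could be calibrated, for example against distances between records within a held-out portion of the real data, rather than fixed by hand.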
Source journal: Lancet Digital Health
CiteScore: 41.20 · Self-citation rate: 1.60% · Articles published: 232 · Review time: 13 weeks
About the journal: The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health. The journal's open access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health. We publish a range of content types, including Articles, Reviews, Comments, and Correspondence, contributing to the promotion of digital technologies in health practice worldwide.