Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?

Jeremy Georges-Filteau, Elisa Cirillo
Published: 2020-05-27 (arXiv: Learning)
DOI: 10.22541/au.158921777.79483839/v2
Citations: 11

Abstract

After being collected for patient care, Observational Health Data (OHD) can further benefit patient well-being by sustaining the development of health informatics and medical research. Vast potential remains unexploited because of the fiercely private nature of patient-related data and the regulations that protect it.

Generative Adversarial Networks (GANs) have recently emerged as a groundbreaking way to learn generative models that produce realistic synthetic data. They have revolutionized practices in multiple domains such as self-driving cars, fraud detection, digital twin simulations in industrial sectors, and medical imaging.

The digital twin concept could readily apply to modelling and quantifying disease progression. In addition, GANs possess many capabilities relevant to common problems in healthcare: lack of data, class imbalance, rare diseases, and privacy preservation. Unlocking open access to privacy-preserving OHD could be transformative for scientific research. In the midst of COVID-19, the healthcare system is facing unprecedented challenges, many of which are data-related for the reasons stated above.

Considering these facts, publications concerning GANs applied to OHD seemed to be severely lacking. To uncover the reasons for this slow adoption, we broadly reviewed the published literature on the subject. Our findings show that the properties of OHD were initially challenging for existing GAN algorithms (unlike medical imaging, for which state-of-the-art models were directly transferable) and that the evaluation of synthetic data lacked clear metrics.

We find more publications on the subject than expected, starting slowly in 2017 and growing at an increasing rate since then. The difficulties of OHD remain, and we discuss issues relating to evaluation, consistency, benchmarking, data modelling, and reproducibility.