Synthetic data: How could it be used for infectious disease research?

Styliani-Christina Fragkouli, Dhwani Solanki, Leyla J Castro, Fotis E Psomopoulos, Núria Queralt-Rosinach, Davide Cirillo, Lisa C Crossman
arXiv - QuanBio - Other Quantitative Biology. Published 2024-07-03. DOI: https://doi.org/arxiv-2407.06211

Abstract

Over the last three to five years, it has become possible to generate synthetic data with machine learning for healthcare-related uses. However, concerns have been raised about the potential downsides of artificial dataset generation. These include the misuse of generative artificial intelligence (AI) in fields such as cybercrime, the use of deepfakes and fake news to deceive or manipulate, and the displacement of human jobs across various market sectors. Here, we consider the current and future positive advances and possibilities of synthetic datasets. Synthetic data offers significant benefits, particularly for data privacy, for research, and for balancing datasets and reducing bias in machine learning models. Generative AI (GenAI) is a class of artificial intelligence capable of creating text, images, video or other data using generative models. The recent explosion of interest in GenAI was heralded by the invention and rapid adoption of large language models (LLMs). These computational models can perform general-purpose language generation and other natural language processing tasks and are based on transformer architectures, which marked an evolutionary leap from previous neural network architectures. With improved GenAI techniques now in wide-scale use, this is surely the time to consider how synthetic data can be used to advance infectious disease research. In this commentary, we aim to provide an overview of the current and future position of synthetic data in infectious disease research.