Synthetic data production for biomedical research.

Impact factor 2.1; JCR Q3 (Public, Environmental & Occupational Health)
Yun Gyeong Lee, Mi-Sook Kwak, Jeong Eun Kim, Min Sun Kim, Dong Un No, Hee Youl Chai
DOI: 10.24171/j.phrp.2024.0335 (https://doi.org/10.24171/j.phrp.2024.0335)
Journal: Osong Public Health and Research Perspectives, volume 16, issue 2, pages 94-99
Publication date: 2025-04-01 (Epub 2025-04-22)
Publication type: Journal Article
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12066231/pdf/
Citations: 0

Abstract

Synthetic data, generated using advanced artificial intelligence (AI) techniques, replicates the statistical properties of real-world datasets while excluding identifiable information. Although synthetic data does not consist of actual data points, it is derived from original datasets, thereby enabling analyses that yield results comparable to those obtained with real data. Synthetic datasets are evaluated based on their utility, a measure of how effectively they mirror real data for analytical purposes. This paper presents the generation of synthetic datasets through the Healthcare Big Data Showcase Project (2019-2023). The original dataset comprises comprehensive multi-omics data from 400 individuals, including cancer survivors, chronic disease patients, and healthy participants. Synthetic data facilitates efficient access and robust analyses, serving as a practical tool for research and education. It addresses privacy concerns, supports AI research, and provides a foundation for innovative applications across diverse fields, such as public health and precision medicine.

Source journal
Osong Public Health and Research Perspectives (Medicine: Public Health, Environmental and Occupational Health)
CiteScore: 10.30
Self-citation rate: 2.30%
Articles published: 44
Review time: 16 weeks