Use and Evaluation of GANs for Synthetic Data Generation in Pharmacogenetics.

Dominic Aeschbacher, Jessica Meisner, Marko Miletic, Murat Sariyar
Studies in health technology and informatics, vol. 321, pp. 240-244. Journal Article. Published 2024-11-22. DOI: 10.3233/SHTI241100 (https://doi.org/10.3233/SHTI241100)

Abstract

Pharmacogenetics (PGx) explores the influence of genetic variability on drug efficacy and tolerability. Synthetic Data Generation (SDG) has emerged as a promising alternative to the labor-intensive process of collecting real-world PGx data, which is required for high-quality prediction models. This study investigates the performance of two Generative Adversarial Network (GAN) models, CTGAN and CTAB-GAN+, in generating synthetic PGx data. The benchmarking is based on utility metrics (Hellinger distance and Random Forest accuracy) and ϵ-identifiability. Results demonstrate that synthetic data generated by CTAB-GAN+ can surpass the original dataset in terms of utility. For instance, CTAB-GAN+ achieves higher Random Forest accuracy compared to the original data, indicating better predictive performance. These improvements suggest that synthetic data not only capture the essential patterns of the original data but also enhance model generalization and prediction capabilities, providing a more robust training ground for machine learning models. Consequently, SDG offers a promising solution to address data scarcity and imbalance in pharmacogenetic research.
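
The abstract names Hellinger distance as one of its utility metrics but does not say how it was computed. The following is a minimal sketch of one common way to apply it, assuming each column is categorical (for example, genotype calls) and that real and synthetic data are compared column by column on their relative frequencies. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def category_frequencies(values):
    """Return the relative frequency of each category in a column."""
    if not values:
        raise ValueError("column must contain at least one value")
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def hellinger_distance(real_column, synthetic_column):
    """Hellinger distance between two categorical columns.

    Returns 0 when the category frequencies are identical and 1 when the
    two columns share no category at all.
    """
    p = category_frequencies(real_column)
    q = category_frequencies(synthetic_column)
    categories = set(p) | set(q)
    # Sum of squared differences between the square roots of the frequencies.
    squared_diff = sum(
        (math.sqrt(p.get(c, 0.0)) - math.sqrt(q.get(c, 0.0))) ** 2
        for c in categories
    )
    return math.sqrt(squared_diff / 2.0)


# Hypothetical genotype calls for one variant (made-up values, not study data).
real = ["AA", "AG", "AG", "GG", "AA", "AG"]
synthetic = ["AA", "AG", "GG", "GG", "AA", "AG"]

print(round(hellinger_distance(real, synthetic), 4))
```

A lower value means the synthetic column reproduces the real column's distribution more closely. In a full evaluation one would typically average this distance over all columns. Continuous columns would first need binning, which this sketch does not cover.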
