To be or not to be, when synthetic data meet clinical pharmacology: A focused study on pharmacogenetics

IF 3.1 | Region 3 (Medicine) | JCR Q2 (Pharmacology & Pharmacy)

CPT: Pharmacometrics & Systems Pharmacology | Pub Date: 2024-10-16 | DOI: 10.1002/psp4.13240

Jean-Baptiste Woillard, Clément Benoist, Alexandre Destere, Marc Labriffe, Giulia Marchello, Julie Josse, Pierre Marquet

Volume 14(1), pages 82-94 | Publication type: Journal Article | PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11706419/pdf/ | Publisher page: https://onlinelibrary.wiley.com/doi/10.1002/psp4.13240

Citations: 0

Abstract

The use of synthetic data in pharmacology research has gained significant attention due to its potential to address privacy concerns and promote open science. In this study, we implemented and compared three synthetic data generation methods, CT-GAN, TVAE, and a simplified implementation of Avatar, on a previously published pharmacogenetic dataset of 253 patients with one measurement per patient (non-longitudinal). The aim of this study was to evaluate the performance of these methods in terms of the trade-off between data utility and privacy. Our results showed that CT-GAN and Avatar used with k = 10 (the number of patients used to create the local generation model) had the best overall performance in terms of data utility and privacy preservation, whereas the TVAE method performed relatively worse on both. In terms of hazard ratio (HR) estimation, Avatar with k = 10 produced HR estimates closest to those from the original data, whereas CT-GAN slightly underestimated the HR and TVAE showed the largest deviation from the original HR. We also investigated whether applying the algorithms multiple times improves the stability of the HR estimates. Our findings suggested that this approach could be beneficial, especially for small datasets, in achieving more reliable and robust results. In conclusion, our study provides insights into the performance of the CT-GAN, TVAE, and Avatar methods for synthetic data generation in pharmacogenetic research. Their application to other types of data and (data-driven) analyses used in pharmacology should be further investigated.
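To make the role of k more concrete, the following is a minimal Python sketch of a k-nearest-neighbour local generator, loosely in the spirit of the "local model" idea mentioned in the abstract. It is not the authors' implementation of Avatar. The function name `knn_local_synthetic`, the use of Euclidean distance on standardized numeric features, the exclusion of each record from its own neighbourhood, and the Dirichlet-distributed weights are all assumptions made purely for illustration.

```python
import numpy as np


def knn_local_synthetic(X, k=10, seed=0):
    """Return one synthetic row per original row, built from its k nearest neighbours.

    Hypothetical illustration only: numeric features, Euclidean distance,
    random convex weights. Not the published Avatar algorithm.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of rows")

    # Standardize so that every feature contributes comparably to distances.
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    Z = (X - mu) / sd

    # Pairwise squared Euclidean distances (fine for a few hundred rows).
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # assumption: a record is not its own neighbour

    # Indices of the k nearest neighbours of each record, shape (n, k).
    nbrs = np.argsort(d2, axis=1)[:, :k]

    # Random convex weights over the k neighbours, shape (n, k).
    w = rng.dirichlet(np.ones(k), size=n)

    # Each synthetic record is a weighted average of its neighbours.
    Z_syn = np.einsum("nk,nkp->np", w, Z[nbrs])

    # Map back to the original scale.
    return Z_syn * sd + mu


if __name__ == "__main__":
    demo = np.random.default_rng(1).normal(size=(253, 4))
    synthetic = knn_local_synthetic(demo, k=10)
    print(synthetic.shape)
```

In this sketch a larger k averages over more patients, which moves each synthetic record further from any single original record at the cost of smoothing local structure. Calling the function with several different `seed` values and pooling the downstream estimates is one simple way to mimic the repeated-generation idea described above.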

Source journal