TrialSynth: Generation of Synthetic Sequential Clinical Trial Data

arXiv - CS - Machine Learning | Pub Date: 2024-09-11 | DOI: arxiv-2409.07089

Chufan Gao, Mandis Beigi, Afrah Shafquat, Jacob Aptekar, Jimeng Sun



Abstract

Analyzing data from past clinical trials is part of the ongoing effort to optimize the design, implementation, and execution of new clinical trials and to bring life-saving interventions to market more efficiently. While there have been recent advances in the generation of static-context synthetic clinical trial data, generating fine-grained synthetic time-sequential clinical trial data has remained challenging because of both limited patient availability and the constraints imposed by patient privacy needs. Because patient trajectories over an entire clinical trial are highly important for optimizing trial design and for preventing harmful adverse events, there is a significant need for high-fidelity time-sequence clinical trial data. Here we introduce TrialSynth, a Variational Autoencoder (VAE) designed to address the specific challenges of generating synthetic time-sequence clinical trial data. Distinct from related clinical data VAE methods, the core of our method leverages Hawkes Processes (HP), which are particularly well suited to the event-type and time-gap prediction needed to capture the structure of sequential clinical trial data. Our experiments demonstrate that TrialSynth surpasses other comparable methods that can generate sequential clinical trial data, both in fidelity and in generating highly accurate event sequences, across multiple real-world sequential event datasets with small patient source populations and with minimal external information. Notably, our empirical findings highlight that TrialSynth not only outperforms existing clinical sequence-generating methods but also produces data with superior utility while empirically preserving patient privacy.
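
To make the Hawkes-process idea in the abstract concrete, the sketch below shows a generic univariate Hawkes process with an exponential kernel, in which each past event temporarily raises the rate of future events. It is not the TrialSynth model: the abstract does not specify the kernel, the parameterization, or how the process is combined with the VAE, so the function names and the parameters `mu`, `alpha`, and `beta` are assumptions chosen purely for illustration. A model along the lines the abstract describes would also need to handle multiple event types, which this sketch omits.

```python
import math
import random


def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    """Conditional intensity mu + sum_i alpha * exp(-beta * (t - t_i)) over past events t_i < t.

    mu is the baseline rate, alpha the jump added by each event, and beta the decay rate
    (all illustrative values, not taken from the paper).
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)


def simulate_hawkes(horizon, mu=0.2, alpha=0.8, beta=1.0, seed=0):
    """Sample event times on [0, horizon) using Ogata's thinning algorithm."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # With an exponential kernel the intensity only decays between events,
        # so its value just after time t bounds it until the next event.
        upper = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(upper)  # candidate time from the bounding Poisson process
        if t >= horizon:
            return events
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)


def time_gaps(events):
    """Gaps between consecutive events, the quantity a sequence model would predict."""
    return [later - earlier for earlier, later in zip(events, events[1:])]


if __name__ == "__main__":
    sampled = simulate_hawkes(horizon=50.0)
    print(len(sampled), "events; first gaps:", time_gaps(sampled)[:3])
```

Keeping `alpha / beta` below 1, as in the defaults here, keeps the process stable: each event triggers fewer than one follow-up event on average, so the simulated sequence does not grow without bound.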
