Synthetic neurosurgical data generation with generative adversarial networks and large language models: an investigation on fidelity, utility, and privacy.

Impact factor: 3.0 | Zone 2, Medicine | JCR Q2, Clinical Neurology

Neurosurgical Focus 59(1):E17 | Pub Date: 2025-07-01 | DOI: 10.3171/2025.4.FOCUS25225

Austin A Barr, Eddie Guo, Brij S Karmur, Emre Sezgin

Abstract

Objective: Use of neurosurgical data for clinical research and machine learning (ML) model development is often limited by data availability, sample sizes, and regulatory constraints. Synthetic data offer a potential solution to challenges associated with accessing, sharing, and using real-world data (RWD). The aim of this study was to evaluate the capability of generating synthetic neurosurgical data with a generative adversarial network and large language model (LLM) to augment RWD, perform secondary analyses in place of RWD, and train an ML model to predict postoperative outcomes.

Methods: Synthetic data were generated with a conditional tabular generative adversarial network (CTGAN) and the LLM GPT-4o based on a real-world neurosurgical dataset of 140 older adults who underwent neurosurgical interventions. Each model was used to generate datasets at equivalent (n = 140) and amplified (n = 1000) sample sizes. Data fidelity was evaluated by comparing univariate and bivariate statistics to the RWD. Privacy evaluation involved measuring the uniqueness of generated synthetic records. Utility was assessed by: 1) reproducing and extending clinical analyses on predictors of Karnofsky Performance Status (KPS) deterioration at discharge and a prolonged postoperative intensive care unit (ICU) stay, and 2) training a binary ML classifier on amplified synthetic datasets to predict KPS deterioration on RWD.
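The abstract does not give the exact fidelity or privacy formulas, so the following is only a minimal sketch of what such checks could look like. It assumes univariate fidelity is summarized by the gap in column means, bivariate fidelity by the gap in Pearson correlation between two columns, and privacy by the share of synthetic rows that exactly duplicate a real row. The function names (`fidelity_gaps`, `exact_match_rate`) and the column layout are hypothetical and are not taken from the study.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def fidelity_gaps(real, synth, col_a, col_b):
    """Compare two columns of a real and a synthetic table.

    `real` and `synth` map column names to lists of numbers.
    Smaller gaps mean the synthetic table tracks the real one more closely.
    """
    return {
        "mean_gap_a": abs(mean(real[col_a]) - mean(synth[col_a])),
        "mean_gap_b": abs(mean(real[col_b]) - mean(synth[col_b])),
        "corr_gap": abs(
            pearson(real[col_a], real[col_b])
            - pearson(synth[col_a], synth[col_b])
        ),
    }


def exact_match_rate(real_rows, synth_rows):
    """Share of synthetic rows that are identical to some real row.

    A simple (assumed) privacy proxy: 0.0 means no synthetic record
    copies a real record verbatim.
    """
    real_set = {tuple(row) for row in real_rows}
    return sum(tuple(row) in real_set for row in synth_rows) / len(synth_rows)
```

A gap of zero on every entry only shows agreement on these summary statistics; it does not by itself establish that the full joint distribution is preserved, which is why the study pairs fidelity checks with separate utility and privacy evaluations.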

Results: Both the CTGAN and GPT-4o generated complete, high-fidelity synthetic tabular datasets. GPT-4o matched or exceeded CTGAN across all measured fidelity, utility, and privacy metrics. All significant clinical predictors of KPS deterioration and prolonged ICU stay were retained in the GPT-4o-generated synthetic data, with some differences observed in effect sizes. Preoperative KPS was not preserved as a significant predictor in the CTGAN-generated data. The ML classifier trained on GPT-4o data outperformed the model trained on CTGAN data, achieving a higher F1 score (0.725 vs 0.688) for predicting KPS deterioration.
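The F1 comparison above follows a train-on-synthetic, test-on-real pattern. The abstract does not name the classifier, so the sketch below swaps in a deliberately trivial one-feature threshold rule purely to illustrate the evaluation flow; `fit_threshold`, the single numeric feature, and all toy values are made up for illustration and do not come from the paper. F1 is the harmonic mean of precision and recall, $F_1 = 2PR / (P + R)$.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def predict(xs, cutoff):
    """Predict the positive class whenever the feature is <= cutoff."""
    return [1 if x <= cutoff else 0 for x in xs]


def fit_threshold(xs, ys):
    """Pick the cutoff that maximizes F1 on the training data.

    Stand-in for a real classifier; ties keep the smallest cutoff.
    """
    best_cutoff, best_f1 = None, -1.0
    for cutoff in sorted(set(xs)):
        score = f1_score(ys, predict(xs, cutoff))
        if score > best_f1:
            best_cutoff, best_f1 = cutoff, score
    return best_cutoff


# Toy, made-up values: train on "synthetic" rows, evaluate on "real" rows.
synthetic_x = [40, 50, 60, 70, 80, 90]
synthetic_y = [1, 1, 1, 0, 0, 0]
real_x = [50, 60, 70, 80]
real_y = [1, 0, 1, 0]

cutoff = fit_threshold(synthetic_x, synthetic_y)
real_f1 = f1_score(real_y, predict(real_x, cutoff))
print(f"cutoff={cutoff}, F1 on real data={real_f1:.3f}")
```

The key design point is that the real rows are never seen during fitting, so the F1 on real data measures how well patterns learned from synthetic records transfer to actual patients.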

Conclusions: This study demonstrated a promising ability to produce high-fidelity synthetic neurosurgical data using generative models. Synthetic neurosurgical data present a potential solution to critical limitations in data availability for neurosurgical research. Further investigation is necessary to enhance synthetic data utility for secondary analyses and ML model training, and to evaluate synthetic data generation methods across other datasets, including clinical trial data.
