Exploring Detection Methods for Synthetic Medical Datasets Created With a Large Language Model

Impact Factor: 7.8 · JCR Q1 · Ophthalmology
Andrea Taloni, Giulia Coco, Marco Pellegrini, Matthias Wjst, Niccolò Salgari, Giovanna Carnovale-Scalzo, Vincenzo Scorcia, Massimo Busin, Giuseppe Giannaccare
Journal: JAMA Ophthalmology
DOI: 10.1001/jamaophthalmol.2025.0834
Published: April 24, 2025 (Journal Article)
Citations: 0

Abstract

Importance: Recently, it was shown that the large language model Generative Pre-trained Transformer 4 (GPT-4; OpenAI) can fabricate synthetic medical datasets designed to support false scientific evidence.

Objective: To uncover statistical patterns that may suggest fabrication in datasets produced by large language models, and to improve these synthetic datasets by attempting to remove detectable marks of nonauthenticity, investigating the limits of generative artificial intelligence.

Design, Setting, and Participants: In this quality improvement study, synthetic datasets were produced for 3 fictional clinical studies designed to compare the outcomes of 2 alternative treatments for specific ocular diseases. Synthetic datasets were produced using the default GPT-4o model and a custom GPT. Data fabrication was conducted in November 2024.

Exposure: Prompts were submitted to GPT-4o to produce 12 "unrefined" datasets, which underwent forensic examination. Based on the outcomes of this analysis, the custom GPT Synthetic Data Creator was built with detailed instructions to generate 12 "refined" datasets designed to evade authenticity checks. Then, forensic analysis was repeated on these enhanced datasets.

Main Outcomes and Measures: Forensic analysis was performed to identify statistical anomalies in demographic data, distribution uniformity, and repetitive patterns of last digits, as well as linear correlations, distribution shape, and outliers of study variables. Datasets were also qualitatively assessed for the presence of unrealistic clinical records.

Results: Forensic analysis identified 103 fabrication marks among 304 tests (33.9%) in unrefined datasets. Notable flaws included mismatches between patient names and gender (n = 12), baseline visits occurring during weekends (n = 12), age calculation errors (n = 9), lack of uniformity (n = 4), and repetitive numerical patterns in last digits (n = 7). Very weak correlations (r < 0.1) were observed between study variables (n = 12). In addition, variables showed a suspicious distribution shape (n = 6). Compared with unrefined datasets, refined ones showed 29.3% (95% CI, 23.5%-35.1%) fewer signs of fabrication (14 of 304 statistical tests performed [4.6%]). Four refined datasets passed forensic analysis as authentic; however, suspicious distribution shapes or other issues were found in the others.

Conclusions and Relevance: Sufficiently sophisticated custom GPTs can perform complex statistical tasks and may be abused to fabricate synthetic datasets that can pass forensic analysis as authentic.
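Two of the forensic checks named in the abstract — repetitive patterns in last digits and baseline visits falling on weekends — are simple enough to sketch in code. The snippet below is an illustrative reconstruction, not the authors' actual analysis pipeline: the function names and the choice of a plain chi-square statistic against the df = 9 critical value are assumptions made here for demonstration.

```python
from datetime import date
from collections import Counter

def last_digit_chi_square(values):
    """Chi-square statistic for uniformity of last digits (0-9).

    Fabricated datasets often over-use certain terminal digits; a
    statistic above the df=9, alpha=0.05 critical value (~16.92)
    suggests a suspicious repetitive pattern."""
    digits = [abs(int(v)) % 10 for v in values]
    counts = Counter(digits)
    expected = len(digits) / 10  # uniform expectation per digit
    return sum((counts.get(d, 0) - expected) ** 2 / expected
               for d in range(10))

def weekend_baseline_visits(visit_dates):
    """Return baseline visit dates falling on Saturday or Sunday
    (weekday() >= 5), a simple clinical-plausibility check."""
    return [d for d in visit_dates if d.weekday() >= 5]

# Example: perfectly uniform last digits yield a statistic of 0,
# while a single repeated value is flagged far above the threshold.
print(last_digit_chi_square(range(100)))   # uniform digits
print(last_digit_chi_square([7] * 100))    # repetitive digits
print(weekend_baseline_visits([date(2024, 11, 1), date(2024, 11, 2)]))
```

A real forensic battery, as described in the study, would combine many such tests (demographics, correlations, distribution shape, outliers) and interpret them jointly rather than relying on any single flag.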
Source journal: JAMA Ophthalmology (Ophthalmology)
CiteScore: 13.20
Self-citation rate: 3.70%
Articles per year: 340
Journal overview: JAMA Ophthalmology, with a rich history of continuous publication since 1869, stands as a distinguished international, peer-reviewed journal dedicated to ophthalmology and visual science. In 2019, the journal commemorated 150 years of uninterrupted service to the field. As a member of the JAMA Network, a consortium renowned for its peer-reviewed general medical and specialty publications, JAMA Ophthalmology upholds the highest standards of excellence in disseminating cutting-edge research and insights.