Leveraging Synthetic Data to Facilitate Research: A Collaborative Model for Analyzing Sensitive National Cancer Registry Data in England.

IF 1.9, Zone 4 (Medicine), Q4 MEDICAL INFORMATICS
George Kafatos, Julia Levy, Sophie Jose, Pooja Hindocha, Olia Archangelidi, Sally Vernon, Lora Frayling
Journal: Therapeutic Innovation & Regulatory Science, pp. 919-928
Publication type: Journal Article
Publication date: 2025-09-01 (Epub 2025-06-05)
DOI: https://doi.org/10.1007/s43441-025-00820-z
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12446092/pdf/
Citation count: 0

Abstract

Real-world data (RWD) are increasingly recognized as critical to advancing drug development and health care delivery, and regulatory bodies increasingly acknowledge their value. However, stringent governance requirements, while essential for protecting patient privacy, create significant challenges for conducting research. The Cancer Analysis System (CAS), managed by National Health Service (NHS) England, includes a national cancer registry and linked health care datasets. To address data access challenges, Simulacrum, a set of publicly available synthetic datasets generated from the CAS, can be used for preliminary data analysis, hypothesis generation, and development of programming code that can then be executed on CAS data. This paper presents a collaborative operating model that leverages Simulacrum to enable efficient, privacy-compliant analytics. Analysis of 18 projects conducted using this model demonstrated an average duration of 2.3 months from the start of Code Development to Data Release (CDDR). By enabling researchers to conduct privacy-compliant analysis on synthetic data, this approach increases transparency by providing insights into patient-level data while reducing reliance on custodians of sensitive data. Our findings highlight how synthetic data can be leveraged to facilitate efficient research on restricted patient-level RWD while safeguarding patient privacy. This framework offers a scalable solution for other data custodians that can enable broader use of RWD, accelerating health care innovation.


Source journal
Therapeutic innovation & regulatory science (MEDICAL INFORMATICS-PHARMACOLOGY & PHARMACY)
CiteScore: 3.40
Self-citation rate: 13.30%
Articles published: 127
Journal description: Therapeutic Innovation & Regulatory Science (TIRS) is the official scientific journal of DIA that strives to advance medical product discovery, development, regulation, and use through the publication of peer-reviewed original and review articles, commentaries, and letters to the editor across the spectrum of converting biomedical science into practical solutions to advance human health. The focus areas of the journal are as follows: Biostatistics, Clinical Trials, Product Development and Innovation, Global Perspectives, Policy, Regulatory Science, Product Safety, Special Populations.