Leveraging Synthetic Data to Facilitate Research: A Collaborative Model for Analyzing Sensitive National Cancer Registry Data in England
George Kafatos, Julia Levy, Sophie Jose, Pooja Hindocha, Olia Archangelidi, Sally Vernon, Lora Frayling
Therapeutic Innovation & Regulatory Science, pp. 919-928. Published 2025-09-01 (Epub 2025-06-05). Journal Article.
DOI: https://doi.org/10.1007/s43441-025-00820-z
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12446092/pdf/
Abstract
Real-world data (RWD) are increasingly recognized as critical to advancing drug development and health care delivery, and regulatory bodies increasingly acknowledge their value. However, stringent governance requirements, while essential for protecting patient privacy, create significant challenges for conducting research. The Cancer Analysis System (CAS), managed by National Health Service (NHS) England, includes a national cancer registry and linked health care datasets. To address data access challenges, Simulacrum, a set of publicly available synthetic datasets generated from the CAS, can be used for preliminary data analysis, hypothesis generation, and development of programming code that can then be executed against CAS data. This paper presents a collaborative operating model that leverages Simulacrum to enable efficient, privacy-compliant analytics. Analysis of 18 projects conducted using this model showed an average duration of 2.3 months from the start of Code Development to Data Release (CDDR). By enabling researchers to conduct privacy-compliant analysis on synthetic data, this approach increases transparency by providing insights into patient-level data while reducing reliance on custodians of sensitive data. Our findings highlight how synthetic data can be leveraged to facilitate efficient research on restricted patient-level RWD while safeguarding patient privacy. This framework offers a scalable solution for other data custodians that can enable broader use of RWD, accelerating health care innovation.
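The workflow described in the abstract (develop code on synthetic data, then have it executed against the restricted data) can be illustrated with a minimal Python sketch. This is not the authors' code and does not use the actual Simulacrum or CAS schema: the field names `patient_id` and `site`, the function `count_by_site`, and the example rows are all hypothetical, chosen only to show that an analysis function written against synthetic records can be handed over unchanged.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def count_by_site(records: Iterable[Mapping[str, str]],
                  site_field: str = "site") -> Dict[str, int]:
    """Count records per tumour site code.

    The function depends only on the record layout, not on where the
    records come from, so it can be developed on synthetic rows and
    later run on any source that shares the same (hypothetical) layout.
    """
    return dict(Counter(record[site_field] for record in records))


# Hypothetical synthetic rows standing in for a public synthetic dataset.
synthetic_rows = [
    {"patient_id": "S1", "site": "C50"},
    {"patient_id": "S2", "site": "C34"},
    {"patient_id": "S3", "site": "C50"},
]

# Step 1: the researcher develops and checks the code on synthetic data.
synthetic_counts = count_by_site(synthetic_rows)
print(synthetic_counts)

# Step 2 (not shown): the same function is passed to the data custodian,
# who would run it on the restricted data and release only the output.
```

The design point is that the analysis logic is kept separate from data loading, so only the input source would change between the development run and the custodian's run.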
About the journal:
Therapeutic Innovation & Regulatory Science (TIRS) is the official scientific journal of DIA that strives to advance medical product discovery, development, regulation, and use through the publication of peer-reviewed original and review articles, commentaries, and letters to the editor across the spectrum of converting biomedical science into practical solutions to advance human health.
The focus areas of the journal are as follows:
Biostatistics
Clinical Trials
Product Development and Innovation
Global Perspectives
Policy
Regulatory Science
Product Safety
Special Populations