Systematic curation and analysis of ovarian cancer data across multiple electronic record systems held within the UK National Health Service: a tertiary referral centre experience
A. Samani, G. Giannone, L. Mercuri, R. Jiang, Y. Nadkarni, A. Chadha, E. Xing, S. Ghaem-Maghami, J. Krell, D. Lyons, C. Fotopoulou, D. Papadimitriou, B. Glampson, E. Mayer, I. McNeish, L. Tookman
ESMO Real World Data and Digital Oncology, volume 9, Article 100150. Published 2025-06-19. DOI: 10.1016/j.esmorw.2025.100150
https://www.sciencedirect.com/science/article/pii/S2949820125000396
Abstract
Background
Manual curation of real-world data (RWD) for patients with ovarian cancer is complex and costly. We set up a novel collaboration between informatics and clinical teams to automate data curation at scale. This enabled integrated and timely access to RWD for all patients with ovarian cancer treated within a tertiary gynaecological cancer centre of the UK National Health Service, providing a basis for research and operational use.
Materials and methods
The collaboration defined high-yield, accessible data, which were pulled into tables representing distinct clinical domains and then systematically integrated, cleaned and analysed within the iCARE Secure Data Environment.
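As an illustration only, the integration and cleaning step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the three domain tables, every field name (`patient_id`, `ethnicity`, `stage`, `line`, `start_date`) and all values are hypothetical, and the abstract does not describe the actual schema or tooling.

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-domain extracts; field names and values are illustrative only.
demographics = [
    {"patient_id": "P001", "ethnicity": " White British "},
    {"patient_id": "P002", "ethnicity": ""},
]
diagnoses = [
    {"patient_id": "P001", "diagnosis_date": "2015-03-02", "stage": "iiic"},
    {"patient_id": "P002", "diagnosis_date": "2020-11-17", "stage": "IV"},
]
treatments = [
    {"patient_id": "P001", "line": 1, "start_date": "2015-04-01"},
    {"patient_id": "P001", "line": 2, "start_date": "2017-09-12"},
]


def clean(value):
    """Trim whitespace and map empty strings to None so missing data stays explicit."""
    if isinstance(value, str):
        value = value.strip()
        return value or None
    return value


def integrate(demographics, diagnoses, treatments):
    """Merge the domain tables into one record per patient, keyed on patient_id."""
    patients = defaultdict(lambda: {"treatments": []})
    for row in demographics:
        patients[row["patient_id"]]["ethnicity"] = clean(row["ethnicity"])
    for row in diagnoses:
        record = patients[row["patient_id"]]
        record["diagnosis_date"] = date.fromisoformat(row["diagnosis_date"])
        stage = clean(row["stage"])
        # Normalise stage labels to upper case; keep None when the stage is missing.
        record["stage"] = stage.upper() if stage else None
    for row in treatments:
        patients[row["patient_id"]]["treatments"].append(
            (row["line"], date.fromisoformat(row["start_date"]))
        )
    return dict(patients)


cohort = integrate(demographics, diagnoses, treatments)
```

Keeping missing values as `None` rather than empty strings makes gaps such as unrecorded ethnicity straightforward to count at the analysis stage.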
Results
We curated data for 1581 patients diagnosed between 1 January 2014 and 31 December 2022. Referrals to the specialist tumour board increased consistently over time, while baseline characteristics did not change significantly. The number of patients receiving a new line of therapy decreased in 2020, the first year of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) outbreak. Multivariate survival modelling supported the robustness of the data by demonstrating the expected impact of known prognostic factors. Data were sparse for some variables (e.g. ethnicity), while others (genomic data) lacked a consistent storage mechanism within source systems.
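A yearly count of patients starting a new line of therapy, of the kind reported above, could be derived from a curated treatment table roughly as follows. This is a hedged sketch: the rows are synthetic and are not data from the study, and the function name `patients_starting_therapy_by_year` and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Synthetic (patient_id, line_number, start_date) rows; not data from the study.
therapy_starts = [
    ("P001", 1, date(2019, 2, 4)),
    ("P001", 2, date(2019, 11, 20)),
    ("P002", 1, date(2019, 6, 1)),
    ("P003", 1, date(2020, 8, 15)),
]


def patients_starting_therapy_by_year(rows):
    """Count distinct patients who started at least one new line of therapy per year."""
    by_year = defaultdict(set)
    for patient_id, _line, start in rows:
        by_year[start.year].add(patient_id)
    return {year: len(ids) for year, ids in sorted(by_year.items())}


counts = patients_starting_therapy_by_year(therapy_starts)
```

Counting distinct patients rather than treatment rows means a patient who starts two lines in the same year contributes once to that year's total.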
Conclusions
Automated curation and analysis of RWD are possible at scale and in real time. The analysis yielded clinical findings consistent with the prevailing literature and showed how treatment practice has evolved. While not all unstructured data could be explored, we demonstrate that automated curation of clinically important real-world variables is feasible and can yield robust data for both research and operational purposes.