Semi-automated data provenance tracking for transparent data production and linkage to enhance auditing and quality assurance in Trusted Research Environments.
Katherine O'Sullivan, Milan Markovic, Jaroslaw Dymiter, Bernhard Scheliga, Chinasa Odo, Katie Wilde
International Journal of Population Data Science, vol. 10, no. 2, article 2464. Published 2025-02-06.
DOI: https://doi.org/10.23889/ijpds.v10i2.2464
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11931605/pdf/
Abstract
Introduction: We present a prototype solution for improving the transparency and quality assurance of the data linkage process through data provenance tracking. The solution is designed to assist Data Analysts, researchers and information governance teams in authenticating and auditing data workflows within a Trusted Research Environment (TRE).
Methods: Using a participatory design process with Data Analysts, researchers and information governance teams, we undertook a contextual inquiry, user requirements interviews, co-design workshops and low-fidelity prototype evaluations. Public Engagement and Involvement activities underpinned these methods to ensure that the project and its approaches maintained the public's trust in semi-automating data processing. These activities informed the technical implementation: applying the PROV-O ontology to create a derived ontology, following the four-step Linked Open Terms methodology, and developing automated scripts to collect provenance information for the data processing workflow.
Results: The resulting interactive tool, Provenance Explorer for Trusted Research Environments (PE-TRE), displays data linkage information extracted from a knowledge graph described using the derived SHP ontology, together with the results of rule-based validation checks. User evaluations confirmed that PE-TRE would contribute to better-quality data linkage and reduce data processing errors.
Conclusion: This project demonstrates the next stage in advancing transparency, openness and quality assurance within TREs by semi-automating and systematising data tracking in a single tool throughout the data processing lifecycle.
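The Methods and Results above mention scripts that collect provenance and rule-based validation checks over a knowledge graph. The following is a minimal Python sketch of that general idea, not the PE-TRE implementation. It stores provenance as plain triples using real PROV-O term names (`prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`, `prov:endedAtTime`). Everything else is an assumption made for illustration: the `ex:` identifiers, the `record_linkage` and `validate` helpers, and the two rules themselves, which are not taken from the SHP ontology or the paper.

```python
from datetime import datetime, timezone

# PROV-O term names (W3C vocabulary), written as compact prefixed strings.
RDF_TYPE = "rdf:type"
PROV_ENTITY = "prov:Entity"
PROV_ACTIVITY = "prov:Activity"
PROV_USED = "prov:used"
PROV_WAS_GENERATED_BY = "prov:wasGeneratedBy"
PROV_WAS_ASSOCIATED_WITH = "prov:wasAssociatedWith"
PROV_ENDED_AT_TIME = "prov:endedAtTime"


def record_linkage(graph, activity, inputs, output, agent=None, ended_at=None):
    """Add triples describing one linkage step to `graph` (a set of triples).

    Hypothetical helper: a real collection script would be called from the
    data processing workflow itself.
    """
    graph.add((activity, RDF_TYPE, PROV_ACTIVITY))
    for dataset in inputs:
        graph.add((dataset, RDF_TYPE, PROV_ENTITY))
        graph.add((activity, PROV_USED, dataset))
    graph.add((output, RDF_TYPE, PROV_ENTITY))
    graph.add((output, PROV_WAS_GENERATED_BY, activity))
    if agent is not None:
        graph.add((activity, PROV_WAS_ASSOCIATED_WITH, agent))
    ended_at = ended_at or datetime.now(timezone.utc)
    graph.add((activity, PROV_ENDED_AT_TIME, ended_at.isoformat()))


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def validate(graph):
    """Apply two illustrative rules and return human-readable violations."""
    problems = []
    activities = {s for s, p, o in graph if p == RDF_TYPE and o == PROV_ACTIVITY}
    for activity in sorted(activities):
        # Assumed rule 1: a linkage step should combine at least two inputs.
        if len(objects(graph, activity, PROV_USED)) < 2:
            problems.append(f"{activity}: linkage should use at least two inputs")
        # Assumed rule 2: every step should name a responsible agent.
        if not objects(graph, activity, PROV_WAS_ASSOCIATED_WITH):
            problems.append(f"{activity}: no responsible agent recorded")
    return problems


graph = set()
record_linkage(graph, "ex:link-1", ["ex:cohort", "ex:admissions"],
               "ex:linked-1", agent="ex:analyst-a")
record_linkage(graph, "ex:link-2", ["ex:linked-1"], "ex:extract-2")

for problem in validate(graph):
    print(problem)
```

In this sketch the first step passes both rules, while the second is flagged twice because it has a single input and no agent. A production system would more plausibly hold the graph in an RDF store and express such rules declaratively; the in-memory set is used here only to keep the example dependency-free.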