Enhancing Clinical Data Warehouses with Provenance and Large File Management: The gitOmmix Approach for Clinical Omics Data

Maxime Wack (CRC, HeKA, HEGP, CHNO), Adrien Coulet (CRC, HeKA), Anita Burgun (HEGP, Imagine), Bastien Rance (UPCité, HEGP, CRC, HeKA)
arXiv - QuanBio - Quantitative Methods, 2024-09-05. DOI: arxiv-2409.03288 (https://doi.org/arxiv-2409.03288)

Abstract

Background. Clinical data warehouses (CDWs) are essential to the reuse of hospital data in observational studies or predictive modeling. However, state-of-the-art CDW systems present two drawbacks. First, they do not support the management of large data files, which is critical in medical genomics, radiology, digital pathology, and other domains where such files are generated. Second, they do not provide provenance management or means to represent longitudinal relationships between patient events. Indeed, a disease diagnosis and its follow-up rely on multiple analyses. In these cases, no relationship between the data (e.g., a large file) and its associated analysis and decision can be documented.

Method. We introduce gitOmmix, an approach that overcomes these limitations, and illustrate its usefulness in the management of medical omics data. gitOmmix relies on (i) a file versioning system: git, (ii) an extension that handles large files: git-annex, (iii) a provenance knowledge graph: PROV-O, and (iv) an alignment between the git versioning information and the provenance knowledge graph.

Results. Capabilities inherited from git and git-annex enable retracing the history of a clinical interpretation back to the patient sample, through supporting data and analyses. In addition, the provenance knowledge graph, aligned with the git versioning information, enables querying and browsing provenance relationships between these elements.

Conclusion. gitOmmix adds a provenance layer to CDWs, while scaling to large files and being agnostic of the CDW system. For these reasons, we think that it is a viable and generalizable solution for omics clinical studies.
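To make the idea of retracing an interpretation back to the patient sample concrete, here is a minimal sketch in Python. It is not gitOmmix code: the abstract does not describe how the graph or the git alignment is stored, so the triple list, the `ex:` identifiers, the commit hashes, and the `trace_back` helper are all hypothetical. Only the predicate names (`prov:wasGeneratedBy`, `prov:used`) come from the PROV-O vocabulary.

```python
from collections import deque

# PROV-O predicates used in this sketch.
PROV_GENERATED_BY = "prov:wasGeneratedBy"
PROV_USED = "prov:used"

# Hypothetical provenance graph as (subject, predicate, object) triples.
# Entities are data or results; activities are the analyses linking them.
TRIPLES = [
    ("ex:interpretation", PROV_GENERATED_BY, "ex:clinical_review"),
    ("ex:clinical_review", PROV_USED, "ex:variants.vcf"),
    ("ex:variants.vcf", PROV_GENERATED_BY, "ex:variant_calling"),
    ("ex:variant_calling", PROV_USED, "ex:reads.bam"),
    ("ex:reads.bam", PROV_GENERATED_BY, "ex:sequencing"),
    ("ex:sequencing", PROV_USED, "ex:patient_sample"),
]

# Hypothetical alignment: each data entity maps to the git commit that
# recorded it (hashes are made up for illustration).
COMMIT_OF = {
    "ex:interpretation": "c3c3c3c",
    "ex:variants.vcf": "b2b2b2b",
    "ex:reads.bam": "a1a1a1a",
}


def trace_back(start, triples):
    """Walk provenance edges breadth-first from `start`; return nodes in visit order."""
    outgoing = {}
    for subject, _predicate, obj in triples:
        outgoing.setdefault(subject, []).append(obj)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in outgoing.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def commits_along(path):
    """Pair each node on the path with its recorded commit, if any."""
    return [(node, COMMIT_OF[node]) for node in path if node in COMMIT_OF]


if __name__ == "__main__":
    path = trace_back("ex:interpretation", TRIPLES)
    for node in path:
        print(node, COMMIT_OF.get(node, "(activity, no commit in this sketch)"))
```

In this toy chain the walk ends at `ex:patient_sample`, and `commits_along` shows how versioning information could sit alongside each data entity. A real deployment would more likely store the graph as RDF and query it with SPARQL; that choice is also an assumption here.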