A hybrid cloud data lake architecture supporting the integration of clinical and genomics data.

IF 2.3 · Zone 3 (Medicine) · Q2 HEALTH CARE SCIENCES & SERVICES
Health Informatics Journal Pub Date : 2025-04-01 Epub Date: 2025-06-18 DOI:10.1177/14604582251353440
Apollo McOwiti, Heidi Dowst, Fei Zheng, Susan Hilsenbeck, Christopher Amos
Citations: 0

Abstract

A hybrid cloud data lake architecture supporting the integration of clinical and genomics data.

Objective: Cancer centers must quickly integrate clinical genomics data from different vendors for oncology operations and research. Clinical data warehouse architectures are costly to construct and brittle, and they are not readily amenable to the rapid changes in oncology research. We introduce a cost-effective hybrid cloud Data Lake architecture for storing clinical genomic data from different vendors, aiding both clinical and research workflows.

Methods: We created a Data Lake architecture based on the zone architecture, with four layers: ingestion, storage, transformation, and interaction. The layers are implemented with a hybrid cloud architecture. Rich metadata created from patient and genomic data enables patient-based queries, with access to data controlled through a data governance workflow.

Results: Genomic data are stored in the cloud, synchronized with vendors' storage, and managed by a governance committee. The architecture implementation includes genomic test results from two vendors and supports independent clinical sites. The implementation serves 149 clinicians across 31 disease groups and stores 240 TB of data on 5800 patients at a monthly cost of approximately $350.

Conclusion: The Data Lake architecture offers flexibility and scalability, making it suitable for organizations of all sizes to integrate clinical and genomic data efficiently for clinical and research purposes.
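
The abstract mentions patient-based queries over metadata, with access gated by a data governance workflow. The following is a minimal sketch of how such a lookup could work, not the authors' implementation. The abstract does not describe the metadata schema or the approval model, so every name here (`GenomicObject`, `MetadataCatalog`, the per-disease-group approval rule, the vendor labels, and the example URIs) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class GenomicObject:
    """Hypothetical metadata record for one stored genomic file."""
    patient_id: str     # assumed patient key used for patient-based queries
    vendor: str         # which testing vendor produced the file
    disease_group: str  # assumed unit at which access is approved
    uri: str            # illustrative storage location, not a real bucket


@dataclass
class MetadataCatalog:
    """Toy catalog: metadata lives apart from the (cloud-stored) files."""
    objects: List[GenomicObject] = field(default_factory=list)
    # user -> disease groups approved for that user (assumed governance model)
    approvals: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, obj: GenomicObject) -> None:
        """Ingestion step: record metadata for a newly synchronized file."""
        self.objects.append(obj)

    def approve(self, user: str, disease_group: str) -> None:
        """Governance step: grant a user access to one disease group."""
        self.approvals.setdefault(user, set()).add(disease_group)

    def query_patient(self, user: str, patient_id: str) -> List[GenomicObject]:
        """Return only the patient's objects that the user is approved to see."""
        allowed = self.approvals.get(user, set())
        return [
            o for o in self.objects
            if o.patient_id == patient_id and o.disease_group in allowed
        ]


catalog = MetadataCatalog()
catalog.register(GenomicObject("P001", "vendor_a", "breast", "s3://example-bucket/vendor_a/P001.vcf"))
catalog.register(GenomicObject("P001", "vendor_b", "lung", "s3://example-bucket/vendor_b/P001.vcf"))
catalog.approve("dr_lee", "breast")

visible = catalog.query_patient("dr_lee", "P001")
hidden = catalog.query_patient("dr_kim", "P001")
```

In this sketch `visible` holds only the `vendor_a` record, because `dr_lee` is approved for the "breast" group alone, and `hidden` is empty because `dr_kim` has no approvals. Keeping the approval check inside the query path is one simple way to ensure that metadata queries cannot bypass governance; a real deployment would likely delegate this to the cloud provider's access controls.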

Source journal
Health Informatics Journal (HEALTH CARE SCIENCES & SERVICES-MEDICAL INFORMATICS)
CiteScore: 7.80
Self-citation rate: 6.70%
Publication volume: 80
Review time: 6 months
About the journal: Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer’s name is always concealed from the submitting author.