A hybrid cloud data lake architecture supporting the integration of clinical and genomics data

IF 2.3 | Zone 3, Medicine | JCR Q2, HEALTH CARE SCIENCES & SERVICES

Health Informatics Journal | Pub Date: 2025-04-01 | Epub Date: 2025-06-18 | DOI: 10.1177/14604582251353440

Apollo McOwiti, Heidi Dowst, Fei Zheng, Susan Hilsenbeck, Christopher Amos

Citations: 0

Abstract

A hybrid cloud data lake architecture supporting the integration of clinical and genomics data.

Objective: Cancer centers must quickly integrate clinical genomics data from different vendors for oncology operations and research. Clinical data warehouse architectures are costly to construct and brittle, and they are not readily amenable to the rapid changes in oncology research. We introduce a cost-effective hybrid cloud Data Lake architecture for storing clinical genomic data from different vendors, aiding both clinical and research workflows.

Methods: We created a Data Lake architecture based on the zone architecture, with four layers: ingestion, storage, transformation, and interaction. The layers are implemented with a hybrid cloud architecture. Rich metadata created from patient and genomic data enables patient-based queries, with access to data controlled through a data governance workflow.

Results: Genomic data are stored in the cloud, synchronized with vendors' storage, and managed by a governance committee. The architecture implementation includes genomic test results from two vendors and supports independent clinical sites. The implementation serves 149 clinicians across 31 disease groups and stores 240 TB of data on 5800 patients at a monthly cost of approximately $350.

Conclusion: The Data Lake architecture offers flexibility and scalability, making it suitable for organizations of all sizes to integrate clinical and genomic data efficiently for clinical and research purposes.
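The abstract does not describe the implementation, so the following is only a minimal in-memory sketch of how the four layers (ingestion, storage, transformation, interaction) and a governance check might fit together. Every name in it (`GenomicFile`, `DataLake`, `grant`, the example URIs, and the idea of granting access per disease group) is a hypothetical assumption for illustration, not the authors' design. The only figures taken from the abstract are the approximate $350 monthly cost, the 240 TB stored, and the 5800 patients.

```python
from dataclasses import dataclass, field


@dataclass
class GenomicFile:
    """One vendor-delivered result file (all field names are hypothetical)."""
    patient_id: str
    vendor: str
    disease_group: str
    uri: str  # assumed location of the raw object in cloud storage


@dataclass
class DataLake:
    raw_zone: list = field(default_factory=list)  # storage layer: files kept as delivered
    catalog: dict = field(default_factory=dict)   # patient-keyed metadata index
    approved: set = field(default_factory=set)    # (user, disease_group) grants

    def ingest(self, f: GenomicFile) -> None:
        """Ingestion + storage: land the file unchanged, then index it."""
        self.raw_zone.append(f)
        self._index(f)

    def _index(self, f: GenomicFile) -> None:
        """Transformation: derive searchable metadata from the raw file."""
        self.catalog.setdefault(f.patient_id, []).append(
            {"vendor": f.vendor, "disease_group": f.disease_group, "uri": f.uri}
        )

    def grant(self, user: str, disease_group: str) -> None:
        """Record an approval (assumed here to be scoped by disease group)."""
        self.approved.add((user, disease_group))

    def query(self, user: str, patient_id: str) -> list:
        """Interaction: patient-based lookup, filtered by governance grants."""
        return [
            m for m in self.catalog.get(patient_id, [])
            if (user, m["disease_group"]) in self.approved
        ]


def unit_cost(monthly_cost_usd: float, units: float) -> float:
    """Monthly cost divided by a unit count (TB stored, patients, ...)."""
    return monthly_cost_usd / units


lake = DataLake()
lake.ingest(GenomicFile("P001", "vendor_a", "breast", "s3://example-bucket/P001/a.vcf"))
lake.ingest(GenomicFile("P001", "vendor_b", "lung", "s3://example-bucket/P001/b.bam"))
lake.grant("dr_lee", "breast")

visible = lake.query("dr_lee", "P001")    # only the "breast" record is returned
per_tb = round(unit_cost(350, 240), 2)    # 1.46 USD per TB per month
per_patient = round(unit_cost(350, 5800), 2)  # 0.06 USD per patient per month
print(visible, per_tb, per_patient)
```

Taking the reported figures at face value, the storage cost works out to roughly $1.46 per TB per month. The abstract does not say which storage tier or pricing model produces that figure.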

Source journal

Health Informatics Journal (HEALTH CARE SCIENCES & SERVICES, MEDICAL INFORMATICS)

CiteScore: 7.80

Self-citation rate: 6.70%

Articles published:

Review time: 6 months

About the journal: Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer's name is always concealed from the submitting author.