Apollo McOwiti, Heidi Dowst, Fei Zheng, Susan Hilsenbeck, Christopher Amos
Health Informatics Journal, 31(2), article 14604582251353440. Published 2025-04-01 (Epub 2025-06-18). DOI: 10.1177/14604582251353440 (https://doi.org/10.1177/14604582251353440)
A hybrid cloud data lake architecture supporting the integration of clinical and genomics data.
Objective: Cancer centers must quickly integrate clinical genomics data from different vendors for oncology operations and research. Clinical data warehouse architectures are costly to construct and brittle, and they are not readily amenable to the rapid changes in oncology research. We introduce a cost-effective hybrid cloud Data Lake architecture for storing clinical genomic data from different vendors, aiding both clinical and research workflows.

Methods: We created a Data Lake architecture based on the zone architecture, with four layers: ingestion, storage, transformation, and interaction. The layers are implemented with a hybrid cloud architecture. Rich metadata created from patient and genomic data enables patient-based queries, with access to data controlled through a data governance workflow.

Results: Genomic data are stored in the cloud, synchronized with vendors' storage, and managed by a governance committee. The architecture implementation includes genomic test results from two vendors and supports independent clinical sites. The implementation serves 149 clinicians across 31 disease groups and stores 240 TB of data on 5800 patients at a monthly cost of approximately $350.

Conclusion: The Data Lake architecture offers flexibility and scalability, making it suitable for organizations of all sizes to integrate clinical and genomic data efficiently for clinical and research purposes.
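
To make the Methods description more concrete, the following is a minimal sketch of how patient-based queries over a metadata catalog might be gated by a governance approval. It is not the authors' implementation. The class names, the record fields, the `cloud://` URIs, and the choice to grant access per disease group are all assumptions made for illustration; the abstract does not specify the metadata schema or the approval rules.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class GenomicObject:
    """Metadata for one genomic file in cloud storage (hypothetical schema)."""
    patient_id: str
    vendor: str
    disease_group: str
    storage_uri: str  # placeholder location, not a real bucket or path


@dataclass
class MetadataCatalog:
    """In-memory stand-in for the data lake's metadata store (assumption)."""
    records: List[GenomicObject] = field(default_factory=list)
    # (user, disease_group) pairs approved through the governance workflow
    approvals: Set[Tuple[str, str]] = field(default_factory=set)

    def register(self, record: GenomicObject) -> None:
        """Record where a vendor file landed after ingestion."""
        self.records.append(record)

    def approve(self, user: str, disease_group: str) -> None:
        """Grant a user access to one disease group's data."""
        self.approvals.add((user, disease_group))

    def query_patient(self, user: str, patient_id: str) -> List[GenomicObject]:
        """Return the patient's records that the user is approved to see."""
        return [
            r for r in self.records
            if r.patient_id == patient_id
            and (user, r.disease_group) in self.approvals
        ]


catalog = MetadataCatalog()
catalog.register(GenomicObject("P001", "vendor_a", "breast", "cloud://example/P001/a.vcf"))
catalog.register(GenomicObject("P001", "vendor_b", "lung", "cloud://example/P001/b.vcf"))
catalog.approve("dr_lee", "breast")

visible = catalog.query_patient("dr_lee", "P001")
print([r.vendor for r in visible])  # prints ['vendor_a']
```

In this sketch the query returns only metadata records, so the files themselves can stay in cloud storage until an approved user requests them. A real deployment would replace the in-memory lists with a database and an audited approval process.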
About the journal:
Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer’s name is always concealed from the submitting author.