Cloud Engineering Principles and Technology Enablers for Medical Image Processing-as-a-Service.

Shunxing Bao, Andrew J Plassard, Bennett A Landman, Aniruddha Gokhale
DOI: 10.1109/IC2E.2017.23
Journal: Proceedings of the IEEE International Conference on Cloud Engineering
Published: 2017-04-01 (Epub 2017-05-11)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5584067/pdf/nihms843798.pdf
Citation count: 0

Abstract

Traditional in-house, laboratory-based medical imaging studies use hierarchical data structures (e.g., NFS file stores) or databases (e.g., COINS, XNAT) for storage and retrieval. The performance of these approaches, however, is limited by standard network switches, since even moderate-sized studies can saturate network bandwidth during transfer from storage to processing nodes. To address this, a cloud-based "medical image processing-as-a-service" offers promise by leveraging the ecosystem of Apache Hadoop, a flexible framework providing distributed, scalable, fault-tolerant storage and parallel computational modules, and HBase, a NoSQL database built atop Hadoop's distributed file system. Despite this promise, HBase's load distribution strategy of region split and merge is detrimental to the hierarchical organization of imaging data (e.g., project, subject, session, scan, slice). This paper makes two contributions to address these concerns by describing key cloud engineering principles and technology enhancements we made to the Apache Hadoop ecosystem for medical imaging applications. First, we propose a row-key design for HBase, a necessary step driven by the hierarchical organization of imaging data. Second, we propose a novel data allocation policy within HBase to strongly enforce collocation of hierarchically related imaging data. The proposed enhancements accelerate data processing by minimizing network usage and localizing processing to machines where the data already exist. Moreover, our approach is amenable to traditional scan-, subject-, and project-level analysis procedures, and is compatible with standard command-line/scriptable image processing software.
Experimental results for an illustrative sample of imaging data reveal that our new HBase policy yields a three-fold time improvement in conversion of classic DICOM to NIfTI file formats compared with the default HBase region split policy, and nearly a six-fold improvement over a commonly available network file system (NFS) approach, even for relatively small file sets. Moreover, file access latency is lower than with network-attached storage.
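The abstract itself contains no code, but the first contribution — a row key that mirrors the project/subject/session/scan/slice hierarchy so that related rows sort (and thus store) contiguously in HBase — can be illustrated with a minimal sketch. The delimiter, field names, and padding width below are assumptions for illustration, not the paper's actual design:

```python
def build_row_key(project, subject, session, scan, slice_idx):
    """Compose a hierarchy-preserving HBase row key.

    HBase sorts rows byte-wise by key, so zero-padding the slice
    index keeps lexicographic order aligned with numeric slice
    order: all slices of a scan, and all scans of a subject,
    sort contiguously and tend to land in the same region.
    """
    return "/".join([project, subject, session, scan, f"{slice_idx:05d}"])

# Keys for one subject's scan sort together, ahead of the next subject.
keys = [
    build_row_key("proj1", "subj02", "sess1", "scanA", 1),
    build_row_key("proj1", "subj01", "sess1", "scanA", 10),
    build_row_key("proj1", "subj01", "sess1", "scanA", 2),
]
print(sorted(keys))
```

Without the padding, slice 10 would sort before slice 2 ("10" < "2" byte-wise), scattering a scan's slices; with it, hierarchical locality falls out of HBase's native key ordering.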
