Cloud Engineering Principles and Technology Enablers for Medical Image Processing-as-a-Service.

Shunxing Bao, Andrew J Plassard, Bennett A Landman, Aniruddha Gokhale
DOI: 10.1109/IC2E.2017.23
Journal: Proceedings of the IEEE International Conference on Cloud Engineering
Published: 2017-04-01 (Epub 2017-05-11)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5584067/pdf/nihms843798.pdf
Citation count: 0

Abstract

Traditional in-house, laboratory-based medical imaging studies use hierarchical data structures (e.g., NFS file stores) or databases (e.g., COINS, XNAT) for storage and retrieval. The performance of these approaches, however, is limited by standard network switches, since even moderate-sized studies can saturate network bandwidth during transfer from storage to processing nodes. To address this, a cloud-based "medical image processing-as-a-service" offers promise by leveraging the ecosystem of Apache Hadoop, a flexible framework providing distributed, scalable, fault-tolerant storage and parallel computational modules, and HBase, a NoSQL database built atop Hadoop's distributed file system. Despite this promise, HBase's load distribution strategy of region split and merge is detrimental to the hierarchical organization of imaging data (e.g., project, subject, session, scan, slice). This paper makes two contributions to address these concerns by describing key cloud engineering principles and technology enhancements we made to the Apache Hadoop ecosystem for medical imaging applications. First, we propose a row-key design for HBase, a necessary step driven by the hierarchical organization of imaging data. Second, we propose a novel data allocation policy within HBase to strongly enforce collocation of hierarchically related imaging data. The proposed enhancements accelerate data processing by minimizing network usage and localizing processing to machines where the data already exist. Moreover, our approach is amenable to traditional scan-, subject-, and project-level analysis procedures, and is compatible with standard command-line/scriptable image processing software.
Experimental results for an illustrative sample of imaging data reveal that our new HBase policy yields a three-fold time improvement in conversion of classic DICOM to NIfTI file formats compared with the default HBase region split policy, and nearly a six-fold improvement over a commonly available network file system (NFS) approach, even for relatively small file sets. Moreover, file access latency is lower than with network-attached storage.
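The abstract itself contains no code, but the first contribution — a row key that mirrors the project/subject/session/scan/slice hierarchy so that related rows sort (and thus store) contiguously in HBase — can be illustrated with a minimal sketch. The delimiter, field names, and padding width below are assumptions for illustration, not the paper's actual design:

```python
def build_row_key(project, subject, session, scan, slice_idx):
    """Compose a hierarchy-preserving HBase row key.

    HBase sorts rows byte-wise by key, so zero-padding the slice
    index keeps lexicographic order aligned with numeric slice
    order: all slices of a scan, and all scans of a subject,
    sort contiguously and tend to land in the same region.
    """
    return "/".join([project, subject, session, scan, f"{slice_idx:05d}"])

# Keys for one subject's scan sort together, ahead of the next subject.
keys = [
    build_row_key("proj1", "subj02", "sess1", "scanA", 1),
    build_row_key("proj1", "subj01", "sess1", "scanA", 10),
    build_row_key("proj1", "subj01", "sess1", "scanA", 2),
]
print(sorted(keys))
```

Without the padding, slice 10 would sort before slice 2 ("10" < "2" byte-wise), scattering a scan's slices; with it, hierarchical locality falls out of HBase's native key ordering.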
