A cloud-based brain connectivity analysis tool

2017 IEEE High Performance Extreme Computing Conference (HPEC), Pub Date: 2017-09-01, DOI: 10.1109/HPEC.2017.8091080

L. Brattain, Mihnea Bulugioiu, Adam Brewster, Mark Hernandez, Heejin Choi, T. Ku, Kwanghun Chung, V. Gadepally

Citations: 1

Abstract

With advances in high-throughput brain imaging at the cellular and sub-cellular level, there is growing demand for platforms that can support high-performance, large-scale brain data processing and analysis. In this paper, we present a novel pipeline that combines Accumulo, D4M, geohashing, and parallel programming to manage large-scale neuron connectivity graphs in a cloud environment. Our brain connectivity graph is represented using vertices (fiber start/end nodes), edges (fiber tracks), and the 3D coordinates of the fiber tracks. For optimal performance, we take a hybrid approach, storing vertices and edges in Accumulo and saving the fiber track 3D coordinates in flat files. Accumulo database operations offer low latency on sparse queries, while flat files offer high throughput for storing, querying, and analyzing bulk data. We evaluated our pipeline using 250 gigabytes of mouse neuron connectivity data. Benchmarking experiments on retrieving vertices and edges from Accumulo demonstrate a speedup of 1–2 orders of magnitude in retrieval time compared with the same operation on traditional flat files. Graph analytics such as breadth-first search, implemented with Accumulo and D4M, perform consistently well regardless of data size and density and therefore scale to very large datasets. Indexing of neuron subvolumes is simple and logical with geohashing-based binary tree encoding. This hybrid data management backend drives an interactive web-based 3D graphical user interface, where users can examine the 3D connectivity map in a Google Maps-like viewer. Our pipeline is scalable and extensible to other data modalities.
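The abstract does not spell out the geohashing-based binary tree encoding, so the following is only a minimal sketch of the general idea under stated assumptions: the volume is bisected repeatedly, cycling through the x, y, and z axes, and each bisection contributes one bit to the key. The function name `encode_subvolume`, the axis order, and the bounds format are hypothetical and not taken from the paper.

```python
def encode_subvolume(point, bounds, depth):
    """Return a bit string locating `point` within `bounds`.

    Assumption (not from the paper): axes are bisected in the fixed
    order x, y, z, x, y, z, ... and each bisection emits one bit,
    "1" for the upper half and "0" for the lower half.
    """
    lo = [axis_range[0] for axis_range in bounds]
    hi = [axis_range[1] for axis_range in bounds]
    bits = []
    for level in range(depth):
        axis = level % 3  # cycle through x, y, z
        mid = (lo[axis] + hi[axis]) / 2.0
        if point[axis] >= mid:
            bits.append("1")
            lo[axis] = mid  # keep the upper half
        else:
            bits.append("0")
            hi[axis] = mid  # keep the lower half
    return "".join(bits)


# Hypothetical 8 x 8 x 8 volume; the coordinates are illustrative only.
volume = ((0, 8), (0, 8), (0, 8))
coarse_key = encode_subvolume((7, 1, 5), volume, depth=3)
fine_key = encode_subvolume((7, 1, 5), volume, depth=6)
```

In this sketch a shorter key is always a prefix of the longer key for the same point, so a coarse subvolume and all of its children share a common key prefix. That prefix property is what makes such keys convenient as sorted row keys in a store like Accumulo, although whether the paper uses them this way is not stated in the abstract.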
