A virtual filesystem approach to storage, analysis and delivery of volumetric image data for connectomics

Arthur W. Wetzel, Greg Hood, A. Ropelewski
Published in: 2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), 2017-10-01
DOI: 10.1109/AIPR.2017.8457951
Citations: 0

Abstract

Biology and medicine are increasingly driven by analyses of 3D and time-series imagery for studies that are not possible with 2D images. Structural data required for building spatially realistic cell and connectomics models are particularly demanding of both resolution and spatial extent. Image capture for optical and electron microscopy at gigapixel-per-second rates is now routine. In combination, these factors can currently produce hundreds of terabytes per specimen at data densities of up to 1 PB per cubic millimeter of tissue. New techniques are needed to handle these speeds and data scales economically and to distribute results for on-demand analysis by researchers and students nationwide. Trends in the economics of computation and data storage, along with typical data access patterns, suggest a virtual volume file system (VVFS) approach to these problems. In recent years, improvements in the speed and cost of computation have dramatically outpaced gains in storage cost and performance. This is particularly true of GPGPU computation, where data bandwidth is often the limiting factor for overall throughput. The essence of the VVFS mechanism is to apply on-the-fly computation in place of redundant data storage in critical operations such as registration, rendering and automated recognition. This is accomplished using the Linux Filesystem in Userspace (FUSE) mechanism to provide file-compatible interfaces to programs that read their input from data files. This interface produces the appropriate content on demand as applications such as TensorFlow or other analysis systems access the virtual files. The VVFS provides a flexible framework for connecting multiple program units into large-scale applications while also reducing redundant data storage. By moving computation directly into the access path, it minimizes data traffic and processes only those parts of the virtual data that end-user applications consume.
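The core VVFS idea, answering file reads by computing the requested bytes instead of storing them, can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the authors' implementation (the abstract gives no code or data layout): the class `VirtualVolume`, the z-major uint8 layout, and the helpers `synthetic_slice` and `threshold_stage` are all hypothetical. In an actual FUSE filesystem, the `read` operation receives a path, a byte count and an offset, and a handler could forward the count and offset to `read()` here. The second stage shows how one virtual volume can be chained onto another without writing an intermediate file.

```python
from typing import Callable

# Hypothetical slice generator: make_slice(z, ny, nx) returns ny * nx bytes.
SliceFn = Callable[[int, int, int], bytes]


class VirtualVolume:
    """Hypothetical virtual file: a Z x Y x X uint8 volume exposed as a flat,
    z-major byte stream. No voxels are stored; read() computes only the
    z-slices that overlap the requested byte range."""

    def __init__(self, shape: tuple[int, int, int], make_slice: SliceFn):
        self.nz, self.ny, self.nx = shape
        self.slice_bytes = self.ny * self.nx
        self.size = self.nz * self.slice_bytes  # apparent file size in bytes
        self.make_slice = make_slice
        self.slices_computed = 0  # counts on-demand work, for illustration

    def read(self, offset: int, length: int) -> bytes:
        """Return up to `length` bytes starting at `offset` (short at EOF)."""
        if offset < 0 or length <= 0 or offset >= self.size:
            return b""
        end = min(offset + length, self.size)
        first_z = offset // self.slice_bytes
        last_z = (end - 1) // self.slice_bytes
        out = bytearray()
        for z in range(first_z, last_z + 1):
            data = self.make_slice(z, self.ny, self.nx)
            if len(data) != self.slice_bytes:
                raise ValueError(
                    f"slice {z}: got {len(data)} bytes, expected {self.slice_bytes}"
                )
            self.slices_computed += 1
            base = z * self.slice_bytes
            # Keep only the part of this slice that falls inside [offset, end).
            out += data[max(offset, base) - base : min(end, base + self.slice_bytes) - base]
        return bytes(out)


def synthetic_slice(z: int, ny: int, nx: int) -> bytes:
    """Hypothetical stand-in for an expensive source such as rendering."""
    return bytes((z + y + x) % 256 for y in range(ny) for x in range(nx))


def threshold_stage(upstream: VirtualVolume, level: int) -> VirtualVolume:
    """Chain a second virtual volume onto the first: each output slice is
    computed from the matching upstream slice, with no intermediate file."""
    def make_slice(z: int, ny: int, nx: int) -> bytes:
        raw = upstream.read(z * upstream.slice_bytes, upstream.slice_bytes)
        return bytes(255 if v >= level else 0 for v in raw)

    return VirtualVolume((upstream.nz, upstream.ny, upstream.nx), make_slice)


if __name__ == "__main__":
    raw = VirtualVolume((100, 64, 64), synthetic_slice)  # 409,600-byte apparent size
    mask = threshold_stage(raw, level=70)
    # A consumer reads 20 bytes from the middle of slice 50 of the mask.
    chunk = mask.read(50 * 4096 + 10, 20)
    print(len(chunk), raw.slices_computed, mask.slices_computed)  # → 20 1 1
```

Reading 20 bytes triggers computation of one mask slice and one raw slice; the other 99 slices of each volume are never generated. The sketch deliberately recomputes on every read to keep the trade-off visible; a practical system would likely add a cache for frequently accessed regions, trading some storage back for computation.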