Streaming Large-Scale Microscopy Data to a Supercomputing Facility.

Impact factor 2.9; Zone 4 (Engineering and Technology); JCR Q3, Materials Science, Multidisciplinary
Samuel S Welborn, Chris Harris, Stephanie M Ribet, Georgios Varnavides, Colin Ophus, Bjoern Enders, Peter Ercius
Journal: Microscopy and Microanalysis. Publication type: Journal Article. Published: 2024-11-14. DOI: 10.1093/mam/ozae109 (https://doi.org/10.1093/mam/ozae109)

Abstract

Data management is a critical component of modern experimental workflows. As data generation rates increase, transferring data from acquisition servers to processing servers via conventional file-based methods is becoming increasingly impractical. The 4D Camera at the National Center for Electron Microscopy generates data at a nominal rate of 480 Gbit s⁻¹ (87,000 frames s⁻¹), producing a 700 GB dataset in 15 s. To address the challenges associated with storing and processing such quantities of data, we developed a streaming workflow that utilizes a high-speed network to connect the 4D Camera's data acquisition system to supercomputing nodes at the National Energy Research Scientific Computing Center, bypassing intermediate file storage entirely. In this work, we demonstrate the effectiveness of our streaming pipeline in a production setting through an hour-long experiment that generated over 10 TB of raw data, yielding high-quality datasets suitable for advanced analyses. Additionally, we compare the efficacy of this streaming workflow against the conventional file-transfer workflow by conducting a postmortem analysis on historical data from experiments performed by real users. Our findings show that the streaming workflow significantly improves data turnaround time, enables real-time decision-making, and minimizes the potential for human error by eliminating manual user interactions.
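
The rates quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration, not part of the authors' pipeline. It assumes decimal (SI) units, and every variable and function name is hypothetical. It converts the nominal rate to bytes per second, derives the throughput implied by a 700 GB dataset acquired in 15 s, and estimates the average rate over the hour-long session.

```python
# Back-of-the-envelope check of the data rates quoted in the abstract.
# Assumption: decimal (SI) units, i.e. 1 GB = 1e9 bytes and 1 Gbit = 1e9 bits.

NOMINAL_GBIT_PER_S = 480   # quoted nominal detector rate
FRAMES_PER_S = 87_000      # quoted frame rate
DATASET_GB = 700           # quoted size of a single dataset
DATASET_SECONDS = 15       # quoted acquisition time for that dataset
SESSION_TB = 10            # "over 10 TB", so treated here as a lower bound
SESSION_SECONDS = 3600     # hour-long experiment


def gbit_to_gbyte(gbit: float) -> float:
    """Convert gigabits to gigabytes (8 bits per byte)."""
    return gbit / 8.0


def summarize() -> dict:
    """Return the derived rates as a dictionary of plain floats."""
    nominal_gb_per_s = gbit_to_gbyte(NOMINAL_GBIT_PER_S)
    return {
        # Nominal rate expressed in GB/s.
        "nominal_GB_per_s": nominal_gb_per_s,
        # Bytes per frame implied by the nominal rate and the frame rate.
        "bytes_per_frame": nominal_gb_per_s * 1e9 / FRAMES_PER_S,
        # Throughput implied by the 700 GB / 15 s figure.
        "dataset_GB_per_s": DATASET_GB / DATASET_SECONDS,
        "dataset_Gbit_per_s": DATASET_GB * 8 / DATASET_SECONDS,
        # Average over the whole session, including any idle time.
        "session_avg_GB_per_s": SESSION_TB * 1e3 / SESSION_SECONDS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the nominal rate corresponds to 60 GB/s, while the 700 GB in 15 s figure implies roughly 47 GB/s (about 373 Gbit/s). The abstract does not explain the difference, and this sketch makes no attempt to. The session average of about 2.8 GB/s is a lower bound on the hour-long total and says nothing about the peak rate during individual acquisitions.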

Source journal: Microscopy and Microanalysis (Engineering and Technology; Materials Science, Multidisciplinary)
CiteScore: 1.10. Self-citation rate: 10.70%. Articles published: 1391. Review time: 6 months.
Journal description: Microscopy and Microanalysis publishes original research papers in the fields of microscopy, imaging, and compositional analysis. This distinguished international forum is intended for microscopists in both biology and materials science. The journal provides significant articles that describe new and existing techniques and instrumentation, as well as the applications of these to the imaging and analysis of microstructure. Microscopy and Microanalysis also includes review articles, letters to the editor, and book reviews.