Yiyu Ni, Marine A. Denolle, Rob Fatland, Naomi Alterman, Bradley P. Lipovsky, Friedrich Knuth
An Object Storage for Distributed Acoustic Sensing
Seismological Research Letters, published 2023-10-20. DOI: 10.1785/0220230172
Abstract
Large-scale processing and dissemination of distributed acoustic sensing (DAS) data are among the greatest computational challenges and opportunities of seismological research today. Current data formats and computing infrastructure are neither well adapted to nor user-friendly for large-scale processing. We propose an innovative, cloud-native solution for DAS seismology using the MinIO open-source object storage framework. We develop data schemas for two cloud-optimized data formats, Zarr and TileDB, which we deploy on a local object storage service compatible with the Amazon Web Services (AWS) Simple Storage Service (S3) API. We benchmark reading and writing performance for various data schemas using canonical use cases in seismology, and we test our framework both on a local server and on AWS. We find much-improved compute time and memory throughput when using TileDB and Zarr compared with the conventional HDF5 data format. We demonstrate the platform with a computationally heavy use case in seismology: ambient noise seismology of DAS data. We process one month of data, pairing all 2089 channels for cross-correlation within 24 hr using AWS Batch autoscaling.
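Why chunked, cloud-optimized formats suit object storage can be sketched with a little index arithmetic: a Zarr-style array stores each chunk as a separate object, so reading a channel/time window only needs the objects that window overlaps. The helpers below are hypothetical, not the paper's schema. They assume a 2-D (channel, sample) layout and Zarr v2-style chunk keys such as "0.3" joined by "." (Zarr v2's default dimension separator). The array and chunk shapes are illustrative values chosen for the example.

```python
from itertools import product


def chunk_keys(shape, chunks, channel_range, time_range, sep="."):
    """Return the chunk keys that a (channel, time) window read overlaps.

    Hypothetical helper: assumes a 2-D array laid out as (channel, sample)
    and Zarr v2-style keys such as "0.3" (chunk row, chunk column).
    Ranges are half-open (start, stop) index pairs, like Python slices.
    """
    index_ranges = []
    for (start, stop), size, chunk in zip((channel_range, time_range), shape, chunks):
        stop = min(stop, size)          # clip the request to the array bounds
        if start >= stop:               # empty window: nothing to fetch
            return []
        first = start // chunk          # chunk holding the first requested index
        last = (stop - 1) // chunk      # chunk holding the last requested index
        index_ranges.append(range(first, last + 1))
    return [f"{i}{sep}{j}" for i, j in product(*index_ranges)]


def n_chunks(shape, chunks):
    """Total number of chunk objects (ceiling division per dimension)."""
    total = 1
    for size, chunk in zip(shape, chunks):
        total *= -(-size // chunk)
    return total


if __name__ == "__main__":
    shape, chunks = (2089, 60_000), (256, 6_000)   # illustrative sizes only
    keys = chunk_keys(shape, chunks, (100, 300), (0, 12_000))
    print(keys, "of", n_chunks(shape, chunks), "objects")
```

With 2089 channels × 60,000 samples stored in (256, 6000) chunks, the array is split into 9 × 10 = 90 objects, and reading channels 100 to 299 over the first 12,000 samples touches only the four keys "0.0", "0.1", "1.0" and "1.1". The chunk shape is the main schema decision: larger chunks mean fewer requests per read but more unused bytes fetched.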
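Pairing all 2089 channels for ambient noise cross-correlation means 2089 × 2088 / 2 = 2,180,916 distinct channel pairs (excluding autocorrelations), a workload that splits naturally across many independent batch jobs. The following is a minimal sketch of one way to divide it, assuming a hypothetical scheme in which the pairs are enumerated in lexicographic order and each job takes a contiguous, near-equal slice; the paper's own partitioning is not described here.

```python
from itertools import combinations, islice
from math import comb


def job_slice(n_pairs, n_jobs, job_index):
    """Half-open range [start, stop) of pair indices owned by one job.

    The first (n_pairs % n_jobs) jobs each take one extra pair, so job
    sizes differ by at most one.
    """
    base, extra = divmod(n_pairs, n_jobs)
    start = job_index * base + min(job_index, extra)
    stop = start + base + (1 if job_index < extra else 0)
    return start, stop


def job_pairs(n_channels, n_jobs, job_index):
    """Channel pairs (i, j) with i < j assigned to one job (hypothetical scheme)."""
    start, stop = job_slice(comb(n_channels, 2), n_jobs, job_index)
    # combinations() yields pairs in lexicographic order; islice skips to `start`.
    return list(islice(combinations(range(n_channels), 2), start, stop))


if __name__ == "__main__":
    print(comb(2089, 2))                     # number of channel pairs
    print(job_slice(comb(2089, 2), 1000, 0))  # first job's index range
```

In an AWS Batch array job, each child job receives its position in the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable, which could be passed in as `job_index`; the job count (1000 above) is an arbitrary illustrative choice. Skipping to `start` with `islice` costs time proportional to `start`, which is negligible next to the cross-correlation work itself, but a closed-form index-to-pair mapping would avoid it.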