软件定义的存储，用于使用deltaFS索引的海量目录进行快速轨迹查询

Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems Pub Date : 2017-11-12 DOI:10.1145/3149393.3149398

Qing Zheng, George Amvrosiadis, Saurabh Kadekodi, Garth A. Gibson, C. Cranor, B. Settlemyer, G. Grider, Fan Guo

{"title":"软件定义的存储，用于使用deltaFS索引的海量目录进行快速轨迹查询","authors":"Qing Zheng, George Amvrosiadis, Saurabh Kadekodi, Garth A. Gibson, C. Cranor, B. Settlemyer, G. Grider, Fan Guo","doi":"10.1145/3149393.3149398","DOIUrl":null,"url":null,"abstract":"In this paper we introduce the Indexed Massive Directory, a new technique for indexing data within DeltaFS. With its design as a scalable, server-less file system for HPC platforms, DeltaFS scales file system metadata performance with application scale. The Indexed Massive Directory is a novel extension to the DeltaFS data plane, enabling in-situ indexing of massive amounts of data written to a single directory simultaneously, and in an arbitrarily large number of files. We achieve this through a memory-efficient indexing mechanism for reordering and indexing data, and a log-structured storage layout to pack small writes into large log objects, all while ensuring compute node resources are used frugally. We demonstrate the efficiency of this indexing mechanism through VPIC, a widely-used simulation code that scales to trillions of particles. With DeltaFS, we modify VPIC to create a file for each particle to receive writes of that particle's output data. Dynamically indexing the directory's underlying storage allows us to achieve a 5000x speedup in single particle trajectory queries, which require reading all data for a single particle. This speedup increases with application scale while the overhead is fixed at 3% of available memory.","PeriodicalId":262458,"journal":{"name":"Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Software-defined storage for fast trajectory queries using a deltaFS indexed massive directory\",\"authors\":\"Qing Zheng, George Amvrosiadis, Saurabh Kadekodi, Garth A. Gibson, C. Cranor, B. Settlemyer, G. Grider, Fan Guo\",\"doi\":\"10.1145/3149393.3149398\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we introduce the Indexed Massive Directory, a new technique for indexing data within DeltaFS. With its design as a scalable, server-less file system for HPC platforms, DeltaFS scales file system metadata performance with application scale. The Indexed Massive Directory is a novel extension to the DeltaFS data plane, enabling in-situ indexing of massive amounts of data written to a single directory simultaneously, and in an arbitrarily large number of files. We achieve this through a memory-efficient indexing mechanism for reordering and indexing data, and a log-structured storage layout to pack small writes into large log objects, all while ensuring compute node resources are used frugally. We demonstrate the efficiency of this indexing mechanism through VPIC, a widely-used simulation code that scales to trillions of particles. With DeltaFS, we modify VPIC to create a file for each particle to receive writes of that particle's output data. Dynamically indexing the directory's underlying storage allows us to achieve a 5000x speedup in single particle trajectory queries, which require reading all data for a single particle. This speedup increases with application scale while the overhead is fixed at 3% of available memory.\",\"PeriodicalId\":262458,\"journal\":{\"name\":\"Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3149393.3149398\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3149393.3149398","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

本文介绍了DeltaFS中用于数据索引的一种新技术——索引海量目录。DeltaFS设计为可扩展的、无服务器的HPC平台文件系统，可以根据应用程序规模扩展文件系统元数据性能。索引大目录是DeltaFS数据平面的一种新颖扩展，支持对同时写入单个目录的大量数据以及任意数量的文件进行原位索引。我们通过一种内存高效的索引机制来重新排序和索引数据，以及一种日志结构化的存储布局来将小的写入打包到大的日志对象中，同时确保节省计算节点资源。我们通过VPIC证明了这种索引机制的效率，VPIC是一种广泛使用的模拟代码，可扩展到数万亿个粒子。使用DeltaFS，我们修改VPIC，为每个粒子创建一个文件，以接收对该粒子输出数据的写操作。动态索引目录的底层存储允许我们在单粒子轨迹查询中实现5000x的加速，这需要读取单个粒子的所有数据。当开销固定在可用内存的3%时，这种加速会随着应用程序的规模而增加。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Software-defined storage for fast trajectory queries using a deltaFS indexed massive directory

In this paper we introduce the Indexed Massive Directory, a new technique for indexing data within DeltaFS. With its design as a scalable, server-less file system for HPC platforms, DeltaFS scales file system metadata performance with application scale. The Indexed Massive Directory is a novel extension to the DeltaFS data plane, enabling in-situ indexing of massive amounts of data written to a single directory simultaneously, and in an arbitrarily large number of files. We achieve this through a memory-efficient indexing mechanism for reordering and indexing data, and a log-structured storage layout to pack small writes into large log objects, all while ensuring compute node resources are used frugally. We demonstrate the efficiency of this indexing mechanism through VPIC, a widely-used simulation code that scales to trillions of particles. With DeltaFS, we modify VPIC to create a file for each particle to receive writes of that particle's output data. Dynamically indexing the directory's underlying storage allows us to achieve a 5000x speedup in single particle trajectory queries, which require reading all data for a single particle. This speedup increases with application scale while the overhead is fixed at 3% of available memory.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems

自引率

0.00%

发文量