DStore: A Lightweight Scalable Learning Model Repository with Fine-Grain Tensor-Level Access

Meghana Madhyastha, Robert Underwood, R. Burns, Bogdan Nicolae
{"title":"DStore: A Lightweight Scalable Learning Model Repository with Fine-Grain Tensor-Level Access","authors":"Meghana Madhyastha, Robert Underwood, R. Burns, Bogdan Nicolae","doi":"10.1145/3577193.3593730","DOIUrl":null,"url":null,"abstract":"The ability to share and reuse deep learning (DL) models is a key driver that facilitates the rapid adoption of artificial intelligence (AI) in both industrial and scientific applications. However, state-of-the-art approaches to store and access DL models efficiently at scale lag behind. Most often, DL models are serialized by using various formats (e.g., HDF5, SavedModel) and stored as files on POSIX file systems. While simple and portable, such an approach exhibits high serialization and I/O overheads, especially under concurrency. Additionally, the emergence of advanced AI techniques (transfer learning, sensitivity analysis, explainability, etc.) introduces the need for fine-grained access to tensors to facilitate the extraction and reuse of individual or subsets of tensors. Such patterns are underserved by state-of-the-art approaches. Requiring tensors to be read in bulk incurs suboptimal performance, scales poorly, and/or overutilizes network bandwidth. In this paper we propose a lightweight, distributed, RDMA-enabled learning model repository that addresses these challenges. Specifically we introduce several ideas: compact architecture graph representation with stable hashing and client-side metadata caching, scalable load balancing on multiple providers, RDMA-optimized data staging, and direct access to raw tensor data. We evaluate our proposal in extensive experiments that involve different access patterns using learning models of diverse shapes and sizes. Our evaluations show a significant improvement (between 2 and 30× over a variety of state-of-the-art model storage approaches while scaling to half the Cooley cluster at the Argonne Leadership Computing Facility.","PeriodicalId":424155,"journal":{"name":"Proceedings of the 37th International Conference on Supercomputing","volume":"76 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3577193.3593730","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

The ability to share and reuse deep learning (DL) models is a key driver that facilitates the rapid adoption of artificial intelligence (AI) in both industrial and scientific applications. However, state-of-the-art approaches to store and access DL models efficiently at scale lag behind. Most often, DL models are serialized by using various formats (e.g., HDF5, SavedModel) and stored as files on POSIX file systems. While simple and portable, such an approach exhibits high serialization and I/O overheads, especially under concurrency. Additionally, the emergence of advanced AI techniques (transfer learning, sensitivity analysis, explainability, etc.) introduces the need for fine-grained access to tensors to facilitate the extraction and reuse of individual tensors or subsets of tensors. Such patterns are underserved by state-of-the-art approaches. Requiring tensors to be read in bulk incurs suboptimal performance, scales poorly, and/or overutilizes network bandwidth. In this paper we propose a lightweight, distributed, RDMA-enabled learning model repository that addresses these challenges. Specifically, we introduce several ideas: compact architecture graph representation with stable hashing and client-side metadata caching, scalable load balancing on multiple providers, RDMA-optimized data staging, and direct access to raw tensor data. We evaluate our proposal in extensive experiments that involve different access patterns using learning models of diverse shapes and sizes. Our evaluations show a significant improvement (between 2× and 30×) over a variety of state-of-the-art model storage approaches while scaling to half the Cooley cluster at the Argonne Leadership Computing Facility.
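To make the abstract's core ideas concrete, below is a minimal illustrative sketch of how stable hashing over an architecture graph could derive per-tensor keys, spread tensors across multiple providers, and serve fine-grained (per-tensor) reads without deserializing a whole model. This is not DStore's actual API or implementation; all names (`stable_tensor_key`, `provider_for`, `ToyTensorRepository`) are hypothetical, and the in-memory dictionaries stand in for RDMA-staged storage.

```python
# Hypothetical sketch: stable per-tensor keys + provider placement for
# fine-grained tensor access. Not the DStore implementation.
import hashlib
from typing import Dict, List


def stable_tensor_key(model_name: str, layer_path: str, tensor_name: str) -> str:
    """Derive a deterministic key from the tensor's position in the architecture graph."""
    canonical = f"{model_name}/{layer_path}/{tensor_name}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def provider_for(key: str, providers: List[str]) -> str:
    """Map a tensor key to one of several providers (simple modulo placement)."""
    return providers[int(key, 16) % len(providers)]


class ToyTensorRepository:
    """In-memory stand-in for a distributed repository; a real system would
    stage raw tensor bytes over RDMA rather than copying through Python."""

    def __init__(self, providers: List[str]):
        self.providers = providers
        self.stores: Dict[str, Dict[str, bytes]] = {p: {} for p in providers}

    def put_tensor(self, model: str, layer: str, name: str, raw: bytes) -> None:
        key = stable_tensor_key(model, layer, name)
        self.stores[provider_for(key, self.providers)][key] = raw

    def get_tensor(self, model: str, layer: str, name: str) -> bytes:
        # Fine-grained access: only the requested tensor moves, not the full model.
        key = stable_tensor_key(model, layer, name)
        return self.stores[provider_for(key, self.providers)][key]


# Example: fetch a single layer's weights without reading the whole checkpoint.
repo = ToyTensorRepository(providers=["provider-0", "provider-1", "provider-2"])
repo.put_tensor("resnet50", "layer4.1.conv2", "weight", b"\x00" * 16)
weights = repo.get_tensor("resnet50", "layer4.1.conv2", "weight")
```

Because the key depends only on the tensor's position in the architecture graph, clients can compute placements locally (analogous to client-side metadata caching) and contact the owning provider directly, which is what allows per-tensor reads to scale across providers.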