M-LAB: scheduling space exploration of multitasks on tiled deep learning accelerators

Bingya Zhang, Sheng Zhang
{"title":"M-LAB: scheduling space exploration of multitasks on tiled deep learning accelerators","authors":"Bingya Zhang, Sheng Zhang","doi":"10.1117/12.3032039","DOIUrl":null,"url":null,"abstract":"With the increasing commercialization of deep neural networks (DNN), there is a growing need for running multiple neural networks simultaneously on an accelerator. This creates a new space to explore the allocation of computing resources and the order of computation. However, the majority of current research in multi-DNN scheduling relies predominantly on newly developed accelerators or employs heuristic methods aimed primarily at reducing DRAM traffic, increasing throughput and improving Service Level Agreements (SLA) satisfaction. These approaches often lead to poor portability, incompatibility with other optimization methods, and markedly high energy consumption. In this paper, we introduce a novel scheduling framework, M-LAB, that all scheduling of data is at layer level instead of network level, which means our framework is compatible with the research of inter-layer scheduling, with significant improvement in energy consumption and speed. To facilitate layer-level scheduling, M-LAB eliminates the conventional network boundaries, transforming these dependencies into a layer-to-layer format. Subsequently, M-LAB explores the scheduling space by amalgamating inter-layer and intra-layer scheduling, which allows for a more nuanced and efficient scheduling strategy tailored to the specific needs of multiple neural networks. Compared with current works, M-LAB achieves 2.06x-4.85x speed-up and 2.27-4.12x cost reduction.","PeriodicalId":342847,"journal":{"name":"International Conference on Algorithms, Microchips and Network Applications","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Algorithms, Microchips and Network Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.3032039","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

With the increasing commercialization of deep neural networks (DNNs), there is a growing need to run multiple neural networks simultaneously on a single accelerator. This creates a new space for exploring the allocation of computing resources and the order of computation. However, most current research on multi-DNN scheduling either relies on newly developed accelerators or employs heuristic methods aimed primarily at reducing DRAM traffic, increasing throughput, and improving Service Level Agreement (SLA) satisfaction. These approaches often suffer from poor portability, incompatibility with other optimization methods, and markedly high energy consumption. In this paper, we introduce a novel scheduling framework, M-LAB, in which all data scheduling is performed at the layer level rather than the network level; this makes the framework compatible with inter-layer scheduling research while delivering significant improvements in energy consumption and speed. To facilitate layer-level scheduling, M-LAB eliminates conventional network boundaries, transforming inter-network dependencies into a layer-to-layer format. M-LAB then explores the scheduling space by combining inter-layer and intra-layer scheduling, which allows a more nuanced and efficient scheduling strategy tailored to the specific needs of multiple neural networks. Compared with current works, M-LAB achieves a 2.06x-4.85x speed-up and a 2.27x-4.12x cost reduction.
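The key mechanism the abstract sketches, removing network boundaries so that the workload becomes a single layer-to-layer dependency graph schedulable as one pool of layers, can be made concrete with a short sketch. The Python below is a minimal illustration under simplifying assumptions (each network is treated as a linear chain, and the names `Layer`, `merge_networks`, and `topological_schedule` are hypothetical), not M-LAB's implementation: it merges two networks into one dependency graph and produces a legal execution order in which their layers interleave.

```python
# A minimal sketch, NOT M-LAB's actual code: dissolve per-network boundaries
# by merging every network's layers into one dependency graph, then emit a
# legal execution order in which layers from different DNNs may interleave.
# All names here (Layer, merge_networks, topological_schedule) are
# hypothetical illustrations, not identifiers from the paper.

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One layer of one network, identified as (network name, layer index)."""
    net: str
    idx: int


def merge_networks(networks: dict[str, int]) -> dict[Layer, list[Layer]]:
    """Build a single layer-to-layer dependency graph for all networks.

    `networks` maps a network name to its layer count; for simplicity each
    network is assumed to be a linear chain (layer i feeds layer i + 1).
    """
    deps: dict[Layer, list[Layer]] = {}
    for net, n_layers in networks.items():
        for i in range(n_layers):
            deps[Layer(net, i)] = [Layer(net, i - 1)] if i > 0 else []
    return deps


def topological_schedule(deps: dict[Layer, list[Layer]]) -> list[Layer]:
    """Kahn's algorithm: any topological order of the merged graph is a
    legal schedule, so layers of different DNNs can run back to back."""
    indegree = {v: len(p) for v, p in deps.items()}
    succs: dict[Layer, list[Layer]] = {v: [] for v in deps}
    for v, preds in deps.items():
        for p in preds:
            succs[p].append(v)
    ready = deque(v for v, d in indegree.items() if d == 0)
    order: list[Layer] = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for s in succs[v]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order


if __name__ == "__main__":
    graph = merge_networks({"resnet": 3, "bert": 2})
    for layer in topological_schedule(graph):
        print(layer.net, layer.idx)  # layers of the two nets interleave
```

The sketch covers only the boundary-elimination step; in the framework the abstract describes, this merged graph would then be searched jointly over inter-layer choices (which layers run together) and intra-layer choices (how each layer maps onto the tiled accelerator).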