M-LAB: scheduling space exploration of multitasks on tiled deep learning accelerators

Bingya Zhang, Sheng Zhang
{"title":"M-LAB: scheduling space exploration of multitasks on tiled deep learning accelerators","authors":"Bingya Zhang, Sheng Zhang","doi":"10.1117/12.3032039","DOIUrl":null,"url":null,"abstract":"With the increasing commercialization of deep neural networks (DNN), there is a growing need for running multiple neural networks simultaneously on an accelerator. This creates a new space to explore the allocation of computing resources and the order of computation. However, the majority of current research in multi-DNN scheduling relies predominantly on newly developed accelerators or employs heuristic methods aimed primarily at reducing DRAM traffic, increasing throughput and improving Service Level Agreements (SLA) satisfaction. These approaches often lead to poor portability, incompatibility with other optimization methods, and markedly high energy consumption. In this paper, we introduce a novel scheduling framework, M-LAB, that all scheduling of data is at layer level instead of network level, which means our framework is compatible with the research of inter-layer scheduling, with significant improvement in energy consumption and speed. To facilitate layer-level scheduling, M-LAB eliminates the conventional network boundaries, transforming these dependencies into a layer-to-layer format. Subsequently, M-LAB explores the scheduling space by amalgamating inter-layer and intra-layer scheduling, which allows for a more nuanced and efficient scheduling strategy tailored to the specific needs of multiple neural networks. Compared with current works, M-LAB achieves 2.06x-4.85x speed-up and 2.27-4.12x cost reduction.","PeriodicalId":342847,"journal":{"name":"International Conference on Algorithms, Microchips and Network Applications","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Algorithms, Microchips and Network Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.3032039","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

With the increasing commercialization of deep neural networks (DNNs), there is a growing need to run multiple neural networks simultaneously on a single accelerator. This creates a new space for exploring the allocation of computing resources and the order of computation. However, most current research on multi-DNN scheduling either relies on newly developed accelerators or employs heuristic methods aimed primarily at reducing DRAM traffic, increasing throughput, and improving Service Level Agreement (SLA) satisfaction. These approaches often suffer from poor portability, incompatibility with other optimization methods, and markedly high energy consumption. In this paper, we introduce a novel scheduling framework, M-LAB, in which all data scheduling is performed at the layer level rather than the network level; this makes the framework compatible with inter-layer scheduling research while delivering significant improvements in energy consumption and speed. To facilitate layer-level scheduling, M-LAB eliminates conventional network boundaries, transforming inter-network dependencies into a layer-to-layer format. M-LAB then explores the scheduling space by combining inter-layer and intra-layer scheduling, which allows a more nuanced and efficient scheduling strategy tailored to the specific needs of multiple neural networks. Compared with current works, M-LAB achieves a 2.06x-4.85x speed-up and a 2.27x-4.12x cost reduction.
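The key mechanism the abstract sketches, removing network boundaries so that the workload becomes a single layer-to-layer dependency graph schedulable as one pool of layers, can be made concrete with a short sketch. The Python below is a minimal illustration under simplifying assumptions (each network is treated as a linear chain, and the names `Layer`, `merge_networks`, and `topological_schedule` are hypothetical), not M-LAB's implementation: it merges two networks into one dependency graph and produces a legal execution order in which their layers interleave.

```python
# A minimal sketch, NOT M-LAB's actual code: dissolve per-network boundaries
# by merging every network's layers into one dependency graph, then emit a
# legal execution order in which layers from different DNNs may interleave.
# All names here (Layer, merge_networks, topological_schedule) are
# hypothetical illustrations, not identifiers from the paper.

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One layer of one network, identified as (network name, layer index)."""
    net: str
    idx: int


def merge_networks(networks: dict[str, int]) -> dict[Layer, list[Layer]]:
    """Build a single layer-to-layer dependency graph for all networks.

    `networks` maps a network name to its layer count; for simplicity each
    network is assumed to be a linear chain (layer i feeds layer i + 1).
    """
    deps: dict[Layer, list[Layer]] = {}
    for net, n_layers in networks.items():
        for i in range(n_layers):
            deps[Layer(net, i)] = [Layer(net, i - 1)] if i > 0 else []
    return deps


def topological_schedule(deps: dict[Layer, list[Layer]]) -> list[Layer]:
    """Kahn's algorithm: any topological order of the merged graph is a
    legal schedule, so layers of different DNNs can run back to back."""
    indegree = {v: len(p) for v, p in deps.items()}
    succs: dict[Layer, list[Layer]] = {v: [] for v in deps}
    for v, preds in deps.items():
        for p in preds:
            succs[p].append(v)
    ready = deque(v for v, d in indegree.items() if d == 0)
    order: list[Layer] = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for s in succs[v]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order


if __name__ == "__main__":
    graph = merge_networks({"resnet": 3, "bert": 2})
    for layer in topological_schedule(graph):
        print(layer.net, layer.idx)  # layers of the two nets interleave
```

The sketch covers only the boundary-elimination step; in the framework the abstract describes, this merged graph would then be searched jointly over inter-layer choices (which layers run together) and intra-layer choices (how each layer maps onto the tiled accelerator).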