CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories

Man Shi, Steven Colleman, Charlotte VanDeMieroop, Antony Joseph, M. Meijer, W. Dehaene, M. Verhelst
{"title":"CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories","authors":"Man Shi, Steven Colleman, Charlotte VanDeMieroop, Antony Joseph, M. Meijer, W. Dehaene, M. Verhelst","doi":"10.1109/ISQED57927.2023.10129330","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNN) use a wide range of network topologies to achieve high accuracy within diverse applications. This model diversity makes it impossible to identify a single \"dataflow\" (execution schedule) to perform optimally across all possible layers and network topologies. Several frameworks support the exploration of the best dataflow for a given DNN layer and hardware. However, switching the dataflow from one layer to the next layer within one DNN model can result in hardware inefficiencies stemming from memory data layout mismatch among the layers. Unfortunately, all existing frameworks treat each layer independently and typically model memories as black boxes (one large monolithic wide memory), which ignores the data layout and can not deal with the data layout dependencies of sequential layers. These frameworks are not capable of doing dataflow cross-layer optimization. This work, hence, aims at cross-layer dataflow optimization, taking the data dependency and data layout reshuffling overheads among layers into account. Additionally, we propose to exploit the multibank memories typically present in modern DNN accelerators towards efficiently reshuffling data to support more dataflow at low overhead. These innovations are supported through the Cross-layer Memory-aware Dataflow Scheduler (CMDS). CMDS can model DNN execution energy/latency while considering the different data layout requirements due to the varied optimal dataflow of layers. Compared with the state-of-the-art (SOTA), which performs layer-optimized memory-unaware scheduling, CMDS achieves up to 5.5× energy reduction and 1.35× latency reduction with negligible hardware cost.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 24th International Symposium on Quality Electronic Design (ISQED)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISQED57927.2023.10129330","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Deep neural networks (DNNs) use a wide range of network topologies to achieve high accuracy across diverse applications. This model diversity makes it impossible to identify a single "dataflow" (execution schedule) that performs optimally across all possible layers and network topologies. Several frameworks support exploring the best dataflow for a given DNN layer and hardware. However, switching the dataflow from one layer to the next within one DNN model can result in hardware inefficiencies stemming from memory data layout mismatches between the layers. Unfortunately, all existing frameworks treat each layer independently and typically model memories as black boxes (one large monolithic wide memory), which ignores the data layout and cannot deal with the data layout dependencies of sequential layers. These frameworks are therefore incapable of cross-layer dataflow optimization. This work hence targets cross-layer dataflow optimization, taking into account the data dependencies and the data layout reshuffling overheads among layers. Additionally, we propose to exploit the multi-bank memories typically present in modern DNN accelerators to efficiently reshuffle data, supporting more dataflows at low overhead. These innovations are supported through the Cross-layer Memory-aware Dataflow Scheduler (CMDS). CMDS can model DNN execution energy/latency while considering the different data layout requirements arising from the varied optimal dataflows of layers. Compared with the state-of-the-art (SOTA), which performs layer-optimized, memory-unaware scheduling, CMDS achieves up to a 5.5× energy reduction and a 1.35× latency reduction with negligible hardware cost.
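To make the cross-layer idea concrete, the sketch below shows a minimal, hypothetical scheduler that picks one dataflow per layer while charging a reshuffle penalty whenever consecutive layers use mismatched data layouts, solved with a simple Viterbi-style dynamic program. This is purely illustrative and not the CMDS algorithm or cost model from the paper: the dataflow names, cost numbers, and the `reshuffle_cost` function are invented for the example.

```python
# Illustrative sketch (not the CMDS implementation): choose a dataflow per
# layer so that per-layer execution cost plus inter-layer data-layout
# reshuffle cost is minimized. All names and numbers are hypothetical.

# Candidate dataflows (execution schedules) considered for every layer.
DATAFLOWS = ["weight_stationary", "output_stationary", "row_stationary"]

# exec_cost[layer][dataflow]: energy (or latency) of running that layer with
# that dataflow, as a layer-wise mapping/cost-model tool would report it.
exec_cost = [
    {"weight_stationary": 10.0, "output_stationary": 14.0, "row_stationary": 12.0},
    {"weight_stationary": 20.0, "output_stationary": 11.0, "row_stationary": 13.0},
    {"weight_stationary": 15.0, "output_stationary": 16.0, "row_stationary": 9.0},
]

def reshuffle_cost(df_prev: str, df_next: str) -> float:
    """Overhead of reorganizing the activation layout in memory when the
    producing and consuming layers use different dataflows. A memory-unaware,
    layer-by-layer scheduler implicitly assumes this is zero."""
    return 0.0 if df_prev == df_next else 6.0

def schedule(exec_cost, dataflows):
    """Return (total cost, per-layer dataflow assignment) minimizing
    execution + reshuffle cost over the whole network (Viterbi-style DP)."""
    # best[df] = (cheapest cumulative cost ending with dataflow df, its path)
    best = {df: (exec_cost[0][df], [df]) for df in dataflows}
    for layer in range(1, len(exec_cost)):
        new_best = {}
        for df in dataflows:
            cost, path = min(
                (prev_cost + reshuffle_cost(prev_df, df) + exec_cost[layer][df],
                 prev_path + [df])
                for prev_df, (prev_cost, prev_path) in best.items()
            )
            new_best[df] = (cost, path)
        best = new_best
    return min(best.values())

if __name__ == "__main__":
    total, assignment = schedule(exec_cost, DATAFLOWS)
    # Lower bound a reshuffle-unaware scheduler believes it achieves.
    greedy = sum(min(layer.values()) for layer in exec_cost)
    print("cross-layer schedule:", assignment, "total cost:", total)
    print("layer-optimized cost ignoring reshuffle overhead:", greedy)
```

In this toy setting, the reshuffle-aware schedule may keep a slightly sub-optimal dataflow for one layer if doing so avoids an expensive layout reorganization between layers, which is the trade-off the abstract describes; CMDS additionally exploits multi-bank memories to lower that reshuffle cost itself.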