Work-in-Progress: DORY: Lightweight Memory Hierarchy Management for Deep NN Inference on IoT Endnodes

A. Burrello, Francesco Conti, Angelo Garofalo, D. Rossi, L. Benini
{"title":"Work-in-Progress: DORY: Lightweight Memory Hierarchy Management for Deep NN Inference on IoT Endnodes","authors":"A. Burrello, Francesco Conti, Angelo Garofalo, D. Rossi, L. Benini","doi":"10.1145/3349567.3351726","DOIUrl":null,"url":null,"abstract":"IoT endnodes often couple a small and fast L1 scratchpad memory with higher-capacity but lower bandwidth and speed L2 background memory. The absence of a coherent hardware cache hierarchy saves energy but comes at the cost of labor-intensive explicit memory management, complicating the deployment of algorithms with large data memory footprint, such as Deep Neural Network (DNN) inference. In this work, we present DORY, a lightweight software-cache dedicated to DNN Deployment Oriented to memoRY. DORY leverages static data tiling and DMA-based double buffering to hide the complexity of manual L1-L2 memory traffic management. DORY enables storage of activations and weights in L2 with less than 4% performance overhead with respect to direct execution in L1. We show that a 142 kB DNN achieving 79.9% on CIFAR-10 runs 3.2x faster compared to its execution directly from L2 memory while consuming 1.9x less energy.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3349567.3351726","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

IoT endnodes often couple a small and fast L1 scratchpad memory with higher-capacity but lower bandwidth and speed L2 background memory. The absence of a coherent hardware cache hierarchy saves energy but comes at the cost of labor-intensive explicit memory management, complicating the deployment of algorithms with large data memory footprint, such as Deep Neural Network (DNN) inference. In this work, we present DORY, a lightweight software-cache dedicated to DNN Deployment Oriented to memoRY. DORY leverages static data tiling and DMA-based double buffering to hide the complexity of manual L1-L2 memory traffic management. DORY enables storage of activations and weights in L2 with less than 4% performance overhead with respect to direct execution in L1. We show that a 142 kB DNN achieving 79.9% on CIFAR-10 runs 3.2x faster compared to its execution directly from L2 memory while consuming 1.9x less energy.
DORY:物联网终端节点上深度神经网络推理的轻量级内存层次管理
物联网终端节点通常将小而快速的L1刮刮板内存与更高容量但带宽和速度较低的L2后台内存相结合。缺乏一致的硬件缓存层次结构可以节省能源,但代价是劳动密集型的显式内存管理,使具有大数据内存占用的算法(如深度神经网络(DNN)推理)的部署复杂化。在这项工作中,我们提出了DORY,一个轻量级的软件缓存,专门用于面向内存的DNN部署。DORY利用静态数据平铺和基于dma的双缓冲来隐藏手动L1-L2内存流量管理的复杂性。DORY支持在L2中存储激活和权重,相对于在L1中直接执行而言,性能开销不到4%。我们表明,与直接从L2内存执行相比,在CIFAR-10上实现79.9%的142 kB DNN运行速度快3.2倍,消耗的能量少1.9倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信