Work-in-Progress: DORY: Lightweight Memory Hierarchy Management for Deep NN Inference on IoT Endnodes

2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). Pub Date: 2019-10-01. DOI: 10.1145/3349567.3351726

A. Burrello, Francesco Conti, Angelo Garofalo, D. Rossi, L. Benini

Citations: 6

Abstract

IoT endnodes often couple a small, fast L1 scratchpad memory with a higher-capacity but slower, lower-bandwidth L2 background memory. The absence of a coherent hardware cache hierarchy saves energy, but it comes at the cost of labor-intensive explicit memory management. This complicates the deployment of algorithms with a large data memory footprint, such as Deep Neural Network (DNN) inference. In this work, we present DORY, a lightweight software cache dedicated to DNN Deployment Oriented to memoRY. DORY leverages static data tiling and DMA-based double buffering to hide the complexity of manual L1-L2 memory traffic management. DORY enables storage of activations and weights in L2 with less than 4% performance overhead with respect to direct execution in L1. We show that a 142 kB DNN achieving 79.9% accuracy on CIFAR-10 runs 3.2x faster than when executed directly from L2 memory, while consuming 1.9x less energy.
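The two techniques named in the abstract, static tiling and DMA-based double buffering, can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not DORY's actual tiling solver or runtime: the function names, the row-wise tiling scheme, and the fixed per-tile cycle costs are all hypothetical, and the DMA transfer is stood in for by a plain list slice.

```python
# Illustrative model only: all names and the cost model are assumptions,
# not taken from the DORY paper or its code.

def choose_tile_rows(total_rows, bytes_per_row, l1_budget_bytes):
    """Pick the largest tile height such that TWO tile buffers fit in L1."""
    rows = l1_budget_bytes // (2 * bytes_per_row)
    if rows < 1:
        raise ValueError("L1 budget too small for one double-buffered row")
    return min(rows, total_rows)


def make_tiles(total_rows, tile_rows):
    """Split [0, total_rows) into consecutive (start, stop) row ranges."""
    return [(start, min(start + tile_rows, total_rows))
            for start in range(0, total_rows, tile_rows)]


def run_double_buffered(l2_data, tiles, compute):
    """Process every tile, alternating between two L1 buffers.

    The slice copy stands in for a DMA transfer. On real hardware the
    transfer of tile i+1 would be issued asynchronously and would overlap
    with the computation on tile i; here the two simply run in sequence.
    """
    if not tiles:
        return []
    l1 = [None, None]
    start, stop = tiles[0]
    l1[0] = l2_data[start:stop]          # prologue: fetch the first tile
    results = []
    for i in range(len(tiles)):
        cur = i % 2
        if i + 1 < len(tiles):
            nxt_start, nxt_stop = tiles[i + 1]
            l1[1 - cur] = l2_data[nxt_start:nxt_stop]  # prefetch next tile
        results.extend(compute(l1[cur]))               # work on current tile
    return results


def pipeline_cycles(n_tiles, dma_cycles, compute_cycles):
    """Return (serial, overlapped) cycle counts for equal-cost tiles."""
    serial = n_tiles * (dma_cycles + compute_cycles)
    overlapped = (dma_cycles
                  + (n_tiles - 1) * max(dma_cycles, compute_cycles)
                  + compute_cycles)
    return serial, overlapped
```

In this model, when a tile's compute time exceeds its transfer time, every transfer except the first is hidden behind computation. That is the general reason double buffering can bring execution from L2 close to execution from L1; the specific overhead figure in the abstract comes from the authors' measurements, not from this model.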
