DORY: Lightweight Memory Hierarchy Management for Deep Neural Network Inference on IoT Endnodes

2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2019-10-01 DOI:10.1145/3349567.3351726

A. Burrello, Francesco Conti, Angelo Garofalo, D. Rossi, L. Benini


Citations: 6

Abstract


Work-in-Progress: DORY: Lightweight Memory Hierarchy Management for Deep NN Inference on IoT Endnodes

IoT endnodes often couple a small and fast L1 scratchpad memory with higher-capacity but lower bandwidth and speed L2 background memory. The absence of a coherent hardware cache hierarchy saves energy but comes at the cost of labor-intensive explicit memory management, complicating the deployment of algorithms with large data memory footprint, such as Deep Neural Network (DNN) inference. In this work, we present DORY, a lightweight software-cache dedicated to DNN Deployment Oriented to memoRY. DORY leverages static data tiling and DMA-based double buffering to hide the complexity of manual L1-L2 memory traffic management. DORY enables storage of activations and weights in L2 with less than 4% performance overhead with respect to direct execution in L1. We show that a 142 kB DNN achieving 79.9% on CIFAR-10 runs 3.2x faster compared to its execution directly from L2 memory while consuming 1.9x less energy.
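The abstract names two mechanisms, static data tiling and DMA-based double buffering, without detailing them. The following is a minimal Python sketch of the general idea only, not DORY's actual implementation or API. The function names `choose_tile_rows`, `dma_copy`, and `run_layer_tiled` are hypothetical, tiling along rows only is a simplifying assumption, and the "DMA" is a plain synchronous copy standing in for an asynchronous transfer.

```python
from typing import Callable, List, Optional, Sequence


def choose_tile_rows(total_rows: int, row_bytes: int, l1_budget_bytes: int) -> int:
    """Pick the largest tile height such that TWO tiles fit in the L1 budget.

    Two tiles are needed because double buffering keeps one tile being
    computed on and one tile being filled at the same time.
    """
    rows = l1_budget_bytes // (2 * row_bytes)
    if rows < 1:
        raise ValueError("L1 budget too small to double-buffer a single row")
    return min(rows, total_rows)


def dma_copy(l2: Sequence[int], start: int, stop: int) -> List[int]:
    """Stand-in for a DMA transfer from L2 to L1 (assumption: synchronous here)."""
    return list(l2[start:stop])


def run_layer_tiled(
    l2_input: Sequence[int],
    row_len: int,
    tile_rows: int,
    kernel: Callable[[List[int]], List[int]],
) -> List[int]:
    """Apply `kernel` tile by tile, alternating between two L1 buffers."""
    total_rows = len(l2_input) // row_len
    # Static tiling: all tile boundaries are known before execution starts.
    bounds = [
        (r, min(r + tile_rows, total_rows))
        for r in range(0, total_rows, tile_rows)
    ]
    if not bounds:
        return []

    l1: List[Optional[List[int]]] = [None, None]  # the two L1 tile buffers
    output: List[int] = []

    # Prologue: load the first tile before any computation.
    lo, hi = bounds[0]
    l1[0] = dma_copy(l2_input, lo * row_len, hi * row_len)

    for i in range(len(bounds)):
        cur = i % 2
        # Issue the transfer of the NEXT tile into the other buffer.
        # On real hardware this would overlap with the kernel call below.
        if i + 1 < len(bounds):
            lo, hi = bounds[i + 1]
            l1[1 - cur] = dma_copy(l2_input, lo * row_len, hi * row_len)
        # Compute on the tile that is already resident in L1.
        tile = l1[cur]
        assert tile is not None
        output.extend(kernel(tile))
    return output


def relu(tile: List[int]) -> List[int]:
    """Element-wise ReLU, used as a simple example kernel."""
    return [max(0, x) for x in tile]


# Example: a 4x4 feature map of 1-byte values and a 24-byte L1 budget.
feature_map = list(range(-8, 8))
rows_per_tile = choose_tile_rows(total_rows=4, row_bytes=4, l1_budget_bytes=24)
result = run_layer_tiled(feature_map, row_len=4, tile_rows=rows_per_tile, kernel=relu)
```

Because the kernel here is element-wise, the tiled result matches an untiled run. A convolution would also need overlapping tile borders, which this sketch leaves out.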
