ResiRCA: A Resilient Energy Harvesting ReRAM Crossbar-Based Accelerator for Intelligent Embedded Processors

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). Pub Date: 2020-02-01. DOI: 10.1109/HPCA47549.2020.00034

Keni Qiu, N. Jao, Mengying Zhao, Cyan Subhra Mishra, Gulsum Gudukbay, Sethu Jose, J. Sampson, M. Kandemir, N. Vijaykrishnan


Citations: 18

Abstract

Many recent works have shown substantial efficiency boosts from performing inference tasks on Internet of Things (IoT) nodes rather than merely transmitting raw sensor data. However, such tasks, e.g., convolutional neural networks (CNNs), are highly compute-intensive. They are therefore challenging to complete at sensing-matched latencies in ultra-low-power and energy-harvesting IoT nodes. ReRAM crossbar-based accelerators (RCAs) are an ideal candidate to perform the dominant multiplication-and-accumulation (MAC) operations in CNNs efficiently, but conventional, performance-oriented RCAs, while energy-efficient, are power hungry and ill-optimized for the intermittent and unstable power supply of energy-harvesting IoT nodes. This paper presents the ResiRCA architecture, which integrates a new, lightweight, and configurable RCA suitable for energy-harvesting environments as an opportunistically executing augmentation to a baseline sense-and-transmit battery-powered IoT node. To maximize ResiRCA throughput under different power levels, we develop the ResiSchedule approach for dynamic RCA reconfiguration. The proposed approach uses loop tiling-based computation decomposition, model duplication within the RCA, and inter-layer pipelining to reduce RCA activation thresholds and to track execution costs more closely against dynamic power income. Experimental results show that ResiRCA together with ResiSchedule achieves average speedups and energy efficiency improvements of 8x and 14x, respectively, compared to a baseline RCA with intermittency-unaware scheduling.
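The loop tiling idea mentioned in the abstract can be illustrated with a toy sketch. The code below is not the paper's ResiSchedule algorithm: the linear power model (`power_per_cell`), the function names, and the exhaustive tile search are all assumptions made purely for illustration. It shows only the general principle that a large MAC operation can be split into smaller crossbar activations, each sized to fit whatever power is currently available, while still producing the same result.

```python
from typing import List, Optional, Tuple


def matvec(weights: List[List[int]], x: List[int]) -> List[int]:
    """Untiled reference: y[j] = sum_i weights[i][j] * x[i] (inputs on rows)."""
    rows, cols = len(weights), len(weights[0])
    return [sum(weights[i][j] * x[i] for i in range(rows)) for j in range(cols)]


def pick_tile(rows: int, cols: int, power_budget: int,
              power_per_cell: int) -> Optional[Tuple[int, int]]:
    """Hypothetical policy: largest tile whose active cells fit the budget.

    Assumes activation power grows linearly with the number of active cells,
    which is a simplification chosen for this sketch.
    """
    candidates = [
        (tr, tc)
        for tr in range(1, rows + 1)
        for tc in range(1, cols + 1)
        if tr * tc * power_per_cell <= power_budget
    ]
    if not candidates:
        return None  # not enough power to activate even a single cell
    return max(candidates, key=lambda t: t[0] * t[1])


def tiled_matvec(weights: List[List[int]], x: List[int],
                 tile_rows: int, tile_cols: int) -> Tuple[List[int], int]:
    """Compute the same product one tile at a time, counting activations."""
    rows, cols = len(weights), len(weights[0])
    y = [0] * cols
    activations = 0
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            # Each (r0, c0) tile stands in for one low-power crossbar activation.
            for j in range(c0, min(c0 + tile_cols, cols)):
                for i in range(r0, min(r0 + tile_rows, rows)):
                    y[j] += weights[i][j] * x[i]
            activations += 1
    return y, activations


if __name__ == "__main__":
    W = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    x = [1, 0, 2, 1]
    tile = pick_tile(len(W), len(W[0]), power_budget=5, power_per_cell=1)
    if tile is not None:
        y, n = tiled_matvec(W, x, *tile)
        print(tile, y, n)
```

In this sketch a smaller power budget yields a smaller tile and therefore more, cheaper activations, which is the trade-off the abstract describes as reducing activation thresholds.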



