XMA: a crossbar-aware multi-task adaption framework via shift-based mask learning method

Proceedings of the 59th ACM/IEEE Design Automation Conference | Pub Date: 2022-07-10 | DOI: 10.1145/3489517.3530458

Fan Zhang, Li Yang, Jian Meng, Jae-sun Seo, Yu Cao, Deliang Fan


Citations: 1

Abstract

ReRAM crossbar arrays, a highly parallel, fast, and energy-efficient structure, have attracted much attention, especially for accelerating Deep Neural Network (DNN) inference on a single specific task. However, because weight re-programming consumes considerable energy and ReRAM cells have low endurance, adapting a crossbar array to multiple tasks has not been well explored. In this paper, we propose XMA, a novel crossbar-aware, shift-based mask learning method that, for the first time, targets multi-task adaption in ReRAM crossbar DNN accelerators. XMA leverages the ability of popular mask-based learning algorithms to mitigate catastrophic forgetting and, for each new task, learns a task-specific, crossbar column-wise, shift-based multi-level mask on top of a frozen backbone model, rather than the commonly used element-wise binary mask. With this crossbar-aware design, the masking operation required to adapt to a new task can be implemented in an existing crossbar-based convolution engine with minimal hardware and memory overhead and, more importantly, without the power-hungry cell re-programming that prior works require. Extensive experimental results show that, compared with the state-of-the-art multi-task adaption method Piggyback [1], XMA achieves 3.19% higher accuracy on average while reducing memory overhead by 96.6%. Moreover, by eliminating cell re-programming, XMA achieves ~4.3x higher energy efficiency than Piggyback.
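To make the idea of a column-wise, shift-based mask concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each crossbar column's accumulated output is either pruned or scaled by a power of two (a right shift) according to a per-task mask, while the programmed weights stay frozen. The function names, the use of `None` to mark a pruned column, and the mask-size accounting are all illustrative assumptions; the abstract does not specify these details.

```python
from typing import List, Optional


def crossbar_mvm(weights: List[List[int]], inputs: List[int]) -> List[int]:
    """Idealized crossbar matrix-vector product: one accumulated sum per column."""
    rows, cols = len(weights), len(weights[0])
    return [sum(inputs[r] * weights[r][c] for r in range(rows)) for c in range(cols)]


def apply_shift_mask(column_sums: List[int], shifts: List[Optional[int]]) -> List[int]:
    """Apply a hypothetical per-column mask to the column outputs.

    shifts[c] is None  -> column c is pruned for this task (output 0)
    shifts[c] is k >= 0 -> column c's output is right-shifted by k bits
    The weights themselves are never touched, so no cell re-programming occurs.
    """
    return [0 if s is None else (v >> s) for v, s in zip(column_sums, shifts)]


def elementwise_mask_bits(rows: int, cols: int) -> int:
    """Storage for a 1-bit-per-weight binary mask (Piggyback-style assumption)."""
    return rows * cols


def columnwise_mask_bits(cols: int, levels: int) -> int:
    """Storage for one multi-level entry per column (illustrative accounting)."""
    bits_per_entry = (levels - 1).bit_length()
    return cols * bits_per_entry


# Frozen backbone weights, programmed once (3 rows x 4 columns, made-up values).
backbone = [[1, 2, 3, 4],
            [5, 6, 7, 8],
            [2, 0, 1, 3]]
activations = [2, 1, 4]

sums = crossbar_mvm(backbone, activations)

# Two tasks share the same programmed cells and differ only in their masks.
task_a = apply_shift_mask(sums, [0, 1, None, 2])
task_b = apply_shift_mask(sums, [None, 0, 0, 1])
```

Under these assumptions, switching tasks only means loading a different short vector of per-column mask entries. For a hypothetical 128x128 array with four mask levels, `columnwise_mask_bits(128, 4)` is far smaller than `elementwise_mask_bits(128, 128)`. These counts are illustrative and are not the paper's reported 96.6% figure.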
