PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning
Linghao Song, Xuehai Qian, Hai Helen Li, Yiran Chen
2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), February 2017. DOI: 10.1109/HPCA.2017.55
Citations: 570

Abstract

Convolutional neural networks (CNNs) are the heart of deep learning applications. Recent works PRIME [1] and ISAAC [2] demonstrated the promise of using resistive random access memory (ReRAM) to perform neural computation in memory. We find that training cannot be supported efficiently with these schemes. First, they do not consider the weight updates and complex data dependencies of the training procedure. Second, ISAAC attempts to increase system throughput with a very deep pipeline, which is beneficial only when a large number of consecutive images can be fed into the architecture. In training, the notion of a batch (e.g., 64 images) limits the number of images that can be processed consecutively, because the images in the next batch must be processed with the updated weights. Third, the deep pipeline in ISAAC is vulnerable to pipeline bubbles and execution stalls. In this paper, we present PipeLayer, a ReRAM-based processing-in-memory (PIM) accelerator for CNNs that supports both training and testing. We analyze the data dependencies and weight updates in training algorithms and propose an efficient pipeline that exploits inter-layer parallelism. To exploit intra-layer parallelism, we propose a highly parallel design based on the notions of parallelism granularity and weight replication. With these design choices, PipeLayer enables highly pipelined execution of both training and testing without introducing the potential stalls of previous work. Experimental results show that PipeLayer achieves an average speedup of 42.45x and an average energy saving of 7.17x compared with a GPU platform.
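
To make the batch-boundary argument concrete, below is a minimal back-of-the-envelope sketch, not taken from the paper: it assumes a hypothetical one-time-step-per-layer latency and illustrative function names, and compares purely sequential execution against a layer-wise pipeline that must fill and drain at every batch boundary because the next batch depends on the updated weights.

```python
# Hypothetical timing model for inter-layer pipelining within a batch.
# Assumption (not from the PipeLayer paper): every layer takes exactly
# one time step, and the pipeline drains fully at each batch boundary.

def steps_sequential(num_layers: int, batch_size: int) -> int:
    """One image traverses all layers before the next image starts."""
    return num_layers * batch_size

def steps_pipelined(num_layers: int, batch_size: int) -> int:
    """Images stream through the layers one step apart; the pipeline
    fills and then drains at the batch boundary, since the next batch
    can only start on the updated weights."""
    return num_layers + batch_size - 1

if __name__ == "__main__":
    L, B = 8, 64  # e.g., an 8-layer CNN and a batch of 64 images
    print(steps_sequential(L, B))  # 512 steps
    print(steps_pipelined(L, B))   # 71 steps, then the pipeline drains
```

Under these assumptions, the pipelined schedule needs 71 steps per batch instead of 512, but the fill/drain overhead of L - 1 steps grows with pipeline depth: once L approaches the batch size B, a very deep pipeline spends comparable time filling and draining as computing, which is why batch size caps the benefit of deep pipelines in training.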