Impact of the Array Shape and Memory Bandwidth on the Execution Time of CNN Systolic Arrays

Eduardo Yago, Pau Castelló, S. Petit, M. E. Gómez, J. Sahuquillo
DOI: 10.1109/DSD51259.2020.00086
Published in: 2020 23rd Euromicro Conference on Digital System Design (DSD), August 2020
Citations: 3

Abstract

The use of Convolutional Neural Networks (CNNs) has risen sharply in recent years, driven mainly by applications in image recognition and other artificial-intelligence tasks. These new CNN applications impose computing demands that conventional processors struggle to meet. As a consequence, accelerators focusing on CNN computation, both prototypes and commercial products, have been proposed. Among these accelerators, those based on systolic arrays have acquired special relevance; examples include Google's TPU and Eyeriss. Current research has focused on regular square systolic arrays, and most existing work assumes that there is enough memory bandwidth to feed the systolic array with input data. In this paper we explore the design of non-square systolic arrays and study the impact of memory bandwidth from a performance perspective. This work makes two main contributions. First, we found that for some workloads non-square arrays achieve performance similar to that of systolic arrays twice as large, which can translate into area and/or energy benefits. Second, we present a performance comparison varying the main memory bandwidth across current DRAM devices. The analysis reveals that main memory bandwidth has a great impact on performance and that the choice of memory technology is key to system performance. For the 64x64 array size, HBM2 memory is necessary to avoid the slowdown that cheaper technologies (e.g., DDR5 and DDR4) would introduce.
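The bandwidth argument in the abstract can be illustrated with a back-of-the-envelope calculation: a systolic array that streams one operand per edge row and column per cycle needs memory bandwidth proportional to its perimeter. The sketch below is illustrative only; the clock frequency (700 MHz), 8-bit operands, the one-operand-per-edge-per-cycle dataflow, and the DRAM peak figures are assumptions, not values taken from the paper.

```python
# Back-of-the-envelope estimate of the input bandwidth a systolic array
# demands, compared against rough peak bandwidths of common DRAM types.
# Clock, operand width, dataflow, and DRAM figures are all assumptions.

def required_bandwidth_gbs(rows, cols, clock_hz=700e6, operand_bytes=1):
    """Bytes per second streamed in along both edges of the array."""
    operands_per_cycle = rows + cols  # one per row (inputs) + one per column (weights)
    return operands_per_cycle * operand_bytes * clock_hz / 1e9

# Approximate peak bandwidths (GB/s) for a single device/stack.
dram_peak = {"DDR4-3200": 25.6, "DDR5-4800": 38.4, "HBM2": 256.0}

for shape in [(32, 32), (64, 64), (128, 32)]:
    need = required_bandwidth_gbs(*shape)
    feasible = [name for name, bw in dram_peak.items() if bw >= need]
    print(f"{shape[0]}x{shape[1]}: needs ~{need:.1f} GB/s; feasible: {feasible}")
```

Under these assumptions a 64x64 array needs roughly 90 GB/s of input bandwidth, which only HBM2 sustains among the listed technologies, consistent in spirit with the paper's conclusion for that array size.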