Michael James, Marvin Tom, P. Groeneveld, V. Kibardin
ISPD 2020 Physical Mapping of Neural Networks on a Wafer-Scale Deep Learning Accelerator
Proceedings of the 2020 International Symposium on Physical Design. Published March 30, 2020. DOI: 10.1145/3372780.3380846
Citations: 15

Abstract

This paper introduces a special case of the floorplanning problem for optimizing neural networks to run on a wafer-scale computing engine. From a compute perspective, neural networks can be represented by a deeply layered structure of compute kernels. During the training of a neural network, gradient descent is used to determine the weight factors. Each layer then uses a local weight tensor to transform "activations" and "gradients" that are shared among connected kernels according to the topology of the network. This process is computationally intensive and requires high memory and communication bandwidth. Cerebras has developed a novel computer system designed for this work that is powered by a 21.5cm by 21.5cm wafer-scale processor with 400,000 programmable compute cores. It is structured as a regular array of 633 by 633 processing elements, each with its own local high bandwidth SRAM memory and direct high bandwidth connection to its neighboring cores. In addition to supporting traditional execution models for neural network training and inference, this engine has a unique capability to compile and compute every layer of a complete neural network simultaneously. Mapping a neural network in this fashion onto Cerebras' Wafer-Scale Engine (WSE) is reminiscent of the traditional floorplanning problem in physical design. A kernel ends up as a rectangle of x by y compute elements. These are the flexible blocks that need to be placed to optimize performance. This paper describes an ISPD 2020 challenge to develop algorithms and heuristics that produce compiled neural networks that achieve the highest possible performance on the Cerebras WSE.
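The abstract frames the mapping task as a floorplanning problem: each kernel becomes a flexible x-by-y rectangle of compute elements to be placed on the 633-by-633 fabric so that communication between connected layers is cheap. The toy sketch below illustrates that framing under stated assumptions: the kernel names and sizes are invented, the greedy shelf-packing placer and Manhattan-distance cost are generic stand-ins, and the real ISPD 2020 contest uses a far richer performance model.

```python
# Illustrative sketch only: a toy version of the kernel-placement problem
# described above. Kernel names, sizes, the shelf-packing heuristic, and the
# wirelength cost are all hypothetical stand-ins for the contest's real model.

from dataclasses import dataclass

GRID_W, GRID_H = 633, 633  # WSE fabric: 633 x 633 processing elements

@dataclass
class Kernel:
    name: str
    w: int  # width in compute elements
    h: int  # height in compute elements

def shelf_place(kernels):
    """Greedy shelf packing: place kernels left-to-right on the current
    shelf, opening a new shelf when a row fills. Returns name -> (x, y)."""
    placements = {}
    x = y = shelf_h = 0
    for k in kernels:
        if x + k.w > GRID_W:          # row full: start a new shelf
            x, y = 0, y + shelf_h
            shelf_h = 0
        if y + k.h > GRID_H:
            raise ValueError("kernels do not fit on the fabric")
        placements[k.name] = (x, y)
        x += k.w
        shelf_h = max(shelf_h, k.h)
    return placements

def wire_cost(placements, edges, kernels):
    """Sum of Manhattan distances between connected kernel centers -- a
    crude proxy for on-wafer communication cost between adjacent layers."""
    dims = {k.name: (k.w, k.h) for k in kernels}
    def center(n):
        x, y = placements[n]
        w, h = dims[n]
        return (x + w / 2, y + h / 2)
    total = 0.0
    for a, b in edges:
        (ax, ay), (bx, by) = center(a), center(b)
        total += abs(ax - bx) + abs(ay - by)
    return total

# Toy 3-layer network: one kernel per layer, connected in a chain.
layers = [Kernel("conv1", 100, 80), Kernel("conv2", 120, 60), Kernel("fc", 50, 40)]
edges = [("conv1", "conv2"), ("conv2", "fc")]
p = shelf_place(layers)
print(p, wire_cost(p, edges, layers))
```

A real solver would also resize each rectangle (trading x for y at fixed area) and iterate on the placement, since both shape and position of every kernel affect the achievable throughput.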