通过基于pim的架构设计实现高效的胶囊网络处理

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2019-11-07 DOI:10.1109/HPCA47549.2020.00051

Xingyao Zhang, S. Song, Chenhao Xie, Jing Wang, Wei-gong Zhang, Xin Fu

{"title":"通过基于pim的架构设计实现高效的胶囊网络处理","authors":"Xingyao Zhang, S. Song, Chenhao Xie, Jing Wang, Wei-gong Zhang, Xin Fu","doi":"10.1109/HPCA47549.2020.00051","DOIUrl":null,"url":null,"abstract":"In recent years, the CNNs have achieved great successes in the image processing tasks, e.g., image recognition and object detection. Unfortunately, traditional CNN's classification is found to be easily misled by increasingly complex image features due to the usage of pooling operations, hence unable to preserve accurate position and pose information of the objects. To address this challenge, a novel neural network structure called Capsule Network has been proposed, which introduces equivariance through capsules to significantly enhance the learning ability for image segmentation and object detection. Due to its requirement of performing a high volume of matrix operations, CapsNets have been generally accelerated on modern GPU platforms that provide highly optimized software library for common deep learning tasks. However, based on our performance characterization on modern GPUs, CapsNets exhibit low efficiency due to the special program and execution features of their routing procedure, including massive unshareable intermediate variables and intensive synchronizations, which are very difficult to optimize at software level. To address these challenges, we propose a hybrid computing architecture design named PIM-CapsNet. It preserves GPU's on-chip computing capability for accelerating CNN types of layers in CapsNet, while pipelining with an off-chip in-memory acceleration solution that effectively tackles routing procedure's inefficiency by leveraging the processing-in-memory capability of today's 3D stacked memory. Using routing procedure's inherent parallellization feature, our design enables hierarchical improvements on CapsNet inference efficiency through minimizing data movement and maximizing parallel processing in memory. Evaluation results demonstrate that our proposed design can achieve substantial improvement on both performance and energy savings for CapsNet inference, with almost zero accuracy loss. The results also suggest good performance scalability in optimizing the routing procedure with increasing network size.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design\",\"authors\":\"Xingyao Zhang, S. Song, Chenhao Xie, Jing Wang, Wei-gong Zhang, Xin Fu\",\"doi\":\"10.1109/HPCA47549.2020.00051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, the CNNs have achieved great successes in the image processing tasks, e.g., image recognition and object detection. Unfortunately, traditional CNN's classification is found to be easily misled by increasingly complex image features due to the usage of pooling operations, hence unable to preserve accurate position and pose information of the objects. To address this challenge, a novel neural network structure called Capsule Network has been proposed, which introduces equivariance through capsules to significantly enhance the learning ability for image segmentation and object detection. Due to its requirement of performing a high volume of matrix operations, CapsNets have been generally accelerated on modern GPU platforms that provide highly optimized software library for common deep learning tasks. However, based on our performance characterization on modern GPUs, CapsNets exhibit low efficiency due to the special program and execution features of their routing procedure, including massive unshareable intermediate variables and intensive synchronizations, which are very difficult to optimize at software level. To address these challenges, we propose a hybrid computing architecture design named PIM-CapsNet. It preserves GPU's on-chip computing capability for accelerating CNN types of layers in CapsNet, while pipelining with an off-chip in-memory acceleration solution that effectively tackles routing procedure's inefficiency by leveraging the processing-in-memory capability of today's 3D stacked memory. Using routing procedure's inherent parallellization feature, our design enables hierarchical improvements on CapsNet inference efficiency through minimizing data movement and maximizing parallel processing in memory. Evaluation results demonstrate that our proposed design can achieve substantial improvement on both performance and energy savings for CapsNet inference, with almost zero accuracy loss. The results also suggest good performance scalability in optimizing the routing procedure with increasing network size.\",\"PeriodicalId\":339648,\"journal\":{\"name\":\"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA47549.2020.00051\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA47549.2020.00051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 18

摘要

近年来，cnn在图像识别、目标检测等图像处理任务上取得了巨大的成功。遗憾的是，由于使用池化操作，传统的CNN分类容易被越来越复杂的图像特征所误导，无法保持物体准确的位置和姿态信息。为了解决这一挑战，提出了一种新的神经网络结构，称为胶囊网络，通过胶囊引入等方差，显著提高了图像分割和目标检测的学习能力。由于需要执行大量的矩阵运算，capnet在现代GPU平台上普遍加速，为常见的深度学习任务提供高度优化的软件库。然而，基于我们在现代gpu上的性能表征，由于其路由过程的特殊程序和执行特征，包括大量不可共享的中间变量和密集的同步，capnet表现出较低的效率，这在软件层面上很难优化。为了应对这些挑战，我们提出了一种名为PIM-CapsNet的混合计算体系结构设计。它保留了GPU的片上计算能力，可以在CapsNet中加速CNN类型的层，同时采用片外内存加速解决方案，通过利用当今3D堆叠内存的内存处理能力，有效地解决了路由过程的低效率问题。利用路由过程固有的并行化特性，我们的设计可以通过最小化数据移动和最大化内存中的并行处理来分层改进CapsNet推理效率。评估结果表明，我们提出的设计可以在CapsNet推理的性能和节能方面取得实质性的改进，并且几乎没有精度损失。结果还表明，随着网络规模的增加，优化路由过程具有良好的性能可扩展性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design

In recent years, the CNNs have achieved great successes in the image processing tasks, e.g., image recognition and object detection. Unfortunately, traditional CNN's classification is found to be easily misled by increasingly complex image features due to the usage of pooling operations, hence unable to preserve accurate position and pose information of the objects. To address this challenge, a novel neural network structure called Capsule Network has been proposed, which introduces equivariance through capsules to significantly enhance the learning ability for image segmentation and object detection. Due to its requirement of performing a high volume of matrix operations, CapsNets have been generally accelerated on modern GPU platforms that provide highly optimized software library for common deep learning tasks. However, based on our performance characterization on modern GPUs, CapsNets exhibit low efficiency due to the special program and execution features of their routing procedure, including massive unshareable intermediate variables and intensive synchronizations, which are very difficult to optimize at software level. To address these challenges, we propose a hybrid computing architecture design named PIM-CapsNet. It preserves GPU's on-chip computing capability for accelerating CNN types of layers in CapsNet, while pipelining with an off-chip in-memory acceleration solution that effectively tackles routing procedure's inefficiency by leveraging the processing-in-memory capability of today's 3D stacked memory. Using routing procedure's inherent parallellization feature, our design enables hierarchical improvements on CapsNet inference efficiency through minimizing data movement and maximizing parallel processing in memory. Evaluation results demonstrate that our proposed design can achieve substantial improvement on both performance and energy savings for CapsNet inference, with almost zero accuracy loss. The results also suggest good performance scalability in optimizing the routing procedure with increasing network size.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)

自引率

0.00%

发文量