Shifting Capsule Networks from the Cloud to the Deep Edge

ACM Transactions on Intelligent Systems and Technology (TIST). Pub Date: 2021-10-06. DOI: 10.1145/3544562

Miguel Costa, Diogo Costa, T. Gomes, S. Pinto


Citations: 2

Abstract

Capsule networks (CapsNets) are an emerging trend in image processing. In contrast to convolutional neural networks, CapsNets are not vulnerable to object deformation, because the relative spatial information of objects is preserved across the network. However, the capsule structure and the dynamic routing mechanism make them complex, so deploying a CapsNet in its original form on a resource-constrained device powered by a small microcontroller (MCU) is nearly impractical. In an era where intelligence is rapidly shifting from the cloud to the edge, this high complexity poses serious challenges to the adoption of CapsNets at the very edge. To tackle this issue, we present an API for executing quantized CapsNets on Arm Cortex-M and RISC-V MCUs. Our software kernels extend Arm CMSIS-NN and RISC-V PULP-NN to support capsule operations with 8-bit integers as operands. Alongside the API, we propose a framework for post-training quantization of a CapsNet. Results show a reduction in memory footprint of almost 75%, with accuracy loss ranging from 0.07% to 0.18%. In terms of throughput, our Arm Cortex-M API executes primary capsule and capsule layers with medium-sized kernels in just 119.94 and 90.60 ms, respectively (STM32H755ZIT6U, Cortex-M7 @ 480 MHz). For the GAP-8 SoC (RISC-V RV32IMCXpulp @ 170 MHz), the latency drops to 7.02 and 38.03 ms, respectively.
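The abstract does not spell out how the post-training quantization works, so the following is only a minimal sketch of one common approach: symmetric per-tensor quantization of floating-point weights to signed 8-bit integers. The function names (quantize_int8, dequantize, footprint_reduction), the sample weights, and the assumption that the original parameters are stored as 32-bit floats are all illustrative and not taken from the paper; the authors' framework may use a different scheme. Under that 32-bit assumption, moving each parameter to one byte gives a 75% reduction, which is in line with the "almost 75%" figure reported.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit integers.

    Assumed scheme (not confirmed by the paper): one scale per tensor,
    chosen so the largest magnitude maps to 127.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map 8-bit integers back to approximate float values."""
    return [q * scale for q in quantized]


def footprint_reduction(num_params, float_bytes=4, int_bytes=1):
    """Fraction of parameter memory saved when going from float to int storage."""
    return 1.0 - (num_params * int_bytes) / (num_params * float_bytes)


# Hypothetical weights, used only to exercise the functions.
weights = [0.52, -1.27, 0.003, 0.9, -0.41]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Worst-case rounding error is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
reduction = footprint_reduction(num_params=1000)
```

A per-tensor scale keeps the sketch short; per-channel scales or power-of-two scales (which let the rescaling be done with bit shifts on an MCU) are common alternatives with different accuracy and cost trade-offs.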
