有效的像素阵列处理，支持三元神经网络的边缘推理

IF 1.8 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC

Journal of Low Power Electronics and Applications Pub Date : 2022-10-30 DOI:10.3390/jlpea12040057

Sepehr Tabrizchi, Shaahin Angizi, A. Roohi

{"title":"有效的像素阵列处理，支持三元神经网络的边缘推理","authors":"Sepehr Tabrizchi, Shaahin Angizi, A. Roohi","doi":"10.3390/jlpea12040057","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNNs), due to their recent successes, have gained lots of attention in various vision-based applications. They have proven to produce incredible results, especially on big data, that require high processing demands. However, CNN processing demands have limited their usage in embedded edge devices with constrained energy budgets and hardware. This paper proposes an efficient new architecture, namely Ocelli includes a ternary compute pixel (TCP) consisting of a CMOS-based pixel and a compute add-on. The proposed Ocelli architecture offers several features; (I) Because of the compute add-on, TCPs can produce ternary values (i.e., −1, 0, +1) regarding the light intensity as pixels’ inputs; (II) Ocelli realizes analog convolutions enabling low-precision ternary weight neural networks. Since the first layer’s convolution operations are the performance bottleneck of accelerators, Ocelli mitigates the overhead of analog buffers and analog-to-digital converters. Moreover, our design supports a zero-skipping scheme to further power reduction; (III) Ocelli exploits non-volatile magnetic RAMs to store CNN’s weights, which remarkably reduces the static power consumption; and finally, (IV) Ocelli has two modes, including sensing and processing. Once the object is detected, the architecture switches to the typical sensing mode to capture the image. Compared to the conventional pixels, it achieves an average 10% efficiency on its lane detection power consumption compared with existing edge detection algorithms. Moreover, considering different CNN workloads, our design shows more than 23% power efficiency over conventional designs, while it can achieve better accuracy.","PeriodicalId":38100,"journal":{"name":"Journal of Low Power Electronics and Applications","volume":" ","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2022-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Ocelli: Efficient Processing-in-Pixel Array Enabling Edge Inference of Ternary Neural Networks\",\"authors\":\"Sepehr Tabrizchi, Shaahin Angizi, A. Roohi\",\"doi\":\"10.3390/jlpea12040057\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolutional Neural Networks (CNNs), due to their recent successes, have gained lots of attention in various vision-based applications. They have proven to produce incredible results, especially on big data, that require high processing demands. However, CNN processing demands have limited their usage in embedded edge devices with constrained energy budgets and hardware. This paper proposes an efficient new architecture, namely Ocelli includes a ternary compute pixel (TCP) consisting of a CMOS-based pixel and a compute add-on. The proposed Ocelli architecture offers several features; (I) Because of the compute add-on, TCPs can produce ternary values (i.e., −1, 0, +1) regarding the light intensity as pixels’ inputs; (II) Ocelli realizes analog convolutions enabling low-precision ternary weight neural networks. Since the first layer’s convolution operations are the performance bottleneck of accelerators, Ocelli mitigates the overhead of analog buffers and analog-to-digital converters. Moreover, our design supports a zero-skipping scheme to further power reduction; (III) Ocelli exploits non-volatile magnetic RAMs to store CNN’s weights, which remarkably reduces the static power consumption; and finally, (IV) Ocelli has two modes, including sensing and processing. Once the object is detected, the architecture switches to the typical sensing mode to capture the image. Compared to the conventional pixels, it achieves an average 10% efficiency on its lane detection power consumption compared with existing edge detection algorithms. Moreover, considering different CNN workloads, our design shows more than 23% power efficiency over conventional designs, while it can achieve better accuracy.\",\"PeriodicalId\":38100,\"journal\":{\"name\":\"Journal of Low Power Electronics and Applications\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2022-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Low Power Electronics and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/jlpea12040057\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Low Power Electronics and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/jlpea12040057","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 2

摘要

卷积神经网络（CNNs）由于其最近的成功，在各种基于视觉的应用中受到了广泛的关注。事实证明，它们可以产生令人难以置信的结果，尤其是在需要高处理要求的大数据方面。然而，CNN处理需求限制了它们在能源预算和硬件受限的嵌入式边缘设备中的使用。本文提出了一种高效的新架构，即Ocelli包括一个由基于CMOS的像素和一个计算插件组成的三元计算像素（TCP）。拟议的Ocelli架构提供了几个特点；（I）由于计算附加功能，TCPs可以产生关于光强度的三元值（即−1、0、+1）作为像素的输入；（II） Ocelli实现了模拟卷积，实现了低精度的三元权重神经网络。由于第一层的卷积运算是加速器的性能瓶颈，Ocelli减少了模拟缓冲器和模数转换器的开销。此外，我们的设计支持零跳方案，以进一步降低功率；（III） Ocelli利用非易失性磁RAM来存储CNN的权重，这显著降低了静态功耗；最后，（IV）Ocelli有两种模式，包括传感和处理。一旦检测到物体，体系结构就切换到典型的感测模式来捕捉图像。与传统像素相比，与现有的边缘检测算法相比，它在车道检测功耗方面实现了平均10%的效率。此外，考虑到不同的CNN工作负载，我们的设计显示出比传统设计高出23%以上的功率效率，同时可以实现更好的精度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Ocelli: Efficient Processing-in-Pixel Array Enabling Edge Inference of Ternary Neural Networks

Convolutional Neural Networks (CNNs), due to their recent successes, have gained lots of attention in various vision-based applications. They have proven to produce incredible results, especially on big data, that require high processing demands. However, CNN processing demands have limited their usage in embedded edge devices with constrained energy budgets and hardware. This paper proposes an efficient new architecture, namely Ocelli includes a ternary compute pixel (TCP) consisting of a CMOS-based pixel and a compute add-on. The proposed Ocelli architecture offers several features; (I) Because of the compute add-on, TCPs can produce ternary values (i.e., −1, 0, +1) regarding the light intensity as pixels’ inputs; (II) Ocelli realizes analog convolutions enabling low-precision ternary weight neural networks. Since the first layer’s convolution operations are the performance bottleneck of accelerators, Ocelli mitigates the overhead of analog buffers and analog-to-digital converters. Moreover, our design supports a zero-skipping scheme to further power reduction; (III) Ocelli exploits non-volatile magnetic RAMs to store CNN’s weights, which remarkably reduces the static power consumption; and finally, (IV) Ocelli has two modes, including sensing and processing. Once the object is detected, the architecture switches to the typical sensing mode to capture the image. Compared to the conventional pixels, it achieves an average 10% efficiency on its lane detection power consumption compared with existing edge detection algorithms. Moreover, considering different CNN workloads, our design shows more than 23% power efficiency over conventional designs, while it can achieve better accuracy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Low Power Electronics and Applications Engineering-Electrical and Electronic Engineering

CiteScore

3.60

自引率

14.30%

发文量

审稿时长

11 weeks