A 4.57 μW@120fps Vision System of Sensing with Computing for BNN-Based Perception Applications

Han Xu, Zheyu Liu, Ziwei Li, Erxiang Ren, Maimaiti Nazhamati, F. Qiao, Li Luo, Qi Wei, Xinjun Liu, Huazhong Yang
2021 IEEE Asian Solid-State Circuits Conference (A-SSCC), published 2021-11-07
DOI: 10.1109/A-SSCC53895.2021.9634759
Citations: 5

Abstract

In the AIoT era, intelligent vision perception systems are widely deployed at the edge. As shown in Fig. 1, due to limited energy budgets, terminal devices usually adopt a hierarchical processing architecture: a coarse object-detection algorithm runs in always-on mode, ready to trigger subsequent, more complex algorithms for precise recognition or segmentation. In conventional digital vision processing frameworks, light-induced photocurrents must be transformed to voltage ${\mathrm {(I_{ph}-to-V)}}$, converted to digital signals (A-to-D), transferred on-board to processors, and exchanged between memory and processing elements. Smart vision chips offer promising ways to cut these power overheads, such as placing analog processing circuits near the pixel array [2], customizing an analog-to-digital converter (ADC) capable of convolution [3], or embedding processing circuits deep inside the pixels to perform in-sensor current-domain MAC operations [4]. However, those works still retain the photocurrent-conversion ${\mathrm {(I_{ph}-to-V)}}$ circuits; moreover, they can only complete the first convolution layer for low-level feature extraction and cannot process subsequent layers for end-to-end perception tasks, which limits them to small CNN models. Systems implementing whole CNN algorithms have also been proposed, either by integrating a CIS with an analog processor on one chip [5] or by stacking a CIS chip on a digital processor chip [6]. But the power overheads of data transmission and memory access remain unsolved, because these designs separate sensing from computing and adopt a conventional von Neumann architecture with frequent memory accesses.
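The BNN workload targeted by such chips reduces each multiply-accumulate over ±1 activations and weights to an XNOR followed by a popcount, which is what makes in-sensor, low-power MAC arrays feasible. A minimal software sketch of this arithmetic identity (illustrative only; it models the math, not the chip's current-domain circuits):

```python
def bnn_mac(activations, weights):
    """Dot product of two ±1 vectors via XNOR + popcount.

    Binarize each vector (+1 -> bit 1, -1 -> bit 0) and pack it into an
    integer. XNOR counts matching positions; with m matches out of n,
    the dot product is 2*m - n, so no multiplies are needed.
    """
    n = len(activations)
    a = sum(1 << i for i, v in enumerate(activations) if v > 0)
    w = sum(1 << i for i, v in enumerate(weights) if v > 0)
    matches = bin(~(a ^ w) & ((1 << n) - 1)).count("1")  # XNOR, popcount
    return 2 * matches - n

a = [1, -1, 1, 1, -1]
w = [1, 1, -1, 1, -1]
print(bnn_mac(a, w))  # 1, same as sum(x * y for x, y in zip(a, w))
```

The same identity underlies first-layer and subsequent-layer binary convolutions alike; a hardware realization evaluates it in the analog current domain rather than with digital popcount logic.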