A 4.57 μW@120fps Vision System of Sensing with Computing for BNN-Based Perception Applications

Han Xu, Zheyu Liu, Ziwei Li, Erxiang Ren, Maimaiti Nazhamati, F. Qiao, Li Luo, Qi Wei, Xinjun Liu, Huazhong Yang
2021 IEEE Asian Solid-State Circuits Conference (A-SSCC), published 2021-11-07
DOI: 10.1109/A-SSCC53895.2021.9634759
Citations: 5

Abstract

In the AIoT era, intelligent vision perception systems are widely deployed at the edge. As shown in Fig. 1, due to limited energy budgets, terminal devices usually adopt a hierarchical processing architecture: a coarse object-detection algorithm runs in always-on mode, ready to trigger subsequent, more complex algorithms for precise recognition or segmentation. In conventional digital vision processing frameworks, light-induced photocurrents must be transformed to voltage ${\mathrm {(I_{ph}-to-V)}}$, converted to digital signals (A-to-D), transferred on-board to processors, and exchanged between memory and processing elements. Smart vision chips offer promising ways to cut these power overheads, such as placing analog processing circuits near the pixel array [2], customizing an analog-to-digital converter (ADC) capable of convolution [3], or embedding processing circuits deep inside the pixels to perform in-sensor current-domain MAC operations [4]. However, those works still retain the photocurrent-conversion ${\mathrm {(I_{ph}-to-V)}}$ circuits; moreover, they can only complete the first convolution layer for low-level feature extraction and cannot process subsequent layers for end-to-end perception tasks, which limits them to small CNN models. Systems implementing whole CNN algorithms have also been proposed, either by integrating a CIS with an analog processor on one chip [5] or by stacking a CIS chip on a digital processor chip [6]. But the power overheads of data transmission and memory access remain unsolved, because these designs separate sensing from computing and adopt a conventional von Neumann architecture with frequent memory accesses.
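The BNN workload targeted by such chips reduces each multiply-accumulate over ±1 activations and weights to an XNOR followed by a popcount, which is what makes in-sensor, low-power MAC arrays feasible. A minimal software sketch of this arithmetic identity (illustrative only; it models the math, not the chip's current-domain circuits):

```python
def bnn_mac(activations, weights):
    """Dot product of two ±1 vectors via XNOR + popcount.

    Binarize each vector (+1 -> bit 1, -1 -> bit 0) and pack it into an
    integer. XNOR counts matching positions; with m matches out of n,
    the dot product is 2*m - n, so no multiplies are needed.
    """
    n = len(activations)
    a = sum(1 << i for i, v in enumerate(activations) if v > 0)
    w = sum(1 << i for i, v in enumerate(weights) if v > 0)
    matches = bin(~(a ^ w) & ((1 << n) - 1)).count("1")  # XNOR, popcount
    return 2 * matches - n

a = [1, -1, 1, 1, -1]
w = [1, 1, -1, 1, -1]
print(bnn_mac(a, w))  # 1, same as sum(x * y for x, y in zip(a, w))
```

The same identity underlies first-layer and subsequent-layer binary convolutions alike; a hardware realization evaluates it in the analog current domain rather than with digital popcount logic.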