A 0.2-to-3.6TOPS/W Programmable Convolutional Imager SoC with In-Sensor Current-Domain Ternary-Weighted MAC Operations for Feature Extraction and Region-of-Interest Detection
{"title":"A 0.2-to-3.6TOPS/W Programmable Convolutional Imager SoC with In-Sensor Current-Domain Ternary-Weighted MAC Operations for Feature Extraction and Region-of-Interest Detection","authors":"M. Lefebvre, Ludovic Moreau, R. Dekimpe, D. Bol","doi":"10.1109/ISSCC42613.2021.9365839","DOIUrl":null,"url":null,"abstract":"Mixed-signal vision chips are becoming increasingly popular for low-power embedded computer vision applications on smartphones, wearables and IoT nodes, as they meet stringent power and area constraints while maintaining a sufficient level of accuracy for low- to medium-level image processing tasks. On the one hand, in-sensor processing [1, 2] enables massively parallel operation but relies on pixel-level processing elements that degrade the pixel pitch and restrict the convolutional receptive field to neighboring pixels [1], precluding multi-scale operation. On the other hand, near-sensor processing [3–5] can operate at multiple scales by pixel downsampling [3] or binning [4] but entails significant power and area overhead as an analog memory is required to store pixel values awaiting processing. In addition, previous near-sensor processing SoCs are generally application-specific and thus suffer from limited versatility. In this paper, we present a 65nm QQVGA convolutional imager SoC codenamed SleepSpotter capable of versatile feature extraction and region-of-interest (RoI) detection based on in-sensor current-domain MAC operations. It operates at 6 different scales, features programmable filter size (F), stride (S), and ternary filter weights (1.5b). It reaches a minimum energy of 2.5pJ/pixel•frame•filter and a peak efficiency of 3.6TOPS/W, with 29% pixel area overhead for enabling the convolution and without the need for an analog memory.","PeriodicalId":371093,"journal":{"name":"2021 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Solid- State Circuits Conference (ISSCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSCC42613.2021.9365839","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24
Abstract
Mixed-signal vision chips are becoming increasingly popular for low-power embedded computer vision applications on smartphones, wearables and IoT nodes, as they meet stringent power and area constraints while maintaining a sufficient level of accuracy for low- to medium-level image processing tasks. On the one hand, in-sensor processing [1, 2] enables massively parallel operation but relies on pixel-level processing elements that degrade the pixel pitch and restrict the convolutional receptive field to neighboring pixels [1], precluding multi-scale operation. On the other hand, near-sensor processing [3–5] can operate at multiple scales by pixel downsampling [3] or binning [4] but entails significant power and area overhead as an analog memory is required to store pixel values awaiting processing. In addition, previous near-sensor processing SoCs are generally application-specific and thus suffer from limited versatility. In this paper, we present a 65nm QQVGA convolutional imager SoC codenamed SleepSpotter capable of versatile feature extraction and region-of-interest (RoI) detection based on in-sensor current-domain MAC operations. It operates at 6 different scales, features programmable filter size (F), stride (S), and ternary filter weights (1.5b). It reaches a minimum energy of 2.5pJ/pixel•frame•filter and a peak efficiency of 3.6TOPS/W, with 29% pixel area overhead for enabling the convolution and without the need for an analog memory.