Yuchuan Gong, Qingsong Liu, Luying Que, Conghan Jia, Jiahui Huang, Ye Liu, Jiayan Gan, Yuxiang Xie, Yong Zhou, Lili Liu, Xiaoqiang Xiang, L. Chang, Jun Zhou
{"title":"RAODAT: An Energy-Efficient Reconfigurable AI-based Object Detection and Tracking Processor with Online Learning","authors":"Yuchuan Gong, Qingsong Liu, Luying Que, Conghan Jia, Jiahui Huang, Ye Liu, Jiayan Gan, Yuxiang Xie, Yong Zhou, Lili Liu, Xiaoqiang Xiang, L. Chang, Jun Zhou","doi":"10.1109/A-SSCC53895.2021.9634785","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634785","url":null,"abstract":"Smart robots (e.g., drones) for object detection & tracking demand embedded intelligent processors. Neural network (NN) processors have been designed to accelerate NNs for pattern recognition [1] [2]. However, these designs lack special processing engines for object detection & tracking, such as bounding box (bbox) calculation and selection. Also, their architectures are designed for general AI tasks, resulting in redundancy and inefficiency when performing object detection & tracking. An object detection processor has been proposed previously [3], but it only supports specific detection NNs and does not support object tracking. Object tracking processors have also been proposed [4] [5], but these designs do not support object detection and thus cannot be used for object search. This paper presents RAODAT, which to the best of our knowledge is the first reconfigurable AI-based object detection and tracking processor with online learning. It has three key features: 1) an object detection & tracking architecture with reconfigurable NN and detection/tracking engines for programmable object detection & tracking tasks, 2) an object learning architecture with a shared NN inference/learning engine and an automatic label generation engine to support object tracking with online learning, and 3) layer- & stride-aware computing techniques to improve the NN computation efficiency.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126987010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Han Xu, Zheyu Liu, Ziwei Li, Erxiang Ren, Maimaiti Nazhamati, F. Qiao, Li Luo, Qi Wei, Xinjun Liu, Huazhong Yang
{"title":"A 4.57 μW@120fps Vision System of Sensing with Computing for BNN-Based Perception Applications","authors":"Han Xu, Zheyu Liu, Ziwei Li, Erxiang Ren, Maimaiti Nazhamati, F. Qiao, Li Luo, Qi Wei, Xinjun Liu, Huazhong Yang","doi":"10.1109/A-SSCC53895.2021.9634759","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634759","url":null,"abstract":"In the AIoT era, intelligent vision perception systems are widely deployed at the edge. As shown in Fig. 1, due to limited energy budgets, terminal devices usually adopt a hierarchical processing architecture. A coarse object detection algorithm runs in always-on mode and stands ready to trigger subsequent complex algorithms for precise recognition or segmentation. In conventional digital vision processing frameworks, light-induced photocurrents must be transformed to voltage ${\\mathrm {(I_{ph}-to-V)}}$, converted to digital signals (A-to-D), transferred on-board to processors, and exchanged between memory and processing elements. Smart vision chips provide promising solutions for cutting down these power overheads, such as placing analog processing circuits near the pixel array [2], customizing the analog-to-digital converter (ADC) so that it is capable of convolution [3], or embedding processing circuits deep in the pixels to perform in-sensor current-domain MAC operations [4]. However, the photocurrent conversion ${\\mathrm {(I_{ph}-to-V)}}$ circuits are still retained in those works; besides, they can only complete the 1st-layer convolution for low-level feature extraction and are unable to process subsequent layers for end-to-end perception tasks, which limits the processing capability with a small CNN model. Additionally, systems that implement whole CNN algorithms have also been proposed, by integrating a CIS with an analog processor in one chip [5] or stacking a CIS chip with a digital processor chip [6]. However, the power overheads of data transmission and memory access remain unsolved because these designs separate sensing and computing and adopt a conventional von Neumann architecture with frequent memory access.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115187423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brian Crafton, S. Spetalnick, Jong-Hyeok Yoon, Wei Wu, Carlos Tokunaga, V. De, A. Raychowdhury
{"title":"CIM-SECDED: A 40nm 64Kb Compute In-Memory RRAM Macro with ECC Enabling Reliable Operation","authors":"Brian Crafton, S. Spetalnick, Jong-Hyeok Yoon, Wei Wu, Carlos Tokunaga, V. De, A. Raychowdhury","doi":"10.1109/A-SSCC53895.2021.9634742","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634742","url":null,"abstract":"Resistive RAM (RRAM) is a promising candidate for compute in-memory (CIM) applications owing to its natural multiply-and-accumulate structure in a 1T-1R bitcell, high bit density, non-volatility, and voltage and process compatibility. These properties promise to advance applications such as AI with higher throughput and bit density. However, due to process, temperature, and write-to-write variations, the resistive state of each RRAM undergoes both spatial and temporal variations. Significant effort has been made to reduce the impact of device variation using iterative write verify (IWV) or training-aware approaches [1]. Unfortunately, traditional ECC is not compatible with CIM when multiple cells are read simultaneously on the same bitline. To address this issue at the circuit level, this paper presents a 64Kb RRAM macro in 40nm CMOS supporting a SECDED (single error correction, double error detection) scheme compatible with CIM for any number of parallel row accesses. Compared to prior work, our results indicate that CIM-SECDED (1) improves bit error rate (BER) by up to $69.2\\times$ for compute in-memory and (2) relaxes the constraints on resistance variations and directly lowers IWV and write voltages. As a result, when applied to AI workloads we achieve (1) a 24.4% (29.9%) accuracy improvement on the CIFAR10 (ImageNet) dataset and (2) consequently, improved endurance through lower write voltage requirements [2].","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122460282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Changhyeon Kim, Dongyoung Rim, Jeongwon Choe, D. Kam, Giyoon Park, Seokki Kim, Youngjoo Lee
{"title":"FPGA-Based Ordered Statistic Decoding Architecture for B5G/6G URLLC IIOT Networks","authors":"Changhyeon Kim, Dongyoung Rim, Jeongwon Choe, D. Kam, Giyoon Park, Seokki Kim, Youngjoo Lee","doi":"10.1109/A-SSCC53895.2021.9634714","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634714","url":null,"abstract":"The ordered statistic decoding (OSD) approach for short-length BCH codes has consistently been considered a promising error-correction solution because it achieves a block error rate (BLER) of less than $10^{-6}$, which is attractive for ultra-reliable and low-latency communication (URLLC) in industrial IoT (IIOT) solutions [1], [2]. However, it is hard to directly realize the conventional OSD algorithm because of the compute-intensive Gaussian elimination and iterative reprocessing steps. Based on the recent segmentation discarding decoding (SDD) approach [3], this work presents an ultralow-latency OSD architecture that reduces the decoding latency by 12 times and is implemented on an FPGA-based verification platform.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114510313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saito Shibata, Reiji Miura, Yoshiki Sawabe, K. Shiba, Atsutake Kosuge, M. Hamada, T. Kuroda
{"title":"A 5-GHz 0.15-mm2 Collision Avoidable RFID Employing Complementary Pass-transistor Adiabatic Logic with an Inductively Connected External Antenna","authors":"Saito Shibata, Reiji Miura, Yoshiki Sawabe, K. Shiba, Atsutake Kosuge, M. Hamada, T. Kuroda","doi":"10.1109/A-SSCC53895.2021.9634815","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634815","url":null,"abstract":"RFID (Radio Frequency Identification) is a promising technology for many applications such as automatic checkout at shops, inventory tracking at warehouses, product tracking in logistics, and so on. However, practical application is still limited because even the current state-of-the-art RFIDs do not meet all the requirements in terms of production cost, communication range, and reliable operation such as collision avoidance.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122136079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Auto-Calibration Technique for Current-Based Bandgap Voltage Reference","authors":"U. Chi-Wa, M. Law, C. Lam, R. Martins","doi":"10.1109/A-SSCC53895.2021.9634776","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634776","url":null,"abstract":"For industrial applications, the bandgap voltage reference (BGR) requires calibration after fabrication to ensure accuracy [1–3], which may lead to expensive labor costs. Some BGRs in the literature were reported without trimming [4,5]; however, their output voltages will drift due to device aging and stress [6]. The proposed auto-calibration technique can eliminate such a process, thus saving costs. Furthermore, the $\\beta$ of the BJT is small in advanced processes, such that the variation of $\\beta/(\\beta+1)$ in $V_{EB}=(kT/q)\\ln[(I_{E}/I_{S})\\beta/(\\beta+1)]$ has a great influence on $V_{EB}$, resulting in residual temperature coefficient (TC) variation after the conventional one-point trimming. This implies an error in the output, thus reducing the stability and lifetime of the system.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129742150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A ± 20-ppm -50°C-105°C 1-µA 32.768-kHz Clock Generator with a System-HFXO-Assisted Background Calibration","authors":"Chun-Yu Lin, Yu-Wei Huang, Tsung-Hsien Lin","doi":"10.1109/A-SSCC53895.2021.9634827","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634827","url":null,"abstract":"A kHz-range clock is required in many applications. For example, an IoT device is in sleep mode most of the time and often needs a kHz clock for timer or time-stamp purposes [1]. For compact device size, implementing a kHz clock using a low-frequency crystal oscillator (LFXO) is not preferred because an extra kHz crystal (Xtal) is required [2]. Alternatively, the kHz clock can be generated by dividing a high-frequency XO (HFXO) signal through dividers. (An MHz-range HFXO is usually available to serve as the system clock for computation and communication purposes in an SOC.) However, the division approach requires the HFXO and dividers to remain active even in sleep mode, which consumes considerable power [3]. Some works exploit on-chip oscillators to produce a kHz clock. Such oscillators are PVT sensitive and prone to inferior frequency stability [4], [5]. MEMS-based clock generators achieve excellent performance [6]. However, this comes at the cost of complex temperature trimming and an additional MEMS resonator.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126316034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wen-Liang Zeng, Caolei Pan, C. Lam, Sai-Weng Sin, Chenchang Zhan, R. Martins
{"title":"A 95% Peak Efficiency Modified KY (Boost) Converter for IoT with Continuous Flying Capacitor Charging in DCM","authors":"Wen-Liang Zeng, Caolei Pan, C. Lam, Sai-Weng Sin, Chenchang Zhan, R. Martins","doi":"10.1109/A-SSCC53895.2021.9634724","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634724","url":null,"abstract":"DC-DC converters are required to achieve high efficiency over a wide loading range and a compact size for IoT applications, as shown in Fig. 1(a). However, many designs apply more than one control method to satisfy such requirements [1,2,4,5], which demands a complex control system involving a mode-selection subsystem, causing an efficiency penalty and a large chip area. The conventional boost converter has a discontinuous output current, which degrades efficiency and output voltage ripple. As a hybrid converter with a switched capacitor and an inductor, the KY converter overcomes the above drawbacks. However, the charging time of the flying capacitor is seriously limited by the inner operation logic in discontinuous conduction mode (DCM) operation, resulting in small output loading capability and low efficiency.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125247416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jae-Woo Park, Dong-Seok Kang, Injae Park, Minsu Park, Xuefan Jin, Kyu-Dong Hwang, Daehan Kwon, J. Chun
{"title":"A 21Gb/s Duobinary Transceiver for GDDR interfaces with an Adaptive Equalizer","authors":"Jae-Woo Park, Dong-Seok Kang, Injae Park, Minsu Park, Xuefan Jin, Kyu-Dong Hwang, Daehan Kwon, J. Chun","doi":"10.1109/A-SSCC53895.2021.9634179","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634179","url":null,"abstract":"A duobinary transceiver for Graphics Double Data Rate (GDDR) memory interfaces is implemented in a 28nm CMOS technology. The proposed voltage-mode driver complies with the GDDR impedance specifications without sacrificing the ratio of level mismatch (RLM). The quarter-rate time-interleaved successive approximation duobinary receiver reduces the forwarded clock frequency and minimizes the capacitive loading of the front-end analog equalizer. Also, an equalizer adaptation scheme applicable to duobinary signaling is proposed. The transceiver achieves a BER of $10^{-11}$ at 21 Gb/s with 1.62-mW/Gb/s energy efficiency.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134267527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}