{"title":"Z-PIM: An Energy-Efficient Sparsity Aware Processing-In-Memory Architecture with Fully-Variable Weight Precision","authors":"Jihoon Kim, Juhyoung Lee, Jinsu Lee, H. Yoo, Joo-Young Kim","doi":"10.1109/VLSICircuits18222.2020.9163015","DOIUrl":"https://doi.org/10.1109/VLSICircuits18222.2020.9163015","url":null,"abstract":"This paper presents Z-PIM, an energy-efficient processing-in-memory (PIM) architecture that supports zero-skipping operations and fully-variable weight bit-precision for efficient deep neural network (DNN). The 8T-SRAM cell based bit-serial operation with hierarchical bit-line structure enables variable weight precision and reduces bit-line switching by 95.42% in convolution layers of VGG-16. Z-PIM handles abundant zeros in weight data by skip-reading their corresponding input data while read-sequence rearranging and pipelining improves throughput by 66.1%. In addition, diagonal accumulation logic is proposed to accumulate both partial-sums for bit-serial operation and spatial products. As a result, the Z-PIM chip fabricated in a 65nm process consumes average 5.294mW power and achieves 0.31–49.12 TOPS/W energy efficiency for convolution operations as sparsity and weight bit-precision vary from 0.1 to 0.9 and 1b to 16b, respectively.","PeriodicalId":252787,"journal":{"name":"2020 IEEE Symposium on VLSI Circuits","volume":"61 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126771181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 5MHz-BW, 86.1dB-SNDR 4X Time-Interleaved 2nd-Order ΔΣ Modulator with Digital Feedforward Extrapolation in 28nm CMOS","authors":"Dongyang Jiang, Liang Qi, Sai-Weng Sin, F. Maloberti, R. Martins","doi":"10.1109/VLSICircuits18222.2020.9162798","DOIUrl":"https://doi.org/10.1109/VLSICircuits18222.2020.9162798","url":null,"abstract":"This paper presents a 4X Time-Interleaved (TI) 2nd-order discrete-time (DT) ΔΣ Modulator (DSM) using digital feedforward extrapolation. Three feedforward paths digitize one channel information first and then extrapolate the other channels fully in the digital domain. Hence, this DSM only needs two opamps in one channel to realize four interleaving paths, thus reducing analog hardware overheads. With the sampling clock @ 520MHz, this 28nm CMOS prototype achieves an equivalent output sampling rate of 2.08GS/s, 208× OSR, 86.1dB SNDR, and 98dB SFDR over a 5MHz BW, while consuming 23.1mW. It results in an FOMS of 169.5dB.","PeriodicalId":252787,"journal":{"name":"2020 IEEE Symposium on VLSI Circuits","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127784747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Sensor Platform with Five-Order-of-Magnitude System Power Adaptation down to 3.1nW and Sustained Operation under Moonlight Harvesting","authors":"Longyang Lin, Saurabh Jain, M. Alioto","doi":"10.1109/VLSICircuits18222.2020.9162898","DOIUrl":"https://doi.org/10.1109/VLSICircuits18222.2020.9162898","url":null,"abstract":"A sensor node with system power tuning is presented for 5-order-of-magnitude adaptation to harvested power. Coordinated tuning of unified voltage/capacitive/light sensor interface, MCU and direct MPPT with no intermediate power conversion scales system power to 3.1nW at 0.3V. Operation at 1 lux (moonlight) with 4.1×4.1mm² light harvester is shown.","PeriodicalId":252787,"journal":{"name":"2020 IEEE Symposium on VLSI Circuits","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115227127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 10nm SRAM Design using Gate-Modulated Self-Collapse Write Assist Enabling 175mV VMIN Reduction with Negligible Power Overhead","authors":"Z. Guo, J. Wiedemer, Yusung Kim, P. S. Ramamoorthy, P. B. Sathyaprasad, Smita Shridharan, Daeyeon Kim, E. Karl","doi":"10.1109/VLSICircuits18222.2020.9162782","DOIUrl":"https://doi.org/10.1109/VLSICircuits18222.2020.9162782","url":null,"abstract":"A 21Mb/mm² SRAM design using 0.0367µm² HCC bitcell on a 10nm CMOS technology is presented. Gate-modulated self-collapse (GSC) write assist is utilized to enable 175mV reduction in VMIN with minimal energy overhead. Instance area overhead is limited to 3–5% by implementing the GSC circuitry in a row-based configuration with modified SRAM bitcells.","PeriodicalId":252787,"journal":{"name":"2020 IEEE Symposium on VLSI Circuits","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126095679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"1.03pW/b Ultra-Low Leakage Voltage-Stacked SRAM for Intelligent Edge Processors","authors":"Jingcheng Wang, Hyochan An, Qirui Zhang, Hun-Seok Kim, D. Blaauw, D. Sylvester","doi":"10.1109/VLSICircuits18222.2020.9162843","DOIUrl":"https://doi.org/10.1109/VLSICircuits18222.2020.9162843","url":null,"abstract":"A stacked voltage domain SRAM is proposed where arrays are split into two sets (top and bottom) with their supplies connected in series. System supply current is reused by top and bottom sets, and supply voltage is divided among the two sets of arrays, enabling seamless integration of very low voltage SRAM retention in a larger system with a nominal supply, without need for an efficiency-reducing LDO. An array swapping approach provides stable access to arbitrary banks within one system clock cycle. A comprehensive sizing strategy (W&L) is employed to optimally balance hold stability and bitcell size. Integrated in an IoT imaging system in 40nm CMOS, the proposed 8.9Mb SRAM achieves 1.03pW/bit leakage, a >100× reduction over conventional SRAM in the same technology.","PeriodicalId":252787,"journal":{"name":"2020 IEEE Symposium on VLSI Circuits","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121505647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 3.2-to-3.8GHz Calibration-Free Harmonic-Mixer-Based Dual-Feedback Fractional-N PLL Achieving –66dBc Worst-Case In-Band Fractional Spur","authors":"Masaru Osada, Zule Xu, T. Iizuka","doi":"10.1109/VLSICircuits18222.2020.9162799","DOIUrl":"https://doi.org/10.1109/VLSICircuits18222.2020.9162799","url":null,"abstract":"A dual-feedback architecture for a fractional-N PLL is proposed to achieve low spurs and to suppress the phase noise degradation from the Delta-Sigma Modulator (DSM). With the assistance of 1 auxiliary PLL, the proposed architecture avoids noise amplification that occurs in conventional architectures. The feasibility of the proposed architecture is demonstrated in a calibration-free 3.2-to-3.8GHz analog fractional-N PLL that achieves –69dBc out-of-band spur and –66dBc worst-case in-band fractional spur.","PeriodicalId":252787,"journal":{"name":"2020 IEEE Symposium on VLSI Circuits","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126476827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 146.52 TOPS/W Deep-Neural-Network Learning Processor with Stochastic Coarse-Fine Pruning and Adaptive Input/Output/Weight Skipping","authors":"Sangyeob Kim, Juhyoung Lee, Sanghoon Kang, Jinmook Lee, H. Yoo","doi":"10.1109/VLSICircuits18222.2020.9162795","DOIUrl":"https://doi.org/10.1109/VLSICircuits18222.2020.9162795","url":null,"abstract":"An energy efficient Deep-Neural-Network (DNN) learning processor is proposed for on-chip learning and iterative weight pruning (WP). This work has three key features: 1) stochastic coarse-fine pruning reduced computation workload by 99.7% compared with previous WP algorithm while maintaining high weight sparsity, 2) adaptive input/output/weight skipping (AIOWS) achieved 30.1× higher throughput than previous DNN learning processor [1] for not only the inference but also learning, 3) weight memory shared pruning unit removed on-chip weight memory access for WP. As a result, this work shows 146.52 TOPS/W energy efficiency, which is 5.79× higher than the state-of-the-art [1].","PeriodicalId":252787,"journal":{"name":"2020 IEEE Symposium on VLSI Circuits","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131855282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Industrial IoT with Crystal-Free Mote-on-Chip","authors":"Tengfei Chang, Timothy Claeys, Mališa Vučinić, Xavier Vilajosana, Titan Yuan, B. Wheeler, F. Maksimovic, D. Burnett, Brian G. Kilberg, K. Pister, T. Watteyne","doi":"10.1109/VLSICircuits18222.2020.9162981","DOIUrl":"https://doi.org/10.1109/VLSICircuits18222.2020.9162981","url":null,"abstract":"SCμM is a 2×3×0.3 mm³ system-on-chip that contains an ARM Cortex-M0 and a 2.4 GHz IEEE802.15.4 radio. This paper describes the two-step calibration routine needed to run a full 6TiSCH stack on SCμM. It is, to the best of our knowledge, the first time a fully standards-compliant protocol stack runs on a crystal-free radio, such that it can participate in a network with off-the-shelf radios.","PeriodicalId":252787,"journal":{"name":"2020 IEEE Symposium on VLSI Circuits","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130539573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Ultra-Low Latency 7.8–13.6 pJ/b Reconfigurable Neural Network-Assisted Polar Decoder with Multi-Code Length Support","authors":"Chieh-Fang Teng, Chun-Hsiang Chen, A. Wu","doi":"10.1109/vlsicircuits18222.2020.9163022","DOIUrl":"https://doi.org/10.1109/vlsicircuits18222.2020.9163022","url":null,"abstract":"To meet with the stringent requirements of ultra-low latency communication in 5G, this work presents a polar decoder fabricated in TSMC 40nm CMOS featuring: 1) World's first neural network-assisted decoder chip with 8× improvement of convergence rate. 2) Fully reconfigurable architecture to support multi-code length operations with a 2-to-8× hardware utilization rate. 3) Optimized fixed-point design of processing element (PE) to reduce 73% area and 67% power consumption.","PeriodicalId":252787,"journal":{"name":"2020 IEEE Symposium on VLSI Circuits","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129714133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 28nm 10Mb Embedded Flash Memory for IoT Product with Ultra-Low Power Near-1V Supply Voltage and High Temperature for Grade 1 Operation","authors":"H. Shin, Jisung Kim, Shin-Jae Kang, Sung-Ung Kwak","doi":"10.1109/vlsicircuits18222.2020.9162813","DOIUrl":"https://doi.org/10.1109/vlsicircuits18222.2020.9162813","url":null,"abstract":"In this paper, we present an Embedded Flash Memory (eFlash) based on logic-28nm process for Internet of Things (IoT) product. IoT product requires high performance, low power operation and immune to the high temperature. Based on a power-efficient 28nm process technology, we implemented the ultra-deep sleep mode (<1µA). Through the WL Boosting and Adaptive Control Sensing Scheme (WBACS), we achieved fast read speed (3.2Gbit/s) and robust sensing margin. High voltages can be generated stably in ultra-low power IO 1.1V by using Double-Boost-Clock (DBC). Through the technique that positive/negative Bi-Directional Charge Pump (BDCP), three high voltages required for Program/Erase operation can be generated from two charge pumps. As a result, we have developed an area competitive eFlash IP (Size 1.27 mm²). Based on these technologies, it was confirmed that 28nm-eFlash operates at ultra-low power (Core-VDD 0.85V & IO 1.1V) and high temperature (Tj 150°C) successfully. And these technologies were mounted in the world's first 28nm process MCU-Connectivity One Chip Solution.","PeriodicalId":252787,"journal":{"name":"2020 IEEE Symposium on VLSI Circuits","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133934006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}