Yanchao Wang, Siladitya Dey, Tao He, Lukang Shi, Jiawei Zheng, Manjunath Kareppagoudr, Yi Zhang, Kazuki Sobue, K. Hamashita, K. Tomioka, G. Temes
{"title":"A Hybrid Continuous Time Incremental and SAR Two-Step ADC with 90.5dB DR over 1MHz BW","authors":"Yanchao Wang, Siladitya Dey, Tao He, Lukang Shi, Jiawei Zheng, Manjunath Kareppagoudr, Yi Zhang, Kazuki Sobue, K. Hamashita, K. Tomioka, G. Temes","doi":"10.1109/A-SSCC53895.2021.9634732","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634732","url":null,"abstract":"The sensors in real-time data processing IoT devices require high-resolution, sub-MHz data converters. They are often implemented as Incremental ADCs (IADCs) [1], [2] because of their efficient oversampling technique and low latency. In discrete-time incremental ADCs (DT-IADCs) [2], the nonlinearity and charge injection of the sampling switch degrade the performance, and power-hungry opamps are needed to provide fast and accurate settling for the switched-capacitor circuits. These limitations may prevent the use of DT-IADCs in high-resolution, wide-bandwidth (BW) applications. Continuous-time incremental ADCs (CT-IADCs) overcome these issues by removing the sampling switches, and CT integrators relax the opamp settling-accuracy specifications to save power. Hence, CT-IADCs enable higher resolution and faster conversion speed with lower power consumption.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129833106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Sort-Less FPGA-Based Non-Maximum Suppression Accelerator using Multi-Thread Computing and Binary Max Engine for Object Detection","authors":"Chaoming Fang, Habib Derbyshire, Wenyu Sun, Jinshan Yue, Haobing Shi, Yongpan Liu","doi":"10.1109/A-SSCC53895.2021.9634708","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634708","url":null,"abstract":"The Non-Maximum Suppression (NMS) algorithm is an important post-processing step in object detection networks for various applications [1]. The standard NMS procedure suffers from poor time complexity and large power consumption due to its iterative and greedy search procedure, making it a bottleneck for object detection networks implemented on various processors [2], [3]. Previous NMS accelerators achieved optimization by stacking arithmetic logic units or computing consecutive iterations simultaneously [4]–[6]. However, several challenges remain, as shown in Fig. 1. First, the highly iterative process of NMS causes either high time complexity or high space complexity if the hardware resources are not designed properly. Second, the standard NMS process requires sorting the bounding boxes by score, and such sorting circuits occupy abundant resources and produce massive data movement. Finally, the Intersection Over Union (IOU) calculation requires hardware-unfriendly operations such as multiplication and division, consuming large amounts of valuable hardware resources such as DSPs.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"19 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128396177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Feedback Architecture of High Speed True Random Number Generator based on Ring Oscillator","authors":"Xin Cheng, Haowen Zhu, Xinyi Xing, Yunfeng Zhang, Yongqiang Zhang, Guangjun Xie, Zhang Zhang","doi":"10.1109/A-SSCC53895.2021.9634760","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634760","url":null,"abstract":"True random number generators (TRNGs) are widely used to generate encryption keys in information security systems [1]–[2]. In a TRNG, the entropy source is a critical module that provides the randomness of the output bit stream. The unavoidable electrical noise in circuits is an ideal entropy source due to its unpredictability. Among the methods of capturing electrical noise, a ring oscillator-based entropy source makes the TRNG most robust to deterministic noise and 1/f noise, giving it the strongest anti-interference capability; it is also simple in structure and easy to integrate [3]. Thus, great research attention has focused on ring oscillator-based TRNGs [3]–[7]. In [4], a high-speed TRNG with a 100 Mbps output bit rate was proposed, but it consumed too much power and area. A TRNG based on a tetrahedral ring oscillator was proposed in [5]. Its power consumption was very low, but its output bit rate was also very low. A ring oscillator-based TRNG with a low output bit rate but high power was proposed in [7]. In short, none of the above architectures achieves an appropriate compromise between bit rate and power consumption. This work presents a new feedback TRNG architecture based on a tetrahedral ring oscillator. The output random bit stream generates a relatively random control voltage that acts on the transmission gates in the oscillator through a feedback loop, thus increasing the phase jitter of the oscillator and improving the output bit rate. Furthermore, an XOR chain-based post-processing unit is added to eliminate the statistical deviations and correlations between raw bits.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125873101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zain Taufique, Bingzhao Zhu, G. Coppola, Mahsa Shoaran, Wala Saadeh, Muhammad Awais Bin Altaf
{"title":"An 8.7 μJ/class. FFT accelerator and DNN-based configurable SoC for Multi-Class Chronic Neurological Disorder Detection","authors":"Zain Taufique, Bingzhao Zhu, G. Coppola, Mahsa Shoaran, Wala Saadeh, Muhammad Awais Bin Altaf","doi":"10.1109/A-SSCC53895.2021.9634763","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634763","url":null,"abstract":"Chronic Neurological Disorders (CNDs) such as epilepsy [1], [2], migraine [3], and autism [4] can persist for extended periods. Untreated CNDs may lead to permanent disability. Therefore, it is crucial to diagnose them at an early stage to perform a timely, meaningful intervention. A routine medical checkup often cannot provide the timely intervention required for CNDs. A chronic attack consists of pre-ictal, ictal, and post-ictal stages, and an effective intervention requires CND detection and a remedial response during the pre-ictal stage. Therefore, monitoring CNDs 24/7 is crucial, irrespective of the patient’s location and clinical state. The electroencephalogram (EEG) is used to monitor and detect most CNDs in a wearable environment [1]–[6].","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126015728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alican Çağlar, S. V. Winckel, S. Brebels, P. Wambacq, J. Craninckx
{"title":"A 4.2mW 4K 6-8GHz CMOS LNA for Superconducting Qubit Readout","authors":"Alican Çağlar, S. V. Winckel, S. Brebels, P. Wambacq, J. Craninckx","doi":"10.1109/A-SSCC53895.2021.9634832","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634832","url":null,"abstract":"Millions of qubits need to be employed in a quantum computer to achieve fault-tolerant quantum operation. To reduce the complexity of such a large-scale system, it has been proposed to place the control and readout circuitry at the 4 K stage of dilution refrigerators [1]. CMOS technology is commonly used to leverage its scaling to enable large-scale integration of control and readout circuitry with qubits. However, high-fidelity readout operations require low-noise amplifiers (LNAs) with a noise temperature of a few kelvins. This necessitates the use of HEMT and parametric amplifiers [2]. Recently reported CMOS LNAs are still far from attaining such performance [3–5]. Thus, this is one of the greatest challenges on the way to fully integrated CMOS readout. Additionally, due to the limited cooling power of dilution refrigerators, low-power solutions are needed to achieve very good noise performance at 4 K. This paper presents a 28 nm CMOS LNA for qubit readout, which achieves an order-of-magnitude power reduction compared to its CMOS counterparts while still providing similarly good noise figure (NF) performance at 4 K.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122558331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Arithmetic Progression Switched-Capacitor DC-DC Converter with Soft VCR Transitions Achieving 93.7% Peak Efficiency and 400 mA Output Current","authors":"Yang Jiang, M. Law, Pui-in Mak, R. Martins","doi":"10.1109/A-SSCC53895.2021.9634798","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634798","url":null,"abstract":"Dynamic source adaptation and supply modulation can benefit the power efficiency and system functionality of energy-harvesting interfaces, voltage-scalable SoCs, device drivers, power amplifiers, and others. A switched-capacitor (SC) DC-DC converter can achieve high power conversion efficiency (PCE) and power density at the hundreds-of-mW level. Several reconfigurable SC topologies have emerged to generate multiple voltage conversion ratios (VCRs) systematically with lower conduction and parasitic losses in steady state [1]–[4]. However, during VCR transitions, the voltage imbalance among the flying capacitors (CFLY) can induce charge redistribution loss. This hard-VCR-transition loss inevitably hurts the overall efficiency and remains unresolved. This work proposes an arithmetic progression (AP) SC DC-DC converter topology for systematic rational VCR generation while featuring soft VCR transitions. It maintains a fixed voltage on each CFLY irrespective of VCR changes to eliminate the CFLY voltage rebalance effect. The proposed AP topology also achieves the theoretical optimum in terms of the steady-state slow-/fast-switching-limited losses. Owing to its inherent two-phase quasi-symmetric output charge (QOUT) delivery, it ensures a low output ripple without using a conventional dual-branch converter architecture. We further propose a cross-coupled bootstrapping (XCBS) gate driver, operating at half the switching frequency (fSW/2), to control the flying power switches adaptively. Realizing step-down VCRs of 5:4/3/2/1, the proposed AP converter reaches a measured peak PCE of 93.7% and a maximum output current of 400 mA. Featuring soft VCR transitions, it demonstrates an average PCE of up to 89% under a periodic VCR transition (fVCR_tran) at 100 kHz.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122649445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiangrong Huang, Haikun Jia, Shengnan Dong, W. Deng, Zhihua Wang, B. Chi
{"title":"A 24-30GHz 4-Element Phased Array Transceiver with Low Insertion Loss Compact T/R Switch and Bidirectional Phase Shifter in 65 nm CMOS Technology","authors":"Xiangrong Huang, Haikun Jia, Shengnan Dong, W. Deng, Zhihua Wang, B. Chi","doi":"10.1109/A-SSCC53895.2021.9634813","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634813","url":null,"abstract":"5G technology greatly expands the field of mobile communication through its high data rate and low latency. The performance improvement of 5G necessitates a variety of technologies, including the phased-array technique. A T/R switch is widely used in phased arrays to reduce the number of antennas. It is one of the most critical modules in the phased array since it directly influences the output power of the TX and the noise figure (NF) of the RX. Conventional T/R switches are based on $\\lambda/4$ transmission lines, which are area-consuming in the lower millimeter-wave frequency range [1]. Recently, lumped equivalent transmission lines based on inductors [2] and transformers [3] have been employed to reduce the chip area. However, the lumped transmission lines are usually narrow-band. Furthermore, the inductors or transformers still occupy extra chip area. To address these issues, a compact T/R switch co-designed with the PA’s output matching network and the LNA’s input matching network is proposed. Leveraging the existing transformers in the two matching networks, only one extra transistor switch is needed in this compact T/R switch, which greatly reduces the chip area consumption and therefore the insertion loss.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117304741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 389TOPS/W, 1262fps at 1Meps Region Proposal Integrated Circuit for Neuromorphic Vision Sensors in 65nm CMOS","authors":"S. Bose, A. Basu","doi":"10.1109/A-SSCC53895.2021.9634734","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634734","url":null,"abstract":"Neuromorphic vision sensors (NVS) [1] are key enablers in traffic monitoring and surveillance systems that exploit the temporal redundancy in video streams to get $>2\\mathrm{X}$ energy savings by blank frame detection (Fig. 1). This concept of event-driven processing has been used to reduce system energy for regular cameras as well [2]. However, an object typically occupies a fraction of the full image frame (Fig. 1), leading to significant spatial redundancy in the image. Hence, energy-efficient hardware is required to detect the regions of interest (RoIs) in the valid frames and trigger an object recognition engine only for the RoIs. For a binary image, the region proposal can be performed by the connected component labeling (CCL) algorithm [2]. However, CCL scans the image in a raster fashion to calculate the RoIs, leading to longer execution time and higher energy dissipation due to enormous data transfer. In contrast, emerging in-memory [3], [4] and near-memory [5] computing approaches offer a way to eliminate the data transfer cost and latency, promising further energy savings. In this paper, we propose a 9T-SRAM-based near- and in-memory computing region proposal integrated circuit (RPIC) leveraging the 1-D projections of the objects on the vertical and horizontal axes. Further, we propose an iterative and selective search (ISS) algorithm to overcome overlapped projections among objects and provide an accurate number of objects and their exact coordinates.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124632772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ziyu Li, Weiwei Shan, Chengjun Wu, Haitao Ge, Jun Yang
{"title":"An Efficient and Reliable Negative Margin Timing Error Detection for Neural Network Accelerator without Accuracy Loss in 28nm CMOS","authors":"Ziyu Li, Weiwei Shan, Chengjun Wu, Haitao Ge, Jun Yang","doi":"10.1109/A-SSCC53895.2021.9634809","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634809","url":null,"abstract":"Energy-efficient neural network (NN) accelerators are essential for IoT and mobile applications, where PVT variations become severe, especially in the near-threshold voltage (NTV) range. Recent work [1]–[4] applied error detection and correction (EDAC) based adaptive voltage frequency scaling (AVFS) to NN accelerators to eliminate excess timing margins while decreasing the power supply until timing violations are detected (Fig. 1). By using the fault tolerance of NN accelerators to avoid error correction, they considerably increased energy efficiency. However, NNs have limited tolerance to timing errors, since even a few timing errors can cause a serious loss of accuracy, for example, up to 3% accuracy loss on MNIST [2]. Body swapping and adaptive clock techniques have also been adopted to reduce the accuracy loss [3–4]. A traditional AVFS system monitors the most critical paths and then decreases the voltage until reaching the point of first failure (PoFF). However, an NN accelerator’s critical paths have distinct characteristics from those of conventional circuits, which makes common EDAC inefficient and risky when applied to NNs.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124606964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 4.39ps, 1.5GS/s Time–to-Digital Converter with 4× Phase Interpolation Technique and a 2-D Quantization Array","authors":"Yongkuo Ma, Peiyuan Wan, Hongda Zhang, Zhi Wan, Xiaoyu Zhang, Xu Liu, Zhijie Chen","doi":"10.1109/A-SSCC53895.2021.9634753","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634753","url":null,"abstract":"With shrinking supply voltages and process scaling, time-based circuits are becoming more attractive in ultra-deep-submicron mixed-signal circuit design compared with traditional voltage-domain circuits. A time-to-digital converter (TDC) is the key component in time-based circuits; it quantizes the time interval between the rising edges of two signals, Start and Stop. TDCs are widely used in frequency generation (digital phase-locked loops) [1], data conversion (time-based ADCs) [2], and energy-efficient neural network acceleration. The most elementary TDC is the delay-line TDC, which is also an essential component of other TDCs [3], [4] and has the merits of simple structure and low power. However, the minimum intrinsic delay of a single delay element limits the resolution, making a high-resolution delay-line TDC difficult to realize. Moreover, its area and power consumption increase exponentially with the number of quantization bits, while its conversion speed decreases. To resolve this problem, this paper proposes a novel Phase Interpolation time-to-digital converter (PI-TDC) with a 2-dimensional quantization array and a multiplex delay-line technique.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116041040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}