Shifu Wu, K. D. Silva, Snehlata Gutgutia, B. Baas, Massimo Alioto
{"title":"A 1448-Mpixel/s, 84-pJ/Pixel Display Stream Compression Encoder in 28 nm for 4K Video Resolution","authors":"Shifu Wu, K. D. Silva, Snehlata Gutgutia, B. Baas, Massimo Alioto","doi":"10.1109/A-SSCC53895.2021.9634771","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634771","url":null,"abstract":"In this work, an energy- and area-efficient Display Stream Compression (DSC) encoder architecture is proposed for energy-constrained systems driving high-resolution internal/external displays (e.g., virtual reality headsets, smartphones). As the main motivation, ever-higher display resolutions (e.g., 4K, 8K) require very high uncompressed processor-to-display data rates: 30 Gbps for 4K (120 Gbps for 8K) at 120 frames per second (fps) and 10 bits per component (bpc). The data transfer bandwidth to the display cannot keep pace with this demand, making compression a necessity.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116509617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
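As a rough check of the quoted link rates, the sketch below recomputes the raw pixel bandwidth. It assumes 4K = 3840×2160 and 8K = 7680×4320 pixels, three color components per pixel (e.g., RGB 4:4:4), and no blanking or link-coding overhead; `uncompressed_rate_gbps` is a hypothetical helper for illustration only.

```python
def uncompressed_rate_gbps(width, height, fps, bpc, components=3):
    """Raw video data rate in Gbit/s, ignoring blanking and link overhead."""
    bits_per_pixel = components * bpc  # e.g., 3 components x 10 bpc = 30 bits
    return width * height * fps * bits_per_pixel / 1e9


rate_4k = uncompressed_rate_gbps(3840, 2160, fps=120, bpc=10)
rate_8k = uncompressed_rate_gbps(7680, 4320, fps=120, bpc=10)
print(f"4K: {rate_4k:.2f} Gbps, 8K: {rate_8k:.2f} Gbps")  # 4K: 29.86 Gbps, 8K: 119.44 Gbps
```

The results round to the ~30 Gbps and ~120 Gbps figures in the abstract; since 8K has four times the pixels of 4K, the rate scales by exactly 4×.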
{"title":"A 196.2 dBc/Hz FOMT 16.8-to-21.6 GHz Class-F23 VCO with Transformer-Based Optimal Q-factor Tank in 65-nm CMOS","authors":"Feifan Hong, Tianao Ding, Dixian Zhao","doi":"10.1109/A-SSCC53895.2021.9634703","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634703","url":null,"abstract":"To meet the increasing demand for high data rates and wide coverage in high-quality satellite communication, the frequency synthesizer is expected to deliver a wide tuning range (TR) and a pure spectrum with low power consumption. To lower the phase noise (PN), transformer-based, trifilar-coil, and multi-core VCO topologies have emerged in recent years [1–4]. However, at millimeter-wave (mm-Wave) bands, the TR becomes narrow and the Q-factor of the resonance tank drops as parasitic effects increase, especially for the complicated trifilar-coil tank. This severely restricts the VCOs’ figure-of-merit (FOM), as shown in Fig. 1. For low-power design, single-core VCOs that use a high-order tank for waveform shaping, such as the Class-F topology, exhibit low PN. Figure 1 shows the conventional two-port Class-F VCO in [1]. Its 1:n (n > 1) transformer amplifies the gate voltage, driving the transistors deep into the triode region. Thick-oxide devices are used to withstand the large voltage swing, which may decrease switching speed and introduce additional noise. In addition, the Q-factor of the multi-turn transformer deteriorates at mm-Wave bands.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124341441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
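The 16.8-to-21.6 GHz range in the title and the FOMT metric can be related through the commonly used definitions TR = (f_max − f_min)/f_center and FOM_T = −PN + 20·log10(f0/Δf) − 10·log10(P_DC/1 mW) + 20·log10(TR/10%). The sketch below assumes these textbook definitions; it does not use any phase-noise or power figures from the paper, and the helper names are hypothetical.

```python
import math


def tuning_range_pct(f_min, f_max):
    """Tuning range as a percentage of the center frequency."""
    f_center = (f_min + f_max) / 2
    return 100 * (f_max - f_min) / f_center


def fom_t(pn_dbc_hz, f0, f_offset, p_dc_mw, tr_pct):
    """Standard FOM_T: phase-noise FOM plus a tuning-range term (TR normalized to 10%)."""
    fom = -pn_dbc_hz + 20 * math.log10(f0 / f_offset) - 10 * math.log10(p_dc_mw / 1.0)
    return fom + 20 * math.log10(tr_pct / 10)


print(f"TR = {tuning_range_pct(16.8e9, 21.6e9):.1f} %")  # range from the title
```

For the title's range this gives TR = 25%, which by the definition above adds 20·log10(2.5) ≈ 8 dB on top of the phase-noise FOM.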
Seokhyeon Jeong, Yejoong Kim, Yuyang Li, Inhee Lee
{"title":"A Millimeter-Scale Computing System with Adaptive Dynamic Load Power Tracking","authors":"Seokhyeon Jeong, Yejoong Kim, Yuyang Li, Inhee Lee","doi":"10.1109/A-SSCC53895.2021.9634758","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634758","url":null,"abstract":"Computing systems have been continuously miniaturized, and a growing number of small IoT devices have become part of our daily lives. Recently, system sizes have reached the millimeter scale, demonstrating new sensing approaches in ecological, biomedical, and security applications [1]–[3]. As an example, Fig. 1(a) shows a millimeter-scale layer-stacked system, constructed by vertically stacking bare dies, which maximizes the planar circuit area for a given volume. This platform miniaturizes a computing system down to the millimeter scale mainly by avoiding individually packaged discrete components [4]. The miniaturization reduces battery capacity, limiting the energy available to the integrated circuits. For example, a 9.8-mm² thin-film lithium battery stores a charge of only 15 µAh, allowing an average current draw of 21 nA (84 nW) for a system lifetime of 1 month [5]. The lithium battery provides a higher voltage (e.g., 4 V) than a standard CMOS transistor can tolerate, requiring a highly efficient Power Management Unit (PMU) that converts the battery voltage to lower voltages (e.g., 1.5 V). Due to size constraints, PMUs for such small systems have been designed with on-chip capacitors instead of bulky discrete inductors [4], [6].","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127065814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
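The battery-lifetime numbers follow from dividing stored charge by time. The sketch below assumes a 30-day month, that the full 15 µAh is usable, and the 4 V example battery voltage from the text, and it ignores PMU conversion losses; the helper names are hypothetical.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def average_current_na(capacity_uah, lifetime_h):
    """Average current (nA) that fully drains the battery over lifetime_h hours."""
    return capacity_uah * 1e3 / lifetime_h  # uAh -> nAh, then divide by hours


def average_power_nw(current_na, battery_v):
    """Average power (nW) drawn from the battery at voltage battery_v."""
    return current_na * battery_v


i_avg = average_current_na(15, HOURS_PER_MONTH)  # 15 uAh battery from the text
p_avg = average_power_nw(i_avg, 4.0)             # 4 V example battery voltage
print(f"{i_avg:.1f} nA, {p_avg:.1f} nW")
```

This yields about 20.8 nA and 83.3 nW; rounding the current to 21 nA first gives the 84 nW quoted in the text.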
Kwangho Lee, W. Jung, Haram Ju, Jinhyung Lee, D. Jeong
{"title":"A 48 Gb/s PAM4 receiver with Baud-rate phase-detector for multi-level signal modulation in 40 nm CMOS","authors":"Kwangho Lee, W. Jung, Haram Ju, Jinhyung Lee, D. Jeong","doi":"10.1109/A-SSCC53895.2021.9634775","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634775","url":null,"abstract":"Recently, receivers (RX) have been demanding ever-higher data rates. Multi-level signals such as four-level pulse amplitude modulation (PAM-4) are more advantageous than two-level PAM (PAM-2) for meeting the required bandwidth. However, multi-level signals reduce the amplitude of the main cursor (h0), are more affected by inter-symbol interference (ISI), and pre-cursor ISI in particular is hard to cancel at the RX. Thus, the RX needs a phase detector (PD) that controls pre-cursor ISI to achieve the target bit-error rate (BER). Meanwhile, clock and data recovery (CDR) circuits use a Mueller-Muller PD (MMPD) as a baud-rate PD (BRPD) for power efficiency and reduced clock overhead. However, with an adaptive decision feedback equalizer (DFE), the MMPD moves the lock point to where the first pre-cursor ISI tap (h-1) becomes zero [1]. This makes the CDR vulnerable to noise or causes the lock point to drift. To move the lock point to h1 = h-1 ≠ 0, PDs that add a phase offset have been proposed [2],[7]. However, they do not secure a vertical eye margin (VEM) for multi-level signals, even with an adaptive DFE. In this paper, a BRPD that is more compatible with multi-level signaling is proposed. The PD locks at the point where h0 = Nt·h-1, where Nt is a target cursor ratio. Nt secures a VEM by controlling h-1. Furthermore, the lock point is independent of post-cursor ISI, so the PD with an adaptive DFE has a unique lock point.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125719536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
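For context on the lock-point discussion, the sketch below simulates the conventional (type-A) Mueller-Muller timing function z_k = y_k·d_(k−1) − y_(k−1)·d_k on a PAM4 stream. It assumes a noiseless three-cursor channel (h-1, h0, h1), ideal decisions, and uniformly random symbols, in which case the average of z_k is proportional to h1 − h-1. It is a generic illustration of the MMPD, not the BRPD proposed in this paper, and `mm_timing_error` is a hypothetical name.

```python
import random

PAM4_LEVELS = (-3, -1, 1, 3)


def mm_timing_error(h_pre, h0, h_post, n=20000, seed=0):
    """Average Mueller-Muller timing function z_k = y_k*d_(k-1) - y_(k-1)*d_k.

    Assumes a noiseless channel with cursors h_pre (h-1), h0, h_post (h1) and
    ideal decisions d_k equal to the transmitted PAM4 symbols.
    """
    rng = random.Random(seed)
    d = [rng.choice(PAM4_LEVELS) for _ in range(n + 2)]
    y = [0.0] * (n + 2)
    for k in range(1, n + 1):
        # Sample k sees the pre-cursor of d[k+1] and the post-cursor of d[k-1]
        y[k] = h_pre * d[k + 1] + h0 * d[k] + h_post * d[k - 1]
    z = [y[k] * d[k - 1] - y[k - 1] * d[k] for k in range(2, n + 1)]
    return sum(z) / len(z)  # expected value: 5 * (h_post - h_pre) for PAM4


for cursors in [(0.2, 1.0, 0.2), (0.1, 1.0, 0.3), (0.2, 1.0, 0.0)]:
    print(cursors, round(mm_timing_error(*cursors), 2))
```

With h1 = h-1 the average is near zero, the classic MMPD lock point; its sign otherwise tells the loop which way to move the sampling phase (which sign means early or late depends on the implementation). If an adaptive DFE cancels h1 before the PD sees the sample, modeled here by setting h1 = 0, the average becomes proportional to −h-1, so the loop settles where h-1 = 0, consistent with the lock-point shift described in the abstract.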
{"title":"1.55mW 2GHz ERBW 7b 800MS/s 3-stage Pipelined SAR ADC in 28nm CMOS using a Kickback-Cancelling 7T-Dynamic Residue Amplifier with only 16fF Input Capacitance","authors":"Hyeonsik Kim, Seonkyung Kim, Jintae Kim","doi":"10.1109/A-SSCC53895.2021.9634748","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634748","url":null,"abstract":"Achieving a higher effective resolution bandwidth (ERBW), beyond the Nyquist frequency, is a key design requirement for slice ADCs in a time-interleaved ADC (TIADC). While the SAR ADC is a popular choice for low-power ADCs in advanced CMOS processes, the input capacitance (Cin) presented to the ADC input by the front-end capacitive DAC (CDAC) limits the achievable signal bandwidth. In contrast, pipelined SAR ADCs have more freedom in choosing Cin because the resolution of the 1st-stage CDAC can be much lower than the total resolution. Therefore, Cin can be reduced to the thermal-noise limit without being limited by the minimum unit capacitance. The downside of pipelined SAR ADCs is the need for a residue amplifier, which often dominates the total power consumption. A dynamic amplifier (DA) can be considered as the residue amplifier because it can achieve both high speed and low power when the desired gain is modest [1–2]. Being open-loop and fully dynamic, however, the DA suffers from gain inaccuracy and is vulnerable to kickback noise. Furthermore, its gain varies significantly over process and temperature. [3] and [4] attempt to solve this issue with temperature-tracking biasing, but such compensation methods require an off-chip, temperature-dependent voltage or resistor to tune out process uncertainty.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126080660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
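To see why a pipelined SAR front end can shrink Cin toward the thermal-noise limit, the sketch below compares the sampled kT/C noise of the 16 fF input capacitance named in the title with the quantization noise of an ideal 7-bit ADC. The 1 V full-scale range and 300 K temperature are assumptions, only a single sampling capacitor is modeled, and other noise sources are ignored.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def ktc_noise_vrms(cap_f, temp_k=300.0):
    """RMS sampled thermal (kT/C) noise voltage on a single capacitor."""
    return math.sqrt(K_B * temp_k / cap_f)


def quantization_noise_vrms(full_scale_v, bits):
    """RMS quantization noise of an ideal ADC: LSB / sqrt(12)."""
    lsb = full_scale_v / 2 ** bits
    return lsb / math.sqrt(12)


v_th = ktc_noise_vrms(16e-15)          # 16 fF input capacitance (title)
v_q = quantization_noise_vrms(1.0, 7)  # assumed 1 V full scale, 7 bits
print(f"kT/C: {v_th * 1e3:.3f} mV rms, quantization: {v_q * 1e3:.3f} mV rms")
```

Under these assumptions the kT/C noise (about 0.51 mV rms) sits well below the 7-bit quantization noise (about 2.26 mV rms).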
{"title":"A 1.92GHz-3.84GHz 0.74ps-1.09ps-Jitter Inductor-less Injection-Locked Frequency Synthesizer with Automatic Frequency Selection and Timing Alignment","authors":"Khoi T. Phan, Y. Chao, H. Luong","doi":"10.1109/A-SSCC53895.2021.9634811","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634811","url":null,"abstract":"The sub-harmonic injection-locking technique has become popular for suppressing the phase noise and improving the jitter performance of ring oscillators in inductor-less PLLs [1]. However, conventional injection-locked PLLs (IL-PLLs) require a frequency divider chain for frequency selection and thus suffer from frequency misalignment and imperfect injection timing, which result in jitter degradation and even locking failure [2]. To solve this problem, a dedicated delay cell is used for frequency alignment in [3], at the cost of extra power and chip area. In [4], frequency misalignment is calibrated using a quadrature VCO, but performance is limited by the quadrature error and power consumption.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122718379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jixuan Li, Jiabao Chen, Ka-Fai Un, Wei-Han Yu, Pui-in Mak, R. Martins
{"title":"A 50.4 GOPs/W FPGA-Based MobileNetV2 Accelerator using the Double-Layer MAC and DSP Efficiency Enhancement","authors":"Jixuan Li, Jiabao Chen, Ka-Fai Un, Wei-Han Yu, Pui-in Mak, R. Martins","doi":"10.1109/A-SSCC53895.2021.9634838","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634838","url":null,"abstract":"Convolutional neural network (CNN) models, e.g., MobileNetV2 [1] and Xception, are based on depthwise separable convolution. They exhibit over 40× (64×) reduction in the number of parameters (operations) compared to VGG16 for ImageNet inference, while maintaining a top-1 accuracy of 72%. With 8-bit quantization, the memory required to store the model can be further compressed by 4×. This degree of model-size compression enables complex real-time machine learning tasks to be implemented on a low-power FPGA suitable for Internet-of-Things edge computing. Previous work [2] improved computational energy efficiency by exploiting model sparsity, but its effectiveness drops for already-compressed modern CNN models. As a result, new techniques that further advance the energy efficiency of CNN accelerators are desirable. [3] is a scalable adder tree for energy-efficient depthwise separable convolution, and [4] is a frame-rate enhancement technique; neither handles the extensive memory access during separable convolution, which dominates the power consumption of CNN accelerators. Herein we propose a double-layer multiply-accumulate (MAC) scheme that evaluates two layers within the bottleneck layer in a pipelined manner. This significantly reduces memory access to the feature maps. On top of that, we propose a double-operation digital signal processor (DSP) scheme that enhances the accelerator's throughput by using a high-precision DSP to compute two fixed-point operations in one clock cycle.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133366747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
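The abstract does not detail how the double-operation DSP computes two fixed-point operations per cycle. A common approach in FPGA designs is operand packing: two narrow operands that share one multiplicand are placed in separate bit fields of a wide multiplier input, so a single multiplication yields both products. The sketch below illustrates that generic idea with signed 8-bit operands and an 18-bit field offset (assumed values, chosen so the packed operand fits a 27-bit signed multiplier port); it is not necessarily the authors' scheme, and `pack`/`dual_multiply` are hypothetical names.

```python
SHIFT = 18  # assumed field offset; b*c must fit in a signed SHIFT-bit field


def pack(a, b):
    """Place a in the upper field and b in the lower field of one wide operand."""
    return (a << SHIFT) + b


def dual_multiply(a, b, c):
    """Return (a*c, b*c) computed with a single wide multiplication."""
    product = pack(a, b) * c            # = (a*c << SHIFT) + b*c
    low = product & ((1 << SHIFT) - 1)  # raw lower field
    if low >= 1 << (SHIFT - 1):         # sign-extend: b*c may be negative
        low -= 1 << SHIFT
    high = (product - low) >> SHIFT     # exact, since product - low == a*c << SHIFT
    return high, low


print(dual_multiply(-128, 127, -128))  # (16384, -16256)
```

The field offset must leave enough guard bits that the lower product never spills into the upper field; for 8-bit operands |b·c| ≤ 16384, well inside the ±2^17 range of an 18-bit signed field.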
{"title":"A 10 GHz Dual-Loop PLL with Active Noise Cancellation Achieving 12dB Spur and 29% Noise Reduction","authors":"Yu-Sian Lu, Cheng-Lung Lee, Wei-Zen Chen","doi":"10.1109/A-SSCC53895.2021.9634835","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634835","url":null,"abstract":"PLL-based frequency synthesizers with low phase noise and high frequency stability are essential for next-generation wireline and wireless communication systems. In the past, circuit techniques for in-band noise suppression have drawn much research effort, such as reference injection [1] or phase noise cancellation through a delayed-discriminator-based phase detector [3]. Injection-locked PLLs (IL-PLLs) rely on precise injection timing control to avoid generating high-frequency spurs [1]. On the other hand, phase-noise-cancellation PLLs (PNC-PLLs) require a sufficiently long delay time for low-frequency noise detection and are more appealing for ring-oscillator-based PLLs (RO-PLLs), where the intrinsic in-band noise is relatively high [3]. Both are limited by the noise floor of the reference signal and cannot counteract critical aggressors close to or even above the reference frequency, which may be encountered in SoC integration. To suppress out-of-band noise, active noise cancellation with extensive calibration is required [1][4]. Gain and delay matching between the aggressor and noise-cancellation paths is vital to these existing techniques. In addition, they demand stringently low noise in the auxiliary circuitry to avoid degrading the in-band noise floor.","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128981610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
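The sensitivity of active noise cancellation to gain and delay matching can be quantified with a first-order model: subtracting a replica with gain ratio g and delay mismatch τ leaves a residual |1 − g·e^(−j2πfτ)| of the original tone. The sketch below assumes a single-tone aggressor and ignores the noise added by the auxiliary cancellation path; it illustrates the general principle, not the dual-loop architecture of this work, and the 100 MHz aggressor is hypothetical.

```python
import cmath
import math


def cancellation_db(gain_ratio, delay_s, freq_hz):
    """Suppression (dB) of a single-tone aggressor after subtracting a replica
    with the given gain ratio and delay mismatch.

    Residual = |1 - g * exp(-j * 2*pi * f * tau)|; larger results mean
    better cancellation.
    """
    residual = abs(1 - gain_ratio * cmath.exp(-2j * math.pi * freq_hz * delay_s))
    if residual == 0:
        return math.inf  # perfect match in this idealized model
    return -20 * math.log10(residual)


# Hypothetical 100 MHz aggressor with a few gain/delay mismatches
for g, tau in [(1.01, 0.0), (1.1, 0.0), (1.0, 1e-12), (1.0, 10e-12)]:
    print(f"gain={g}, delay={tau * 1e12:.0f} ps -> {cancellation_db(g, tau, 100e6):.1f} dB")
```

In this model a 1% gain error alone limits suppression to about 40 dB, and at 100 MHz a 10 ps delay mismatch alone limits it to about 44 dB, which is why both paths must be matched closely.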
{"title":"A 0.46pJ/bit Ultralow-Power Entropy-Preselection-Based Strong PUF with Worst-Case BER<6.7×10⁻⁶","authors":"Jiahao Liu, Yan Zhu, Chi-Hang Chan, R. Martins","doi":"10.1109/A-SSCC53895.2021.9634795","DOIUrl":"https://doi.org/10.1109/A-SSCC53895.2021.9634795","url":null,"abstract":"Internet-of-Things (IoT) devices have become ubiquitous, interconnected platforms for everyday tasks, creating a growing demand for low-cost security primitives. Physically Unclonable Functions (PUFs) are a promising solution for low-cost key storage and device authentication, and strong PUFs [1–5] are suitable for authentication due to their exponentially large challenge-response pair (CRP) space. Early strong PUFs were vulnerable to machine learning (ML) attacks [3], [4], while [1], [2], [5] introduce various nonlinear entropy cells to enhance resilience. However, they all suffer from low energy efficiency because many trivial entropy cells need to be activated for sufficient nonlinearity. In addition, with many enabled cells, flipping a small number of challenge bits changes the final response with only a very small probability, resulting in a poor standard deviation of their Hamming weight (HW).","PeriodicalId":286139,"journal":{"name":"2021 IEEE Asian Solid-State Circuits Conference (A-SSCC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133612651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
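Why early strong PUFs fall to machine learning, and how challenge-bit flips propagate to the response, can be illustrated with the additive delay model commonly used for arbiter PUFs (a classic early strong PUF), in which the response is the sign of a linear function of a parity-transformed challenge. The sketch below is a toy model with random Gaussian stage weights (an assumption); it is not the entropy-preselection PUF proposed in this work, and all names are hypothetical.

```python
import random


def parity_features(challenge):
    """Phi_i = prod_{j>=i} (1 - 2*c_j) for each stage, plus a constant bias term."""
    phi, acc = [], 1
    for bit in reversed(challenge):
        acc *= 1 - 2 * bit
        phi.append(acc)
    return list(reversed(phi)) + [1]


def arbiter_response(weights, challenge):
    """Response bit = sign of the linear delay difference weights . Phi(c)."""
    delta = sum(w * p for w, p in zip(weights, parity_features(challenge)))
    return 1 if delta > 0 else 0


def flip_sensitivity(weights, n_bits, n_flips, trials=2000, seed=1):
    """Fraction of trials in which flipping n_flips random challenge bits changes the response."""
    rng = random.Random(seed)
    changed = 0
    for _ in range(trials):
        c = [rng.randint(0, 1) for _ in range(n_bits)]
        c2 = list(c)
        for i in rng.sample(range(n_bits), n_flips):
            c2[i] ^= 1
        changed += arbiter_response(weights, c) != arbiter_response(weights, c2)
    return changed / trials


N = 64
rng = random.Random(0)
w = [rng.gauss(0, 1) for _ in range(N + 1)]  # hypothetical stage delay differences
for k in (1, 4, 16):
    print(k, flip_sensitivity(w, N, k))
```

Because the response is a threshold of a linear function of Phi(c), an attacker can fit the weights from observed CRPs with an ordinary linear classifier, which is the usual explanation of why such PUFs are ML-attackable. The flip-sensitivity values give a feel for how often flipping a few challenge bits changes the response in this toy model.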