{"title":"An 8.3-to-18Gbps Reconfigurable SCA-Resistant/Dual-Core/Blind-Bulk AES Engine in Intel 4 CMOS","authors":"Raghavan Kumar, Vikram B. Suresh, M. Anders, S. Hsu, A. Agarwal, V. De, S. Mathew","doi":"10.1109/ISSCC42614.2022.9731739","DOIUrl":"https://doi.org/10.1109/ISSCC42614.2022.9731739","url":null,"abstract":"Power and electromagnetic (EM) side-channel attacks (SCA) exploit data-dependent power consumption from cryptographic engines to extract embedded secret keys. While series-connected voltage regulators [1], [2] and arithmetic countermeasures like heterogeneous Galois-field arithmetic [3] provide acceptable levels of side-channel leakage suppression, they cannot defend against determined adversaries. Random additive masking [4], on the other hand, provides a provably-secure solution [5] that disrupts first-order correlations between measured power/EM signatures and secret keys, while incurring $2\\times$ overhead in area and power consumption. In this paper, we demonstrate a reconfigurable AES accelerator fabricated in Intel 4 CMOS process with minimum-time-to-disclosure (MTD) $> 1\\text{B power}/\\text{EM}$ traces in on-demand SCA-resistant mode, while providing a $2.2\\times$ boost in encryption performance during a dual-core mode of operation (Fig. 34.4.1). When coupled with side-channel attack detection techniques [6], [7], this approach allows the user to operate at $> 2\\times$ AES throughput during the safe mode of operation in trusted environments, with the ability to quickly trade off throughput for a higher level of SCA-resistance when the onset of an attack is detected. 
In the blind-bulk mode of operation, the accelerator randomly switches at a user-specified rate between SCA-resistant and dual-core modes while encrypting bulk data, providing $1.14-\\text{to}-1.6\\times$ boost in encryption throughput with measured MTD $> 50\\mathrm{M}$ traces.","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"198 1","pages":"1-3"},"PeriodicalIF":0.0,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84960940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 26-to-39GHz Broadband Ultra-Compact High-Linearity Switchless Hybrid N/PMOS Bi-Directional PA/LNA Front-End for Multi-Band 5G Large-Scaled MIMO System","authors":"Jeong-Min Park, Hua Wang","doi":"10.1109/ISSCC42614.2022.9731651","DOIUrl":"https://doi.org/10.1109/ISSCC42614.2022.9731651","url":null,"abstract":"The continuous growth of data-rates has stimulated the rapid development of 5G New Radio (NR) in the mm-wave FR2 bands (above 24GHz). Consequently, to compensate for the mm-wave high path loss, large-scaled MIMO arrays have become essential. This calls for compact high-performance mm-wave 5G front-end electronics to integrate many MIMO channels on the same chip for low cost and low form factor. A main challenge for mm-wave 5G MIMOs is to integrate both front-end transmitter (TX) and receiver (RX) chains in each array pixel with a minimum silicon area to form a co-apertured low-cost array [1]. The conventional TRX architecture often consists of a PA and an LNA placed in parallel and combined by a T/R switch to control the TX/RX mode. Although this topology eases the design, it faces chip area increase due to many separate matching networks for the PA/LNA/switch, as well as the switch loss that degrades the PA output power (Pout) and LNA noise figure (NF). 
On the other hand, though bi-directional mm-wave front-ends are gaining popularity, existing designs only show narrow bandwidth and very limited PA Pout and efficiency.","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"11 1","pages":"322-324"},"PeriodicalIF":0.0,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84065721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 78.8fJ/b/mm 12.0Gb/s/Wire Capacitively Driven On-Chip Link Over 5.6mm with an FFE-Combined Ground-Forcing Biasing Technique for DRAM Global Bus Line in 65nm CMOS","authors":"Sangyoon Lee, Jaekwang Yun, Suhwan Kim","doi":"10.1109/ISSCC42614.2022.9731653","DOIUrl":"https://doi.org/10.1109/ISSCC42614.2022.9731653","url":null,"abstract":"Advances in virtual reality, artificial intelligence, and big data have increased demand for high-bandwidth memory. Accordingly, pre-fetch sizes have also increased with DRAM generations, meaning an increased number of global bus lines. An increase to this number is limited as it also increases the chip size; instead, the data-rate per lane can be increased for higher throughput [1]. As the global bus lines are on-chip wires in a DRAM chip, they can be driven capacitively. Prior work [2], [3] has shown the superior efficiency of capacitive drivers, over conventional repeaters, in driving on-chip wires at the cost of a reduced voltage swing. However, as there is no well-defined DC level on the capacitively-driven wires [4], wire biasing is fraught with implementation challenges [3]. To define the DC potential on the interconnect, prior work sent signals differentially [2], [4], [5] or dissipated static power to define the DC level [3]. 
Unfortunately, these approaches may not be preferable for DRAM chips that require dense and energy-efficient data transfers.","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"31 1","pages":"454-456"},"PeriodicalIF":0.0,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82573420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bonanza Mine: an Ultra-Low-Voltage Energy-Efficient Bitcoin Mining ASIC","authors":"Vikram B. Suresh, Chandra S. Katta, Srinivasan Rajagopalan, Tao Zhou, A. K. Patel, Raju Rakha, Nikhil Krishna Gopalakrishna, S. Mathew, A. Hukkoo","doi":"10.1109/ISSCC42614.2022.9731547","DOIUrl":"https://doi.org/10.1109/ISSCC42614.2022.9731547","url":null,"abstract":"Bitcoin is the leading blockchain-based cryptocurrency used to facilitate peer-to-peer transactions without relying on a centralized clearing house [1]. The conjoined process of transaction validation and currency minting, known as mining, employs the compute-intensive SHA256 double hash as proof-of-work. The one-way property of SHA256 necessitates a brute-force search by sweeping a 32b random input value called nonce. The $2^{32}$ nonce space search results in energy-intensive pool operations distributed on high-throughput mining systems, executing parallel nonce searches with candidate Merkle roots. Energy-efficient custom ASICs are required for cost-effective mining, where energy costs dominate operational expenses, and the number of hash engines integrated on a single die governs platform cost and peak mining throughput [2]. In this paper, we present BonanzaMine, an energy-efficient mining ASIC fabricated in 7nm CMOS (Fig. 
21.3.7), featuring: (i) bitcoin-optimized look-ahead message digest datapath resulting in 33% Cdyn reduction compared to conventional SHA256 digest datapath; (ii) a half-frequency scheduler datapath, reducing sequential and clock power by 33%; (iii) 3-phase latch-based design with stretchable non-overlapping clocks, eliminating min-delay paths; (iv) robust ultra-low-voltage operation at 355mV using board-level voltage-stacking; and (v) mining throughput of 137GHash/s at an energy efficiency of 55J/THash.","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"42 1","pages":"354-356"},"PeriodicalIF":0.0,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80939988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An 82nW 0.53pJ/SOP Clock-Free Spiking Neural Network with 40µs Latency for AIoT Wake-Up Functions Using Ultimate-Event-Driven Bionic Architecture and Computing-in-Memory Technique","authors":"Ying Liu, Zhixuan Wang, W. He, Linxiao Shen, Yihan Zhang, Peiyu Chen, Meng Wu, Hao Zhang, Peng Zhou, Jinguang Liu, Guangyu Sun, Jiayoon Ru, Le Ye, Ru Huang","doi":"10.1109/ISSCC42614.2022.9731795","DOIUrl":"https://doi.org/10.1109/ISSCC42614.2022.9731795","url":null,"abstract":"The human brain is a natural ultimate-event-driven (UED) system with low power and real-time response-ability, thanks to the asynchronous propagation and processing of spikes. Power dissipation and latency are major concerns in AIoT devices, usually operating in random-sparse-event (RSE) scenarios (Fig. 22.7.1, top). Being event-driven on the system level, an always-on wake-up system (WUS) detects the valid RSEs energy-efficiently and intelligently, and upon detection turns on the power-hungry high-performance system (HPS). Being event-driven on the module level, a prior WUS [1] uses asynchronous feature extraction and synchronous convolutional neural network to detect the RSEs, consuming 148nW-to-1.68µW with 348ms latency. On the circuit level, the Spiking Neural Network (SNN) gives natural event-driven property. However, the prior SNN works did not fully explore this nature. An SNN circuit [2] achieves keyword spotting task at 205nW-to-570nW, but the framing method causes 100ms latency and is not true real-time. The SNN core in [5] uses synchronous digital design, which consumes significant power by the clock tree. The asynchronous-in-global synchronous-in-local [3]–[4] SNN circuits use local clock signals. 
They need arbiters in each layer to sort the spikes, weakening the parallelism and timing; additionally, the separation of storage and computing consumes more energy for data movement.","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"32 1","pages":"372-374"},"PeriodicalIF":0.0,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83182851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 9b-Linear 14GHz Integrating-Mode Phase Interpolator in 5nm FinFET Process","authors":"A. K. Mishra, Yifei Li, Pawan Agarwal, S. Shekhar","doi":"10.1109/ISSCC42614.2022.9731703","DOIUrl":"https://doi.org/10.1109/ISSCC42614.2022.9731703","url":null,"abstract":"Increased data-rates and multi-lane SerDes implementations impose stringent conditions for CDRs to produce low-jitter clocking that is capable of managing frequency and phase offsets. Consequently, high-speed phase interpolators (PIs) must be both low-power and compact for multi-lane requirements, but also high-resolution with respect to the clock period $(\\mathrm{T}_{\\text{period}})$, with high static and dynamic phase linearity to minimize the PI jitter. Prior art in PIs is limited to 7-8b measured resolution [1]–[4], and INL of >500fs [1]–[4]. We present a 9b PI; even with the additional bits, the proposed PI consumes low power of 0.43mW/GHz and a small area. The worst rotation spur is at least 8.1 dB lower (than [4]), and DNL/INL values of 295fs/510fs are $> 144\\times$ better than prior-art. Implemented in 5nm technology at VDD $= 0.75\\mathrm{V}$, our design leverages digital and analog techniques easily suited to FinFET operation.","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"55 1","pages":"1-3"},"PeriodicalIF":0.0,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90826142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An 80×60 Flash LiDAR Sensor with In-Pixel Histogramming TDC Based on Quaternary Search and Time-Gated Δ-Intensity Phase Detection for 45m Detectable Range and Background Light Cancellation","authors":"Seonghyeok Park, Bumjun Kim, Junhee Cho, J. Chun, Jaehyuk Choi, Seong-Jin Kim","doi":"10.1109/ISSCC42614.2022.9731112","DOIUrl":"https://doi.org/10.1109/ISSCC42614.2022.9731112","url":null,"abstract":"Light detection and ranging (LiDAR) sensors have become one of the key building blocks to realize metaverse applications with VR/AR in mobile devices and level-5 automotive vehicles. In particular, SPAD-based direct time-of-flight (D-ToF) sensors have emerged as LiDAR sensors because they offer a longer maximum detectable range and higher background light immunity than indirect time-of-flight (I-ToF) sensors with photon-mixing devices [1]. However, their complicated front- and back-end blocks to resolve ToF values as short as 100ps require high-resolution TDCs and several memories, limiting the spatial resolution and the depth accuracy in short ranges. To address this issue, alternative architectures combining both D-ToF and I-ToF techniques have been reported [2, 3]. Direct-indirect-mixed frame synthesis provides accurate depth information by detecting phases in short ranges while creating a sparse depth map with counting photons in long ranges [2]. A two-step histogramming TDC is used in [3] where a coarse D-ToF discriminates distance roughly and a fine I-ToF extracts depth precisely. 
However, these approaches still suffer from limited depth accuracy [2] or low spatial resolution [3].","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"1 1","pages":"98-100"},"PeriodicalIF":0.0,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89612826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A HD 31fps $7\\times 7$-View Light-Field Factorization Processor for Dual-Layer 3D Factored Display","authors":"Li-Qun Weng, Li-De Chen, Hao-Chien Cheng, Anita Zheng, Kai-Ping Lin, Chao-Tsung Huang","doi":"10.1109/ISSCC42614.2022.9731661","DOIUrl":"https://doi.org/10.1109/ISSCC42614.2022.9731661","url":null,"abstract":"Factored displays [1]–[4] are a novel kind of computational display which provides a full-parallax glasses-free 3D viewing experience. Compared to other autostereoscopic techniques, factored displays provide greater depth of field, larger field of view, and smoother perspective switching without sacrificing image resolution. Figure 33.3.1 shows an example: a light field consisting of $7\\times 7$-perspective multi views (MVs) is factorized into a set of dual-layer display views (DVs), and displaying the front and rear DVs on two corresponding LCDs can multiplicatively approximate the light field for 3D vision. A higher rank of factorization generates more frames for time-multiplexed display and can improve 3D fidelity with more computation. However, the light-field factorization demands massive memory bandwidth and large computation complexity and becomes a bottleneck for real-time factored displays. For instance, 126.8GB/s of DRAM bandwidth and 4.7TFLOPS of computation are required in a rank-4 factorization at 720p HD 30fps. It is expensive and energy-inefficient to realize these demands in general-purpose processors. 
This paper presents a light-field factorization processor to address the design challenges of memory bandwidth and computational complexity through three key contributions: 1) a half-block-based factorization (HBBF) flow to decouple DRAM access from the iterative nature of factorization to save DRAM bandwidth; 2) a sparse-ray-sampling (SRS) method which reduces DRAM bandwidth and hardware complexity simultaneously; and 3) INT-hybrid optimization for the computation of light-field factorization to save chip area.","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"7 1","pages":"1-3"},"PeriodicalIF":0.0,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86491223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 96.2nJ/class Neural Signal Processor with Adaptable Intelligence for Seizure Prediction","authors":"Yi-Yen Hsieh, Yu-Cheng Lin, Chia-Hsiang Yang","doi":"10.1109/ISSCC42614.2022.9731759","DOIUrl":"https://doi.org/10.1109/ISSCC42614.2022.9731759","url":null,"abstract":"Epilepsy is a common neurodegenerative disease that affects more than 50 million people worldwide. Closed-loop neuromodulation is a promising solution to epileptic seizure control through an implantable device that delivers stimulation when seizures are sensed. Figure 33.2.1 shows an overview of a closed-loop neuromodulation system that includes a neural-signal acquisition unit for extracting EEGs, a neural signal processor for sensing seizures, and a stimulation unit for electrical stimulation. For epileptic states, a seizure onset indicates where a seizure begins, followed by intense brain activity. Several seizure detectors [1] [2] having reasonable performance have been proposed to sense seizures after onset. However, patients may still suffer from epileptic syndromes, depending on the severity of the seizures. The syndromes can be eliminated if the seizures can be predicted before onset. This also reduces the amount of required stimulation current, thereby extending the battery life of the implantable device. However, the computational complexity of an accurate seizure prediction algorithm is very high, considering a machine learning kernel is usually embedded to tackle the time-varying characteristics of EEGs adaptively. Up to tens of minutes is needed for seizure prediction on a high-end CPU and a real-time, energy-efficient seizure predictor has never been demonstrated in the literature. 
This work presents a neural signal processor with adaptable intelligence for real-time seizure prediction with low energy.","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"18a 1","pages":"1-3"},"PeriodicalIF":0.0,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88115618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foreword Intelligent Silicon for a Sustainable World","authors":"","doi":"10.1109/isscc42614.2022.9731684","DOIUrl":"https://doi.org/10.1109/isscc42614.2022.9731684","url":null,"abstract":"","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"73 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89001444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}