Y. Yue, T. Ajayi, Xueyang Liu, Peiwen Xing, Zihan Wang, D. Blaauw, R. Dreslinski, Hun-Seok Kim
{"title":"A Unified Forward Error Correction Accelerator for Multi-Mode Turbo, LDPC, and Polar Decoding","authors":"Y. Yue, T. Ajayi, Xueyang Liu, Peiwen Xing, Zihan Wang, D. Blaauw, R. Dreslinski, Hun-Seok Kim","doi":"10.1145/3531437.3539726","DOIUrl":"https://doi.org/10.1145/3531437.3539726","url":null,"abstract":"Forward error correction (FEC) is a critical component in communication systems as the errors induced by noisy channels can be corrected using the redundancy in the coded message. This paper introduces a novel multi-mode FEC decoder accelerator that can decode Turbo, LDPC, and Polar codes using a unified architecture. The proposed design explores the similarities in these codes to enable energy efficient decoding with minimal overhead in the total area of the unified architecture. Moreover, the proposed design is highly reconfigurable to support various existing and future FEC standards including 3GPP LTE/5G, and IEEE 802.11n WiFi. Implemented in GF 12nm FinFET technology, the design occupies 8.47mm2 of chip area attaining 25% logic and 49% memory area savings compared to a collection of single-mode designs. Running at 250MHz and 0.8V, the decoder achieves per-iteration throughput and energy efficiency of 690Mb/s and 44pJ/b for Turbo; 740Mb/s and 27.4pJ/b for LDPC; and 950Mb/s and 45.8pJ/b for Polar.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125171407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC Operation for 4-bit Input Processing","authors":"Joonhyung Kim, Kyeongho Lee, Jongsun Park","doi":"10.1145/3531437.3539718","DOIUrl":"https://doi.org/10.1145/3531437.3539718","url":null,"abstract":"This paper presents a low cost PMOS-based 8T (P-8T) SRAM Compute-In-Memory (CIM) architecture that efficiently per-forms the multiply-accumulate (MAC) operations between 4-bit input activations and 8-bit weights. First, bit-line (BL) charge-sharing technique is employed to design the low-cost and reliable digital-to-analog conversion of 4-bit input activations in the pro-posed SRAM CIM, where the charge domain analog computing provides variation tolerant and linear MAC outputs. The 16 local arrays are also effectively exploited to implement the analog mul-tiplication unit (AMU) that simultaneously produces 16 multipli-cation results between 4-bit input activations and 1-bit weights. For the hardware cost reduction of analog-to-digital converter (ADC) without sacrificing DNN accuracy, hardware aware system simulations are performed to decide the ADC bit-resolutions and the number of activated rows in the proposed CIM macro. In addition, for the ADC operation, the AMU-based reference col-umns are utilized for generating ADC reference voltages, with which low-cost 4-bit coarse-fine flash ADC has been designed. The 256×80 P-8T SRAM CIM macro implementation using 28nm CMOS process shows that the proposed CIM shows the accuracies of 91.46% and 66.67% with CIFAR-10 and CIFAR-100 dataset, respectively, with the energy efficiency of 50.07-TOPS/W.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130654354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Torrisi, Maria Doglioni, K. Yıldırım, D. Brunelli
{"title":"Visible Light Synchronization for Time-Slotted Energy-Aware Transiently-Powered Communication","authors":"A. Torrisi, Maria Doglioni, K. Yıldırım, D. Brunelli","doi":"10.1145/3531437.3539722","DOIUrl":"https://doi.org/10.1145/3531437.3539722","url":null,"abstract":"Energy-harvesting IoT devices that operate without batteries paved the way for sustainable sensing applications. These devices force applications to run intermittently since the ambient energy is sporadic, leading to frequent power failures. Unexpected power failures introduce several challenges to wireless communication since nodes are not synchronized and stop operating during data transmission. This paper presents a novel self-powered autonomous circuit design to remedy this problem. This circuit uses visible-light communication (VLC) to enable synchronization for time-slotted energy-aware transiently powered communication. Therefore, it aligns the activity phases of the batteryless sensors so that energy status communication occurs when these nodes are active simultaneously. Evaluations showed that our circuit has an ultra-low power consumption, can work with zero energy cost by relying only on the harvested energy, and supports efficient intermittent communication over intermittently powered nodes.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124095235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zahra Azad, Guowei Yang, R. Agrawal, Daniel Petrisko, Michael B. Taylor, A. Joshi
{"title":"RACE: RISC-V SoC for En/decryption Acceleration on the Edge for Homomorphic Computation","authors":"Zahra Azad, Guowei Yang, R. Agrawal, Daniel Petrisko, Michael B. Taylor, A. Joshi","doi":"10.1145/3531437.3539725","DOIUrl":"https://doi.org/10.1145/3531437.3539725","url":null,"abstract":"As more and more edge devices connect to the cloud to use its storage and compute capabilities, they bring in security and data privacy concerns. Homomorphic Encryption (HE) is a promising solution to maintain data privacy by enabling computations on the encrypted user data in the cloud. While there has been a lot of work on accelerating HE computation in the cloud, little attention has been paid to optimize the en/decryption on the edge. Therefore, in this paper, we present RACE, a custom-designed area- and energy-efficient SoC for en/decryption of data for HE. Owing to similar operations in en/decryption, RACE unifies the en/decryption datapath to save area. RACE efficiently exploits techniques like memory reuse and data reordering to utilize minimal amount of on-chip memory. We evaluate RACE using a complete RTL design containing a RISC-V processor and our unified accelerator. Our analysis shows that, for the end-to-end en/decryption, using RACE leads to, on average, 48 × to 39729 × (for a wide range of security parameters) more energy-efficient solution than purely using a processor.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126385709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ranyang Zhou, A. Roohi, Durga Misra, Shaahin Angizi
{"title":"FlexiDRAM: A Flexible in-DRAM Framework to Enable Parallel General-Purpose Computation","authors":"Ranyang Zhou, A. Roohi, Durga Misra, Shaahin Angizi","doi":"10.1145/3531437.3539721","DOIUrl":"https://doi.org/10.1145/3531437.3539721","url":null,"abstract":"In this paper, we propose a Flexible processing-in-DRAM framework named FlexiDRAM that supports the efficient implementation of complex bulk bitwise operations. This framework is developed on top of a new reconfigurable in-DRAM accelerator that leverages the analog operation of DRAM sub-arrays and elevates it to implement XOR2-MAJ3 operations between operands stored in the same bit-line. FlexiDRAM first generates an efficient XOR-MAJ representation of the desired logic and then appropriately allocates DRAM rows to the operands to execute any in-DRAM computation. We develop ISA and software support required to compute in-DRAM operation. FlexiDRAM transforms current memory architecture to a massively parallel computational unit and can be leveraged to significantly reduce the latency and energy consumption of complex workloads. Our extensive circuit-to-architecture simulation results show that averaged across two well-known deep learning workloads, FlexiDRAM achieves ∼ 15 × energy-saving and 13 × speedup over the GPU outperforming recent processing-in-DRAM platforms.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128844301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Ma, Weidong Cao, Fei Qiao, Ayan Chakrabarti, Xuan Zhang
{"title":"HOGEye: Neural Approximation of HOG Feature Extraction in RRAM-Based 3D-Stacked Image Sensors","authors":"T. Ma, Weidong Cao, Fei Qiao, Ayan Chakrabarti, Xuan Zhang","doi":"10.1145/3531437.3539706","DOIUrl":"https://doi.org/10.1145/3531437.3539706","url":null,"abstract":"Many computer vision tasks, ranging from recognition to multi-view registration, operate on feature representation of images rather than raw pixel intensities. However, conventional pipelines for obtaining these representations incur significant energy consumption due to pixel-wise analog-to-digital (A/D) conversions and costly storage and computations. In this paper, we propose HOGEye, an efficient near-pixel implementation for a widely-used feature extraction algorithm—Histograms of Oriented Gradients (HOG). HOGEye moves the key but computation-intensive derivative extraction (DE) and histogram generation (HG) steps into the analog domain by applying a novel neural approximation method in a resistive random-access memory (RRAM)-based 3D-stacked image sensor. The co-location of perception (sensor) and computation (DE and HG) and the alleviation of A/D conversions allow HOGEye design to achieve significant energy saving. With negligible detection rate degradation, the entire HOGEye sensor system consumes less than 48μW@30fps for an image resolution of 256 × 256 (equivalent to 24.3pJ/pixel) while the processing part only consumes 14.1pJ/pixel, achieving more than 2.5 × energy efficiency improvement than the state-of-the-art designs.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115197357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study on Optimizing Pin Accessibility of Standard Cells in the Post-3 nm Node","authors":"J. Jeong, Jonghyun Ko, Taigon Song","doi":"10.1145/3531437.3539707","DOIUrl":"https://doi.org/10.1145/3531437.3539707","url":null,"abstract":"Nanosheet FETs (NSFETs) are expected to be the post-FinFET device in the technology nodes of 5 nm and beyond. However, despite the high potential of NSFETs, few studies report the impact of NSFETs in the digital VLSI’s perspective. In this paper, we present a study of NSFETs for the optimal standard cell (SDC) library design and pin accessibility-aware layout for less routing congestion and low power consumption. For this objective, we present five novel methodologies to tackle the pin accessibility issues that rise in SDC designs in extremely-low routing resource environments (4 tracks) and emphasize the importance of local trench contact (LTC) in it. Using our methodology, we improve design metrics such as power consumption, total area, and wirelength by -11.0%, -13.2%, and 16.0%, respectively. By our study, we expect the routing congestion issues that additionally occur in advanced technology nodes to be handled and better full-chip designs to be done in 3 nm and beyond.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129419178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Drift-tolerant Coding to Enhance the Energy Efficiency of Multi-Level-Cell Phase-Change Memory","authors":"Yi-Shen Chen, Yuan-Hao Chang, Tei-Wei Kuo","doi":"10.1145/3531437.3539701","DOIUrl":"https://doi.org/10.1145/3531437.3539701","url":null,"abstract":"Phase-Change Memory (PCM) has emerged as a promising memory and storage technology in recent years, and Multi-Level-Cell (MLC) PCM further reduces the per-bit cost to improve its competitiveness by storing multiple bits in each PCM cell. However, MLC PCM has high energy consumption issue in its write operations. In contrast to existing works that try to enhance the energy efficiency of the physical program&verify strategy for MLC PCM, this work proposes a drift-tolerant coding scheme to enable the fast write operation on MLC PCM without sacrificing any data accuracy. By exploiting the resistance drift and asymmetric write characteristic of PCM cells, the proposed scheme can reduce the write energy consumption of MLC PCM significantly. Meanwhile, a segmentation strategy is proposed to further improve the write performance with our coding scheme. A series of analyses and experiments was conducted to evaluate the capability of the proposed scheme. The results show that the proposed scheme can reduce 6.2–17.1% energy consumption and 3.2–11.3% write latency under six representative benchmarks, compared with the existing well-known schemes.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127956273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolving Skyrmion Racetrack Memory as Energy-Efficient Last-Level Cache Devices","authors":"Ya-Hui Yang, Shuo-Han Chen, Yuan-Hao Chang","doi":"10.1145/3531437.3539709","DOIUrl":"https://doi.org/10.1145/3531437.3539709","url":null,"abstract":"Skyrmion racetrack memory (SK-RM) has been regarded as a promising alternative to replace static random-access memory (SRAM) as a large-size on-chip cache device with high memory density. Different from other nonvolatile random-access memories (NVRAMs), data bits of SK-RM can only be altered or detected at access ports, and shift operations are required to move data bits across access ports along the racetrack. Owing to these special characteristics, word-based mapping and bit-interleaved mapping architectures have been proposed to facilitate reading and writing on SK-RM with different data layouts. Nevertheless, when SK-RM is used as an on-chip cache device, existing mapping architectures lead to the concerns of unpredictable access performance or excessive energy consumption during both data reads and writes. To resolve such concerns, this paper proposes extracting the merits of existing mapping architectures for allowing SK-RM to seamlessly switch its data update policy by considering the write latency requirement of cache accesses. Promising results have been demonstrated through a series of benchmark-driven experiments.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126802678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yigit Tuncel, A. Krishnakumar, Aishwarya Lekshmi Chithra, Younghyun Kim, Ümit Y. Ogras
{"title":"A Domain-Specific System-On-Chip Design for Energy Efficient Wearable Edge AI Applications","authors":"Yigit Tuncel, A. Krishnakumar, Aishwarya Lekshmi Chithra, Younghyun Kim, Ümit Y. Ogras","doi":"10.1145/3531437.3539711","DOIUrl":"https://doi.org/10.1145/3531437.3539711","url":null,"abstract":"Artificial intelligence (AI) based wearable applications collect and process a significant amount of streaming sensor data. Transmitting the raw data to cloud processors wastes scarce energy and threatens user privacy. Wearable edge AI devices should ideally balance two competing requirements: (1) maximizing the energy efficiency using targeted hardware accelerators and (2) providing versatility using general-purpose cores to support arbitrary applications. To this end, we present an open-source domain-specific programmable system-on-chip (SoC) that combines a RISC-V core with a meticulously determined set of accelerators targeting wearable applications. We apply the proposed design method to design an FPGA prototype and six real-life use cases to demonstrate the efficacy of the proposed SoC. Thorough experimental evaluations show that the proposed SoC provides up to 9.1 × faster execution and up to 8.9 × higher energy efficiency than software implementations in FPGA while maintaining programmability.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115006629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}