ACM Transactions on Design Automation of Electronic Systems (TODAES): Latest Articles

Introduction to the Special Section on Energy-Efficient AI Chips

ACM Transactions on Design Automation of Electronic Systems (TODAES) Pub Date : 2022-09-21 DOI: 10.1145/3538502

V. Chandra, Yiran Chen, Sung-kyu Yoo

Abstract: Energy efficiency is one of the most important metrics in AI system designs on both servers and mobile devices. In particular, mobile and edge devices require 10-100X more energy-efficient computing for immersive AR/VR applications as well as AI-based apps on smartphones, smart cameras, etc. For battery and cost reasons, such emerging applications demand extreme energy efficiency and high performance to run dozens of heavy neural network models in real time and under stringent power budgets. To realize 100X improvements in energy efficiency, we require innovative ideas in both software and hardware. In this special issue, which originated from the Highly Efficient Neural Processing (HENP) workshop held at ESWEEK 2020, we aimed at covering state-of-the-art industrial and academic efforts to achieve orders of magnitude better energy efficiency in software and hardware designs for AI chips. Recent neural networks adopt special layers for better compute efficiency. “MVP: An Efficient CNN Accelerator with Matrix, Vector, and Processing-Near-Memory Units” by Lee et al. proposes an approach to improve the area efficiency of systolic array accelerators for depth-wise separable convolution and squeeze-and-excitation layers, which are widely adopted in networks for mobile and embedded systems for efficiency reasons. Reusing computation results is a representative method for reducing computation cost and thereby improving energy efficiency. “Energy Efficient Boosting of GEMM Accelerators for DNN via Reuse” by Cicek et al. proposes a novel reuse-centric hardware accelerator for CNN inference based on an improved detection of neuron vector similarity.

Embedded devices are often characterized by continuous sensory inputs and limited computing/programming capabilities. “A Low-Power Programmable Machine Learning Hardware Accelerator Design for Intelligent Edge Devices” by Kee et al. proposes a hardware accelerator, called intelligent boosting engine, to accelerate sensor fusion and the SVM-based motion recognition algorithm with limited programmability. Processing sequential inputs under timing and power budgets is a representative design problem in edge applications. In “Energy Efficient LSTM Inference Accelerator for Real-Time Causal Prediction” by Chen et al., the authors take advantage of fine-grained parallelism and pipelined feedforward and recurrent updates in LSTM, and present a bit-sparse quantization that reduces circuit cost by replacing the original multiplication with a bit-shift operation. Adopting reinforcement learning on edge devices for sequential decision-making and control based on image inputs is desirable, but challenging due to the low efficiency of training and the high cost of inference. In “E2HRL: An Energy-Efficient Hardware Accelerator for Hierarchical Deep Reinforcement Learning”, Shiri et al. propose a scalable hardware architecture called E2HRL which boosts training speed by learning hierarchical policies and
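The bit-shift idea mentioned in the LSTM accelerator summary above can be illustrated with a minimal sketch. It assumes the simplest possible case, in which each weight is rounded (in the log domain) to a single signed power of two; the paper's actual bit-sparse scheme is not specified here and may keep more than one nonzero bit. The function names and the exponent range are hypothetical.

```python
import math

def quantize_to_power_of_two(weight, min_exp=-8, max_exp=0):
    """Round |weight| to a power of two (nearest in the log domain).

    Returns (sign, exponent) so that weight ~= sign * 2**exponent.
    The exponent range [min_exp, max_exp] is an assumed design choice.
    """
    if weight == 0.0:
        return 0, 0
    sign = 1 if weight > 0 else -1
    exp = round(math.log2(abs(weight)))
    exp = max(min_exp, min(max_exp, exp))
    return sign, exp

def shift_multiply(activation, sign, exp):
    """Multiply an integer activation by sign * 2**exp using only shifts."""
    if sign == 0:
        return 0
    # A negative exponent becomes a right shift; Python's >> floors the result.
    shifted = activation << exp if exp >= 0 else activation >> -exp
    return sign * shifted

sign, exp = quantize_to_power_of_two(0.23)   # 0.23 is approximated by 2**-2
print(shift_multiply(64, sign, exp))         # 64 * 0.25 computed as 64 >> 2
```

In hardware, the shift replaces a full multiplier, which is where the circuit-cost saving comes from; the price is the quantization error introduced by rounding each weight.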

Citations: 0

A Low-Overhead and High-Security Cryptographic Circuit Design Utilizing the TIGFET-Based Three-Phase Single-Rail Pulse Register against Side-Channel Attacks

ACM Transactions on Design Automation of Electronic Systems (TODAES) Pub Date : 2022-05-24 DOI: 10.1145/3498339

Yanjiang Liu, Tongzhou Qu, Z. Dai

Abstract: A side-channel attack (SCA) reveals confidential information by statistically analyzing physical manifestations and is a serious threat to cryptographic circuits. Various circuit-level SCA countermeasures have been proposed as fundamental solutions to reduce the side-channel vulnerabilities of cryptographic implementations; however, such approaches introduce non-negligible power and area overheads. Among all of the circuit components, flip-flops are the main source of information leakage. This article proposes a three-phase single-rail pulse register (TSPR) based on the three-independent-gate field effect transistor (TIGFET) to achieve all desired properties with improved metrics of area and security. The TIGFET-based TSPR consumes constant power (MCV is 0.25%), has a low delay (12 ps), and employs only 10 TIGFET devices, which makes it better suited than existing flip-flops to low-overhead and high-security cryptographic circuit design. In addition, a set of TIGFET-based combinational basic gates is designed to reduce area occupation and power consumption as much as possible. As a proof of concept, a simplified Advanced Encryption Standard (AES) cipher, the SM4 block cipher (SM4), and the lightweight cryptographic algorithm PRESENT are built with the TIGFET-based library. SCAs are mounted on these cryptographic implementations to evaluate their resilience, and the results show that the correct key of cryptographic circuits with TIGFET-based TSPRs is not recovered within 2,000 power traces.

Citations: 1

A Low-power Programmable Machine Learning Hardware Accelerator Design for Intelligent Edge Devices

ACM Transactions on Design Automation of Electronic Systems (TODAES) Pub Date : 2022-04-22 DOI: 10.1145/3531479

Minkwan Kee, Gi-Ho Park

Citations: 0

RASCv2: Enabling Remote Access to Side-Channels for Mission Critical and IoT Systems

ACM Transactions on Design Automation of Electronic Systems (TODAES) Pub Date : 2022-04-13 DOI: 10.1145/3524123

Yunkai Bai, Andrew Stern, Jungmin Park, M. Tehranipoor, Domenic Forte

Abstract: The Internet of Things (IoT) and smart devices are currently being deployed in systems such as autonomous vehicles and medical monitoring devices. The introduction of IoT devices into these systems enables network connectivity for data transfer, cloud support, and more, but can also lead to malware injection. Since many IoT devices operate in remote environments, it is also difficult to protect them from physical tampering. Conventional protection approaches rely on software. However, these can be circumvented by the moving target nature of malware or through hardware attacks. Alternatively, insertion of internal monitoring circuits into IoT chips requires a design trade-off, balancing the requirements of the monitoring circuit and the main circuit. A very promising approach to detecting anomalous behavior in the IoT and other embedded systems is side-channel analysis. To date, however, this can be performed only before deployment due to the cost and size of side-channel setups (e.g., oscilloscopes and probes) or by internal performance counters. Here, we introduce an external monitoring printed circuit board (PCB) named RASC to provide remote access to side-channels. RASC reduces the complete side-channel analysis system into two small PCBs (2 × 2 cm), providing the ability to monitor power and electromagnetic (EM) traces of the target device. Additionally, RASC can transmit data and/or alerts of detected anomalous activities to a remote host through Bluetooth. To demonstrate RASC’s capabilities, we extract keys from encryption modules such as AES implemented on Arduino and FPGA boards. To illustrate RASC’s defensive capabilities, we also use it to perform malware detection. RASC’s success in power analysis is comparable to that of an oscilloscope/probe setup, yet RASC is lightweight and two orders of magnitude cheaper.

Citations: 4

Software/Hardware Co-design of 3D NoC-based GPU Architectures for Accelerated Graph Computations

ACM Transactions on Design Automation of Electronic Systems (TODAES) Pub Date : 2022-04-04 DOI: 10.1145/3514354

Dwaipayan Choudhury, Reet Barik, Aravind Sukumaran Rajam, A. Kalyanaraman, Partha Pratim Pande

Abstract: Manycore GPU architectures have become the mainstay for accelerating graph computations. One of the primary bottlenecks to the performance of graph computations on manycore architectures is data movement. Since most of the accesses in graph processing are due to vertex neighborhood lookups, locality in graph data structures plays a key role in dictating the degree of data movement. Vertex reordering is a widely used technique to improve data locality within graph data structures. However, these reordering schemes alone are not sufficient, as they need to be complemented with efficient task allocation on manycore GPU architectures to reduce latency due to local cache misses. Consequently, in this article, we introduce a software/hardware co-design framework for accelerating graph computations. Our approach couples an architecture-aware vertex reordering with a priority-based task allocation technique. As the task allocation aims to reduce on-chip latency and associated energy, the choice of Network-on-Chip (NoC) as the communication backbone in the manycore platform is an important parameter. By leveraging emerging three-dimensional (3D) integration technology, we propose the design of a small-world NoC (SWNoC)-enabled manycore GPU architecture, where the placement of the links connecting the streaming multiprocessors (SMs) and the memory controllers (MCs) follows a power-law distribution. The proposed 3D SWNoC-enabled software/hardware co-design framework achieves 11.1% to 22.9% performance improvement and 16.4% to 32.6% less energy consumption, depending on the dataset and the graph application, when compared to the default order of the dataset running on a conventional planar mesh architecture.
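To make the locality argument in this abstract concrete, the following sketch relabels vertices in breadth-first discovery order and measures how far apart the IDs of neighboring vertices are. BFS relabeling is used here only as a generic stand-in for vertex reordering; it is not the architecture-aware scheme the article proposes. The toy graph, the `total_gap` metric, and all names are assumptions made for illustration, and the graph is assumed to be connected.

```python
from collections import deque

def bfs_order(adj, start):
    """Map each vertex to a new ID given by BFS discovery order."""
    new_id = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):          # sorted for a deterministic order
            if v not in new_id:
                new_id[v] = len(new_id)
                queue.append(v)
    return new_id

def total_gap(edges, label):
    """Sum of |label[u] - label[v]| over all edges: a crude locality proxy."""
    return sum(abs(label[u] - label[v]) for u, v in edges)

# A path graph whose vertex IDs were assigned in a scrambled order.
edges = [(0, 4), (4, 2), (2, 5), (5, 1), (1, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

identity = {v: v for v in adj}
print(total_gap(edges, identity))            # gap with the original labels
print(total_gap(edges, bfs_order(adj, 0)))   # gap after BFS relabeling
```

A smaller gap means that neighbor lookups touch IDs that are closer together, which tends to place them in nearby memory when vertex data is stored in ID order.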

Citations: 2

Fault Localization Scheme for Missing Gate Faults in Reversible Circuits

ACM Transactions on Design Automation of Electronic Systems (TODAES) Pub Date : 2022-03-08 DOI: 10.1145/3503539

Mousum Handique, J. K. Deka, S. Biswas

Citations: 0

Introduction to the Special Section on High-level Synthesis for FPGA: Next-generation Technologies and Applications

ACM Transactions on Design Automation of Electronic Systems (TODAES) Pub Date : 2022-03-08 DOI: 10.1145/3519279

C. Pilato, Zhenman Fang, Yuko Hara-Azumi, J. Hwang

Abstract: Due to the end of Dennard scaling and Moore’s law, heterogeneous System-on-Chip (SoC) architectures are replacing complex hyper-pipelined processors to achieve high performance and energy efficiency. Such architectures feature many specialized components that can be used to accelerate selected computational kernels by exploiting more intrinsic parallelism with custom logic. Among them, FPGA devices are becoming common targets for these systems, since they allow fast turnaround time, field upgradability, and easy deployment of hardware/software solutions. However, co-designing FPGA systems still requires a combination of hardware and software design skills that is uncommon among designers. To overcome these issues, designers need to raise the abstraction level from low-level manual designs to high-level approaches. High-level synthesis (HLS) is becoming a key enabling technology, especially for FPGA designs, since it allows designers to describe the functionality of a component at the software level and automatically generate the corresponding hardware description, enabling fast deployment of hardware/software solutions. HLS has been making tremendous progress in many application domains, ranging from Internet of Things and edge computing to data centers and cloud computing. While HLS is becoming more popular, the other side of the coin is that it is pushing the application landscape for hardware acceleration towards unprecedented challenges. On one hand, modern applications must process huge amounts of data, demanding efficient methods for managing memory accesses. On the other hand, HLS is a complex process that itself produces a huge amount of information that can be used to drive further optimizations. In both cases, machine learning is coming to the rescue to extract valuable knowledge and make accurate predictions.

In this special section, we have six articles covering both challenges (the first five articles) and application aspects (the last one). These articles show that HLS is a powerful yet difficult-to-use solution. Indeed, many HLS tools offer directives, i.e., source code annotations that trigger specific optimizations, but understanding the optimal combination from a huge design space is still a manual and time-consuming effort. The articles in this special section provide interesting insights on how to automate this exploration process, also with the help of machine learning. We hope you will enjoy them and find them as interesting as we did. The special section opens with an article on compiler-level optimizations for pointer synthesis within HLS. “A case for precise, fine-grained pointer synthesis in high-level synthesis,” by N. Ramanathan et al., aims at reducing the gap between application designers, who could make heavy use of pointers to create compact and efficient software descriptions, and hardware designers, who demand precise memory information to implement the corresponding accesses

Citations: 0

A Case for Precise, Fine-Grained Pointer Synthesis in High-Level Synthesis

ACM Transactions on Design Automation of Electronic Systems (TODAES) Pub Date : 2022-03-08 DOI: 10.1145/3491430

Nadesh Ramanathan, G. Constantinides, John Wickerson

Abstract: This article combines two practical approaches to improve pointer synthesis within HLS tools. Both approaches focus on inefficiencies in how HLS tools treat the points-to graph, a mapping that connects each instruction to the memory locations that it might access at runtime. HLS pointer synthesis first computes the points-to graph via pointer analysis and then implements its connections in hardware, which gives rise to two inefficiencies. First, HLS tools typically favour pointer analysis that is fast, sacrificing precision. Second, they also favour centralising memory connections in hardware for instructions that can point to more than one location. In this article, we demonstrate that a more precise pointer analysis coupled with decentralised memory connections in hardware can substantially reduce the unnecessary sharing of memory resources. We implement both flow- and context-sensitive pointer analysis and fine-grained memory connections in two modern HLS tools, LegUp and Vitis HLS. An evaluation on three benchmark suites, ranging from non-trivial pointer use to standard HLS benchmarks, indicates that when we improve both precision and granularity of pointer synthesis, on average, we can reduce area and latency by around 42% and 37%, respectively.
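The precision gap this abstract describes can be seen on a toy example. The sketch below compares a flow-insensitive points-to analysis (one merged result for the whole program) with a flow-sensitive one (one result per program point) on straight-line code. It is a deliberately simplified illustration, not the analysis implemented in the article: the statement encoding, the function names, and the restriction to address-of and copy statements without branches or calls are all assumptions.

```python
def flow_insensitive(stmts):
    """Merge the effect of every statement until nothing changes."""
    pts = {}
    changed = True
    while changed:
        changed = False
        for kind, dst, src in stmts:
            new = {src} if kind == "addr" else pts.get(src, set())
            cur = pts.setdefault(dst, set())
            if not new <= cur:
                cur |= new
                changed = True
    return pts

def flow_sensitive(stmts):
    """Process statements in order, overwriting the target (strong update).

    Returns one points-to snapshot per statement; valid only for
    straight-line code, which is all this sketch handles.
    """
    pts = {}
    snapshots = []
    for kind, dst, src in stmts:
        pts[dst] = {src} if kind == "addr" else set(pts.get(src, set()))
        snapshots.append({k: set(v) for k, v in pts.items()})
    return snapshots

program = [
    ("addr", "p", "a"),   # p = &a
    ("copy", "q", "p"),   # q = p
    ("addr", "p", "b"),   # p = &b
    ("copy", "r", "p"),   # r = p
]

print(flow_insensitive(program))       # q and r may each point to a or b
print(flow_sensitive(program)[-1])     # q points only to a, r only to b
```

With the imprecise result, accesses through `q` and `r` would each need a connection to both memories `a` and `b`; with the precise result, each needs only one, which is the kind of unnecessary sharing the abstract refers to.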

Citations: 0

Synthesis of Clock Networks with a Mode-Reconfigurable Topology

ACM Transactions on Design Automation of Electronic Systems (TODAES) Pub Date : 2022-03-08 DOI: 10.1145/3503538

Necati Uysal, Rickard Ewetz

Abstract: Modern digital circuits are often required to operate in multiple modes to cater to variable frequency and power requirements. Consequently, the clock networks for such circuits must be synthesized to meet different timing constraints in different operational modes. The overall power consumption and robustness to variations of a clock network are determined by the topology. However, state-of-the-art clock networks use the same topology in every mode, even though timing constraints in low- and high-performance modes can be very different. In this article, we propose a clock network with a mode-reconfigurable topology (MRT) for circuits with positive-edge-triggered sequential elements. In high-performance modes, the MRT structure is reconfigured into a near-tree to provide the required robustness to variations. In low-performance modes, the MRT structure is reconfigured into a tree to save power. Non-tree (or near-tree) structures provide robustness to variations by appropriately constructing multiple alternative paths from the clock source to the clock sinks, which neutralizes the negative impact of variations. In MRT structures, OR-gates are used to join multiple alternative paths into a single path. Hence, the MRT structures consume no short-circuit power because there is only one gate driving each net. Moreover, it is straightforward to reconfigure an MRT structure into a tree topology using a single clock gate. In high-performance modes, the experimental results demonstrate that MRT structures have 25% lower power consumption than state-of-the-art near-tree structures. In low-performance modes, the power consumption of the MRT structure is similar to the power consumption of a clock tree.

Citations: 0

Magnetic Core TSV-Inductor Design and Optimization for On-chip DC-DC Converter

ACM Transactions on Design Automation of Electronic Systems (TODAES) Pub Date : 2022-03-07 DOI: 10.1145/3507700

Chenyi Wen, Xiao Dong, Baixin Chen, Umamaheswara Rao Tida, Yiyu Shi, Cheng Zhuo

Abstract: The conventional on-chip spiral inductor consumes a significant top-metal routing area, thereby preventing its popularity in many on-chip applications. Recently, the through-silicon-via (TSV)-based inductor (also known as a TSV-inductor) with a magnetic core has been shown to be a viable option for the on-chip DC-DC converter. The operating conditions of these inductors play a major role in maximizing the performance and efficiency of the DC-DC converter. However, there is a critical need to study the design and optimization details of magnetic core TSV-inductors, given their unique three-dimensional structure with an embedded magnetic core. This article aims to provide a clear understanding of the modeling details of a magnetic core TSV-inductor and a design and optimization methodology to assist efficient inductor design. Moreover, a machine-learning-assisted model combining physical details and an artificial neural network is also proposed to extract the equivalent circuit to further facilitate DC-DC converter design. Experimental results show that the optimized TSV-inductor with the magnetic core and air-gap can achieve an inductance density improvement of up to 7.7× and a quality factor improvement of up to 1.6× for the same footprint compared with the TSV-inductor without a magnetic core. For on-chip DC-DC converter applications, the converter efficiency can be improved by up to 15.9% and 6.8% compared with the conventional spiral inductor and the TSV-inductor without a magnetic core, respectively.

Citations: 3