{"title":"The ZuSE-KI-Mobil AI Accelerator SoC: Overview and a Functional Safety Perspective","authors":"F. Kempf, Julian Hoefer, T. Harbaum, Juergen Becker, Nael Fasfous, Alexander Frickenstein, Hans-Jörg Vögel, Simon Friedrich, R. Wittig, E. Matús, G. Fettweis, Matthias Lüders, Holger Blume, Jens Benndorf, Darius Grantz, Martin Zeller, Dietmar Engelke, K. Eickel","doi":"10.23919/DATE56975.2023.10137257","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137257","url":null,"abstract":"ZuSE-KI-Mobil (ZuKIMo) is a nationally funded research project, currently in its intermediate stage. The goal of the ZuKIMo project is to develop a new System-on-Chip (SoC) platform and corresponding ecosystem to enable efficient Artificial Intelligence (AI) applications with specific requirements. With ZuKIMo, we specifically target applications from the mobility domain, i.e., autonomous vehicles and drones. The initial ecosystem is built by a consortium consisting of seven partners from German academia and industry. We develop the SoC platform and its ecosystem around a novel AI accelerator design. The customizable accelerator is conceived from scratch to fulfill the functional and non-functional requirements derived from the ambitious use cases. A tape-out in 22 nm FDX technology is planned in 2023. Apart from the System-on-Chip hardware design itself, the ZuKIMo ecosystem has the objective of providing software tooling for easy deployment of new use cases and hardware-CNN co-design. Furthermore, AI accelerators in safety-critical applications like our mobility use cases necessitate the fulfillment of safety requirements. 
Therefore, we investigate new design methodologies for fault analysis of Deep Neural Networks (DNNs) and introduce our new redundancy mechanism for AI accelerators.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115117374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Jumping Shift: A Logarithmic Quantization Method for Low-Power CNN Acceleration","authors":"Longxing Jiang, David Aledo, R. V. Leuken","doi":"10.23919/DATE56975.2023.10137169","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137169","url":null,"abstract":"Logarithmic quantization for Convolutional Neural Networks (CNNs): a) fits typical weight and activation distributions well, and b) allows the replacement of the multiplication operation by a shift operation that can be implemented with fewer hardware resources. We propose a new quantization method named Jumping Log Quantization (JLQ). The key idea of JLQ is to extend the quantization range by adding a coefficient parameter “s” in the power-of-two exponents $(2^{sx+i})$. This quantization strategy skips some values from the standard logarithmic quantization. In addition, we also develop a small hardware-friendly optimization called weight de-zero. Zero-valued weights, which cannot be produced by a single shift operation, are all replaced with logarithmic weights to reduce hardware resources with almost no accuracy loss. To implement the Multiply-And-Accumulate (MAC) operation (needed to compute convolutions) when the weights are JLQ-ed and de-zeroed, a new Processing Element (PE) has been developed. This new PE uses a modified barrel shifter that can efficiently avoid the skipped values. Resource utilization, area, and power consumption of the new PE standing alone are reported. 
We have found that JLQ performs better than other state-of-the-art logarithmic quantization methods when the bit width of the operands becomes very small.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115444120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
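The jumping exponent idea can be sketched in a few NumPy lines. This is an illustrative reading of the $(2^{sx+i})$ scheme, not the paper's implementation; the stride `s=2`, offset `i=0`, code count, and the choice of negative exponents are hypothetical parameters:

```python
import numpy as np

def jlq_quantize(w, s=2, i=0, n_codes=8):
    """Snap weights to the nearest magnitude of the form 2^-(s*x + i).

    Hypothetical sketch of Jumping Log Quantization: the stride s skips
    exponents relative to plain log2 quantization, so the same number of
    codes covers a wider dynamic range.
    """
    exps = -(s * np.arange(n_codes) + i).astype(float)
    codebook = 2.0 ** exps                      # allowed magnitudes
    idx = np.argmin(np.abs(np.abs(w)[..., None] - codebook), axis=-1)
    return np.sign(w) * codebook[idx]
```

In hardware, multiplying by such a weight reduces to a shift; the modified barrel shifter in the paper exploits the fact that only every s-th exponent can occur.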
{"title":"FPGA Acceleration of GCN in Light of the Symmetry of Graph Adjacency Matrix","authors":"Gopikrishnan Raveendran Nair, Han-Sok Suh, M. Halappanavar, Frank Liu, J.-s. Seo, Yu Cao","doi":"10.23919/DATE56975.2023.10137076","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137076","url":null,"abstract":"Graph Convolutional Neural Networks (GCNs) are widely used to process large-scale graph data. Different from deep neural networks (DNNs), GCNs are sparse, irregular, and unstructured, posing unique challenges to hardware acceleration with regular processing elements (PEs). In particular, the adjacency matrix of a GCN is extremely sparse, leading to frequent but irregular memory access, low spatial/temporal data locality, and poor data reuse. Furthermore, a realistic graph usually consists of unstructured data (e.g., unbalanced distributions), creating significantly different processing times and imbalanced workloads for each node in GCN acceleration. To overcome these challenges, we propose an end-to-end hardware-software co-design to accelerate GCNs on resource-constrained FPGAs with the following features: (1) A custom dataflow that leverages symmetry along the diagonal of the adjacency matrix to accelerate feature aggregation for undirected graphs. We utilize either the upper or the lower triangular matrix of the adjacency matrix to perform aggregation in GCN to improve data reuse. (2) Unified compute cores for both aggregation and transform phases, with full support for the symmetry-based dataflow. These cores can be dynamically reconfigured to the systolic mode for transformation or as individual accumulators for aggregation in GCN processing. (3) Preprocessing of the graph in software to rearrange the edges and features to match the custom dataflow. This step improves the regularity in memory access and data reuse in the aggregation phase. 
Moreover, we quantize the GCN precision from FP32 to INT8 to reduce the memory footprint without losing inference accuracy. We implement our accelerator design on an Intel Stratix 10 MX FPGA board with HBM2, and demonstrate 1.3×-110.5× improvement in end-to-end GCN latency compared to state-of-the-art FPGA implementations, on the graph datasets of Cora, Pubmed, Citeseer and Reddit.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115485486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
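The symmetry-based dataflow rests on a simple identity: for an undirected graph the adjacency matrix satisfies A = U + Uᵀ − diag(U), where U is its upper triangle, so feature aggregation needs only half the matrix. A minimal dense NumPy sketch of that identity (the accelerator itself, of course, operates on sparse, preprocessed data):

```python
import numpy as np

def aggregate_from_upper(U, X):
    # U is the upper triangle (incl. diagonal) of a symmetric adjacency A.
    # Since A = U + U^T - diag(U), feature aggregation A @ X becomes:
    return U @ X + U.T @ X - np.diag(np.diag(U)) @ X

# Usage: recover the full aggregation while storing only half the matrix.
A = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])        # symmetric adjacency
X = np.arange(6.).reshape(3, 2)    # node features
U = np.triu(A)
```

Each stored nonzero of U contributes to two rows of the result, which is the data-reuse gain the custom dataflow exploits.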
{"title":"A Practical Remote Power Attack on Machine Learning Accelerators in Cloud FPGAs","authors":"Shanquan Tian, Shayan Moini, Daniel E. Holcomb, R. Tessier, Jakub Szefer","doi":"10.23919/DATE56975.2023.10136956","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10136956","url":null,"abstract":"The security and performance of FPGA-based accelerators play vital roles in today's cloud services. In addition to supporting convenient access to high-end FPGAs, cloud vendors and third-party developers now provide numerous FPGA accelerators for machine learning models. However, the security of accelerators developed for state-of-the-art Cloud FPGA environments has not been fully explored, since most remote accelerator attacks have been prototyped on local FPGA boards in lab settings, rather than in Cloud FPGA environments. To address existing research gaps, this work analyzes three existing machine learning accelerators developed in Xilinx Vitis to assess the potential threats of power attacks on accelerators in Amazon Web Services (AWS) F1 Cloud FPGA platforms, in a multi-tenant setting. The experiments show that malicious co-tenants in a multi-tenant environment can instantiate voltage sensing circuits as register-transfer level (RTL) kernels within the Vitis design environment to spy on co-tenant modules. A methodology for launching a practical remote power attack on Cloud FPGAs is also presented, which uses an enhanced time-to-digital converter (TDC) based voltage sensor and an auto-trigger mechanism. The TDC is used to capture power signatures, which are then used to identify power consumption spikes and observe activity patterns involving the FPGA shell, DRAM on the FPGA board, or a co-tenant victim's accelerators. 
Voltage change patterns related to shell use and accelerators are then used to create an auto-triggered attack that can automatically detect when to capture voltage traces without the need for a hard-wired synchronization signal between victim and attacker. To address the novel threats presented in this work, this paper also discusses defenses that could be leveraged to secure multi-tenant Cloud FPGAs from power-based attacks.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126066366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TAM: A Computing in Memory based on Tandem Array within STT-MRAM for Energy-Efficient Analog MAC Operation","authors":"Jinkai Wang, Zhengkun Gu, Hongyu Wang, Zuolei Hao, Bojun Zhang, Weisheng Zhao, Yue Zhang","doi":"10.23919/DATE56975.2023.10137323","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137323","url":null,"abstract":"Computing in memory (CIM) has been demonstrated to be promising for energy-efficient computing. However, the dramatic growth of the data scale in neural network processors has created a demand for CIM architectures of higher bit density, for which spin transfer torque magnetic RAM (STT-MRAM), with its high bit density and performance, arises as a promising candidate solution. In this work, we propose an analog CIM scheme based on a tandem array within STT-MRAM (TAM) to further improve energy efficiency while achieving high bit density. First, the resistance-summation-based analog MAC operation minimizes the effect of low tunnel magnetoresistance (TMR) through the serial magnetic tunnel junction (MTJ) structure in the proposed tandem array, with smaller area overhead. Moreover, a resistive-to-binary read scheme is designed to obtain the MAC results accurately and reliably. Besides, the data-dependent error caused by MTJs in series is eliminated with a proposed dynamic selection circuit. 
Simulation results of a 2Kb TAM architecture show 113.2 TOPS/W and 63.7 TOPS/W for 4-bit and 8-bit input/weight precision, respectively, and a 39.3% reduction in bit-cell area compared with an existing array of MTJs in series.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125488006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FastRW: A Dataflow-Efficient and Memory-Aware Accelerator for Graph Random Walk on FPGAs","authors":"Yingxue Gao, Teng Wang, Lei Gong, Chao Wang, Xi Li, Xuehai Zhou","doi":"10.23919/DATE56975.2023.10137297","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137297","url":null,"abstract":"Graph random walk (GRW) sampling is becoming increasingly important with the widespread popularity of graph applications. It involves some walkers that wander through the graph to capture the desirable properties and reduce the size of the original graph. However, previous research suffers from long sampling latency and severe memory access bottlenecks due to intrinsic data dependency and irregular vertex distribution. This paper proposes FastRW, a dedicated accelerator for GRW on FPGAs. FastRW first schedules walkers' execution to address data dependency and mask long sampling latency. Then, FastRW leverages pipeline specialization and bit-level optimization to customize a processing engine with five modules and achieve a pipelined dataflow. Finally, to alleviate the differential accesses caused by irregular vertex distribution, FastRW implements a hybrid memory architecture that provides parallel access ports according to vertex degree. We evaluate FastRW with two classic GRW algorithms on a wide range of real-world graph datasets. The experimental results show that FastRW achieves a speedup of 14.13× on average over the system running on two 8-core Intel CPUs. 
FastRW also achieves 3.28×-198.24× higher energy efficiency than the architecture implemented on a V100 GPU.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126924448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
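The GRW sampling that FastRW accelerates can be sketched in plain Python. This toy version (dict adjacency, unweighted uniform sampling) only illustrates the step-to-step data dependency the paper targets, not the accelerator's dataflow:

```python
import random

def random_walks(adj, starts, length, seed=0):
    """Sample a fixed-length random walk from each start vertex.

    Step t+1 cannot be issued until the neighbor list of the vertex
    sampled at step t has been fetched; hiding that serial dependency
    is what FastRW's walker scheduling addresses. adj maps each vertex
    to its neighbor list.
    """
    rng = random.Random(seed)
    walks = []
    for s in starts:
        walk = [s]
        for _ in range(length):
            nbrs = adj.get(walk[-1], [])
            if not nbrs:          # dead end: terminate this walk early
                break
            walk.append(rng.choice(nbrs))
        walks.append(walk)
    return walks
```

Because each walker is independent, many walks can proceed concurrently, which is what makes the workload amenable to pipelined hardware despite the per-walk dependency.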
{"title":"A Novel Fault-Tolerant Architecture for Tiled Matrix Multiplication","authors":"Sandeep Bal, Chandra sekhar Mummidi, V. C. Ferreira, S. Srinivasan, S. Kundu","doi":"10.23919/DATE56975.2023.10136985","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10136985","url":null,"abstract":"General matrix multiplication (GEMM) is common to many scientific and machine-learning applications. Convolution, the dominant computation in Convolutional Neural Networks (CNNs), can be formulated as a GEMM problem. Due to its widespread use, a new generation of processors features GEMM acceleration in hardware. Intel recently announced an Advanced Matrix Extensions (AMX®) instruction set for GEMM, which is supported by 1 kB AMX tile registers and a Tile Matrix Multiply unit (TMUL) for multiplying tiles (sub-matrices) in hardware. Silent Data Corruption (SDC) is a well-known problem that occurs when hardware generates corrupt output. Google and Meta recently reported findings of SDC in GEMM in their data centers. Algorithm-Based Fault Tolerance (ABFT) is an efficient mechanism for detecting and correcting errors in GEMM, but classic ABFT solutions are not optimized for hardware acceleration. In this paper, we present a novel ABFT implementation directly in hardware. Though the exact implementation of the Intel TMUL is not known, we propose two different TMUL architectures representing two design points in the area-power-performance spectrum and illustrate how ABFT can be directly incorporated into the TMUL hardware. 
This approach has two advantages: (i) an error can be concurrently detected at the tile level, which is an improvement over finding such errors only after performing the full matrix multiplication; and (ii) we further demonstrate that performing ABFT at the hardware level has no performance impact and only a small area, latency, and power overhead.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122698209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
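The classic ABFT scheme that the paper moves into the TMUL can be sketched in software: append a column-checksum row to A and a row-checksum column to B, multiply, and verify that the product's checksums match its body. A NumPy sketch of the principle (not the proposed hardware design):

```python
import numpy as np

def abft_matmul(A, B):
    """Checksum-protected GEMM (classic software ABFT sketch).

    If the multiply is fault-free, the last row of the augmented product
    equals the column sums of its body and the last column equals the row
    sums; any mismatch signals a silent data corruption.
    """
    Ac = np.vstack([A, A.sum(axis=0)])                 # (m+1) x k
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])  # k x (n+1)
    C = Ac @ Br
    body = C[:-1, :-1]
    ok = (np.allclose(C[-1, :-1], body.sum(axis=0)) and
          np.allclose(C[:-1, -1], body.sum(axis=1)))
    return body, ok
```

Applying the same check per tile, as the paper proposes in hardware, localizes an error to a single tile instead of surfacing it only after the full matrix product.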
{"title":"Region-based Flash Caching with Joint Latency and Lifetime Optimization in Hybrid SMR Storage Systems","authors":"Zhengang Chen, Guohui Wang, Zhiping Shi, Yong-Yuan Guan, Tianyu Wang","doi":"10.23919/DATE56975.2023.10137148","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137148","url":null,"abstract":"The frequent Read-Modify-Write operations (RMWs) in Shingled Magnetic Recording (SMR) disks severely degrade the random write performance of the system. Although the adoption of persistent cache (PC) and built-in NAND flash cache alleviates some of the RMWs, when the cache is full, the triggered write-back operations still prolong I/O response time, and the erasure of NAND flash also sacrifices its lifetime. In this paper, we propose a Region-based Co-optimized strategy named Multi-Regional Collaborative Management (MCM) that optimizes the average response time by separately managing sequential/random and hot/cold data, and extends the NAND flash lifetime with a region-aware wear-leveling strategy. The experimental results show that our MCM reduces the average response time by 71% and RMWs by 96% on average compared with Skylight (the baseline). Compared with the state-of-the-art flash-based cache (FC) approach, we still reduce the average response time and flash erase operations by 17.2% and 33.32%, respectively.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121977806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ADEE-LID: Automated Design of Energy-Efficient Hardware Accelerators for Levodopa-Induced Dyskinesia Classifiers","authors":"Martin Hurta, Vojtěch Mrázek, Michaela Drahosova, L. Sekanina","doi":"10.23919/DATE56975.2023.10137079","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137079","url":null,"abstract":"Levodopa, a drug used to treat symptoms of Parkinson's disease, is connected to side effects known as Levodopa-induced dyskinesia (LID). LID is difficult to classify during a physician's visit. A wearable device allowing long-term and continuous classification would significantly help with dosage adjustments. This paper deals with an automated design of energy-efficient hardware accelerators for such LID classifiers. The proposed accelerator consists of a feature extractor and a classifier co-designed using genetic programming. Improvements are achieved by introducing a variable bit width for arithmetic operators, eliminating redundant registers, and using precise energy consumption estimation for Pareto front creation. Evolved solutions reduce energy consumption while maintaining classification accuracy comparable to the state of the art.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128510157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expanding In-Cone Obfuscated Tree for Anti SAT Attack","authors":"RuiJie Wang, Li-Nung Hsu, Yung-Chih Chen, TingTing Hwang","doi":"10.23919/DATE56975.2023.10137091","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137091","url":null,"abstract":"Logic locking is a hardware security technology to protect circuit designs from overuse, piracy, and reverse engineering. It protects a circuit by inserting key gates to hide the circuit functionality, so that the circuit is functional only when a correct key is applied. In recent years, encrypting the point function, e.g., AND-tree, in a circuit has been shown to be promising to resist SAT attack. However, the encryption technique may suffer from two problems: First, the tree size may not be large enough to achieve desired security. Second, SAT attack could break the encryption in one iteration when it finds a specific input pattern, called remove-all DIP. Thus, in this paper, we present a new method for constructing the obfuscated tree. We first apply the sum-of-product transformation to find the largest AND-tree in a circuit, and then insert extra variables with the proposed split-compensate operation to further enlarge the AND-tree and mitigate the remove-all DIP issue. The experimental results show that the proposed obfuscated tree can effectively resist SAT attack.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129665426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
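A toy model shows why point-function encryption resists SAT attack: with a wrong key the circuit misbehaves on exactly one input pattern, so each SAT-attack iteration rules out only one key candidate. The sketch below is a generic SARLock-style flip circuit with a hypothetical SECRET key and example function f, not the paper's split-compensate construction:

```python
SECRET = 0b1011  # hypothetical correct 4-bit key

def f(x):
    # arbitrary 4-input example function standing in for the protected logic
    return (x & 1) ^ ((x >> 2) & 1)

def locked_f(x, key):
    # Flip the output only when the input equals a wrong key: every wrong
    # key misbehaves on exactly one input pattern, forcing a SAT attack to
    # eliminate keys one distinguishing input at a time.
    return f(x) ^ (1 if (x == key and key != SECRET) else 0)
```

Enlarging the protected AND-tree, as the paper does, increases the number of such one-off patterns an attacker must rule out, while the split-compensate operation targets the remove-all DIP that would otherwise collapse the search.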