A. Bosio, Samuele Germiniani, G. Pravadelli, Marcello Traiola
{"title":"Exploiting assertions mining and fault analysis to guide RTL-level approximation","authors":"A. Bosio, Samuele Germiniani, G. Pravadelli, Marcello Traiola","doi":"10.23919/DATE56975.2023.10136949","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10136949","url":null,"abstract":"The Approximate Computing (AxC) paradigm was introduced to achieve higher power efficiency, lower area and better performance w.r.t. a “classical” computing system at the cost of a degraded, but still acceptable, output accuracy [1]. AxC can be applied at several abstraction levels of a given computing system, from circuit to algorithm [1], leading to a wide design exploration space that quickly became the bottleneck for successfully deploying AxC. Indeed, the literature proposes many works to automatically trade off output accuracy against performance [2]. However, most of them lack the capability to identify resilient elements (e.g., HW components, HDL statements, etc.) of the design to be approximated. Consequently, exploring the design for AxC generally results in a long and tedious procedure. Existing approaches generate approximate variants of the Design Under Exploration (DUE). 
Every variant is then executed/simulated in order to determine the accuracy degradation [3], which depends on the application and requires a specific metric to be computed (e.g., similarity index, hamming distance, etc.).","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124154745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MECALS: A Maximum Error Checking Technique for Approximate Logic Synthesis","authors":"Chang Meng, Jiajun Sun, Yuqi Mai, Weikang Qian","doi":"10.23919/DATE56975.2023.10136950","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10136950","url":null,"abstract":"Approximate computing is an effective computing paradigm to improve energy efficiency for error-tolerant applications. Approximate logic synthesis (ALS) methods are designed to generate approximate circuits under certain error constraints. This paper focuses on ALS methods under the maximum error constraint and proposes MECALS, a maximum error checking technique for ALS. MECALS models maximum error using partial Boolean difference and performs fast error checking with SAT sweeping. Based on MECALS, we design an efficient ALS flow. Our experimental results show that compared to a state-of-the-art ALS method, our flow is 13× faster and improves area and delay reduction by 39.2% and 26.0%, respectively.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127919511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reduce: A Framework for Reducing the Overheads of Fault-Aware Retraining","authors":"Muhammad Abdullah Hanif, M. Shafique","doi":"10.23919/DATE56975.2023.10137268","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137268","url":null,"abstract":"Fault-aware retraining has emerged as a prominent technique for mitigating permanent faults in Deep Neural Network (DNN) hardware accelerators. However, retraining leads to huge overheads, specifically when used for fine-tuning large DNNs designed for solving complex problems. Moreover, as each fabricated chip can have a distinct fault pattern, fault-aware retraining is required to be performed for each chip individually considering its unique fault map, which further aggravates the problem. To reduce the overall retraining cost, in this work, we introduce the concept of resilience-driven retraining amount selection. To realize this concept, we propose a novel framework, Reduce, that, at first, computes the resilience of the given DNN to faults at different fault rates and with different amounts of retraining. Then, based on the resilience, it computes the amount of retraining required for each chip considering its unique fault map. We demonstrate the effectiveness of our methodology for a systolic array-based DNN accelerator experiencing permanent faults in the computational array.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128497544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuming Liu, A. Yanguas-Gil, Sandeep Madireddy, Yanjing Li
{"title":"Memristor-Spikelearn: A Spiking Neural Network Simulator for Studying Synaptic Plasticity under Realistic Device and Circuit Behaviors","authors":"Yuming Liu, A. Yanguas-Gil, Sandeep Madireddy, Yanjing Li","doi":"10.23919/DATE56975.2023.10136938","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10136938","url":null,"abstract":"We present the Memristor-Spikelearn simulator (open-sourced), which is capable of incorporating detailed memristor and circuit models in simulation to enable thorough study of synaptic plasticity in spiking neural networks under realistic device and circuit behaviors. Using this simulator, we demonstrate that: (1) a detailed device model is essential for simulating synaptic plasticity workloads, because results obtained using a simplified model can be misleading (e.g., it can overestimate test accuracy by up to 21.9%); (2) detailed simulation helps to determine the proper range of conductance values to represent weights, which is critical in order to achieve the desired accuracy-energy tradeoff (e.g., increasing the conductance values by 10× can increase accuracy from 70% to 83% at the price of 20× higher energy); and (3) detailed simulation also helps to determine an optimized circuit structure, which is another important design parameter that can yield different accuracy-energy tradeoffs.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128750275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-Stream Neural Network for Post-Layout Waveform Prediction","authors":"Sanghwi Kim, Hyejin Shin, Hyunkyu Kim","doi":"10.23919/DATE56975.2023.10137286","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137286","url":null,"abstract":"The gap between pre- and post-simulation, as well as the considerable layout time, increases the significance of the post-layout waveform prediction in dynamic random access memory (DRAM) design. This study develops a post-layout prediction model using the following two-stream neural network: (1) a multi-layer perceptron neural network to calculate the coupling noise by using the physical properties of global interconnects, and (2) a convolutional neural network to compute the time series trends of the waveforms by referencing adjacent signals. The proposed model trains two types of heterogeneous data such that accuracy of 95.5% is achieved on the 1b DRAM process 16Gb DDR5 composed of hundreds of millions of transistors. The model significantly improves the design completeness by pre-detecting the deterioration in the signal quality via post-layout waveform prediction. Generally, although a few weeks are required to obtain post-layout waveforms after the circuit design process, waveforms can be instantly predicted using our proposed model.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129111923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anstasios Dimitriou, Mingyu Hu, Jonathon S. Hare, G. Merrett
{"title":"Exploration of Decision Sub-Network Architectures for FPGA-based Dynamic DNNs","authors":"Anstasios Dimitriou, Mingyu Hu, Jonathon S. Hare, G. Merrett","doi":"10.23919/DATE56975.2023.10137302","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137302","url":null,"abstract":"Dynamic Deep Neural Networks (DNNs) can achieve faster execution and less computationally intensive inference by spending fewer resources on easy to recognise or less informative parts of an input. They make data-dependent decisions, which strategically deactivate a model's components, e.g. layers, channels or sub-networks. However, dynamic DNNs have only been explored and applied on conventional computing systems (CPU + GPU) and programmed with libraries designed for static networks, limiting their effects. In this paper, we propose and explore two approaches for efficiently realising the sub-networks that make these decisions on FPGAs. A pipeline approach targets the use of the existing hardware to execute the sub-network, while a parallel approach uses dedicated circuitry for it. We explore the performance of each using the BranchyNet early exit approach on LeNet-5, and evaluate on a Xilinx ZCU106. The pipeline approach is 36% faster than a desktop CPU. It consumes 0.51 mJ per inference, 16x lower than a non-dynamic network on the same platform and 8x lower than an Nvidia Jetson Xavier NX. 
The parallel approach executes 17% faster than the pipeline approach when on dynamic inference no early exits are taken, but incurs an increase in energy consumption of 28%.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115884072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GraphIte: Accelerating Iterative Graph Algorithms on ReRAM Architectures via Approximate Computing","authors":"Dwaipayan Choudhury, A. Kalyanaraman, P. Pande","doi":"10.23919/DATE56975.2023.10137001","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137001","url":null,"abstract":"ReRAM-based Processing-in-Memory (PIM) offers a promising paradigm for computing near data, making it an attractive platform of choice for graph applications that suffer from sparsity and irregular memory access. However, the performance of ReRAM-based graph accelerators is limited by two key challenges - significant storage requirements (particularly due to wasted zero cell storage of a graph's adjacency matrix), and a significant amount of on-chip traffic between ReRAM-based processing elements. In this paper, we present GraphIte, an approximate computing-based framework for accelerating iterative graph applications on ReRAM-based architectures. GraphIte uses sparsification and approximate updates to achieve significant reductions in ReRAM storage and data movement. Our experiments on PageRank and community detection show that our proposed architecture outperforms a state-of-the-art ReRAM-based graph accelerator by up to 83.4% reduction in execution time while consuming up to 87.9% less energy for a range of graph inputs and workloads.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115939697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benjamin Lukas Cajus Barzen, Arya Reais-Parsi, Eddie Hung, Minwoo Kang, A. Mishchenko, J. Greene, J. Wawrzynek
{"title":"Narrowing the Synthesis Gap: Academic FPGA Synthesis is Catching Up With the Industry","authors":"Benjamin Lukas Cajus Barzen, Arya Reais-Parsi, Eddie Hung, Minwoo Kang, A. Mishchenko, J. Greene, J. Wawrzynek","doi":"10.23919/DATE56975.2023.10137310","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137310","url":null,"abstract":"Historically, open-source FPGA synthesis and technology mapping tools have been considered far inferior to industry-standard tools. We show that this is no longer true. Improvements in recent years to Yosys (Verilog elaborator) and ABC (technology mapper) have resulted in substantially better performance, evident in both the reduction of area utilization and the increase in the maximum achievable clock frequency. More specifically, we describe how ABC9 — a set of feature additions to ABC — was integrated into Yosys upstream and is available in the latest version. Technology mapping now has a complete view of the circuit, including support for hard blocks (e.g., carry chains) and multiple clock domains for timing-aware mapping. We demonstrate how these improvements accumulate in dramatically better synthesis results, with Yosys-ABC9 reducing the delay gap from 30% to 0% on a commercial FPGA target for the commonly used VTR benchmark, thus matching Vivado's performance in terms of maximum clock frequency. 
We also measured the performance on a selection of circuits from OpenCores as well as literature, comparing the results produced by Vivado, Yosys-ABC1 (existing work), and the proposed Yosys-ABC9 integration.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116393995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
W. Fornaciari, G. Agosta, Daniele Cattaneo, Lev Denisov, Andrea Galimberti, Gabriele Magnani, Davide Zoni
{"title":"Hardware and Software Support for Mixed Precision Computing: a Roadmap for Embedded and HPC Systems","authors":"W. Fornaciari, G. Agosta, Daniele Cattaneo, Lev Denisov, Andrea Galimberti, Gabriele Magnani, Davide Zoni","doi":"10.23919/DATE56975.2023.10137092","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137092","url":null,"abstract":"Mixed precision is an approximate computing technique that can be used to trade-off computation accuracy for performance and/or energy. It can be applied to many error-tolerant applications, but manual precision tuning is both tedious and error-prone. Furthermore, the effectiveness of the technique heavily depends on hardware characteristics. Therefore, a hardware/software co-design approach is necessary for an effective exploitation of precision tuning opportunities offered by the applications. In this paper, we propose, based on the state of the art of precision tuning software and mixed precision hardware, a roadmap for the evolution of hardware designs and compiler-based precision tuning support, which is ongoing in the context of the European projects TEXTAROSSA and APROPOS.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116988179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using High-Level Synthesis to model System Verilog procedural timing controls","authors":"Luca Ezio Pozzoni, Fabrizio Ferrandi, Loris Mendola, Alfio Antonino Palazzo, Francesco Pappalardo","doi":"10.23919/DATE56975.2023.10136907","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10136907","url":null,"abstract":"In modern SoC designs, digital components' development and verification processes often depend on the component's interactions with other digital and analog modules on the same die. While designers can rely on a wide range of tools and practices for validating fully-digital models, porting the same workflow to mixed models' development requires significant efforts from the designers. A common practice is to use Real Number Modeling techniques to generate HDL-based behavioral models of analog components to efficiently simulate mixed models using only event-based simulations rather than Analog Mixed Signals (AMS) simulations. However, some of these models' language features are not synthesizable with existing synthesis tools, requiring additional efforts from the designers to generate post-tapeout prototypes. This paper presents a methodology for transforming some non-synthesizable System Verilog language features related to timing controls into functionally-equivalent synthesizable Verilog constructs. The resulting synthesizable models replicate their respective RNMs' behavior while explicitly managing delay controls and event expressions. 
The RNMs are first transformed using the MLIR framework and then synthesized with open-source HLS tools to obtain FPGA-synthesizable Verilog models.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114295036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}