Anthony Byrne, Yanni Pang, Allen Zou, S. Nadgowda, A. Coskun
{"title":"MicroFaaS: Energy-efficient Serverless on Bare-metal Single-board Computers","authors":"Anthony Byrne, Yanni Pang, Allen Zou, S. Nadgowda, A. Coskun","doi":"10.23919/DATE54114.2022.9774688","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774688","url":null,"abstract":"Serverless function-as-a-service (FaaS) platforms offer a radically-new paradigm for cloud software development, yet the hardware infrastructure underlying these platforms is based on a decades-old design pattern. The rise of FaaS presents an opportunity to reimagine cloud infrastructure to be more energy-efficient, cost-effective, reliable, and secure. In this paper, we show how replacing handfuls of x86-based rack servers with hundreds of ARM-based single-board computers could lead to a virtualization-free, energy-proportional cloud that achieves this vision. We call our systematically-designed implementation MicroFaaS, and we conduct a thorough evaluation and cost analysis comparing MicroFaaS to a throughput-matched FaaS platform implemented in the style of conventional virtualization-based cloud systems. Our results show a 5.6x increase in energy efficiency and 34.2% decrease in total cost of ownership compared to our baseline.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115823301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tao Yang, Dongyue Li, Zhuoran Song, Yilong Zhao, Fangxin Liu, Zongwu Wang, Zhezhi He, Li Jiang
{"title":"DTQAtten: Leveraging Dynamic Token-based Quantization for Efficient Attention Architecture","authors":"Tao Yang, Dongyue Li, Zhuoran Song, Yilong Zhao, Fangxin Liu, Zongwu Wang, Zhezhi He, Li Jiang","doi":"10.23919/DATE54114.2022.9774692","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774692","url":null,"abstract":"Models based on the attention mechanism, i.e. transformers, have shown extraordinary performance in Natural Language Processing (NLP) tasks. However, their memory footprint, inference latency, and power consumption are still prohibitive for efficient inference at edge devices, even at data centers. To tackle this issue, we present an algorithm-architecture co-design with dynamic and mixed-precision quantization, DTQAtten. We present empirically that the tolerance to the noise varies from token to token in attention-based NLP models. This finding leads us to quantize different tokens with mixed levels of bits. Thus, we design a compression framework that (i) dynamically quantizes tokens while they are forwarded in the models and (ii) jointly determines the ratio of each precision. Moreover, due to the dynamic mixed-precision tokens caused by our framework, previous matrix-multiplication accelerators (e.g. systolic array) cannot effectively exploit the benefit of the compressed attention computation. We thus design our accelerator with the variable-speed systolic array (VSSA) and propose an effective optimization strategy to alleviate the pipeline-stall problem in VSSA without hardware overhead. We conduct experiments with existing attention-based NLP models, including BERT and GPT-2 on various language tasks. Our results show that DTQAtten outperforms the previous neural network accelerator Eyeriss by 13.12× in terms of speedup and 3.8× in terms of energy-saving. Compared with the state-of-the-art attention accelerator SpAtten, our DTQAtten achieves at least 2.65× speedup and 3.38× energy efficiency improvement.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115395294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Traveling Salesman Problem Solvers using the Ising Model with Simulated Bifurcation","authors":"Tingting Zhang, Jie Han","doi":"10.23919/DATE54114.2022.9774576","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774576","url":null,"abstract":"An Ising model-based solver has shown efficiency in obtaining suboptimal solutions for combinatorial optimization problems. As an NP-hard problem, the traveling salesman problem (TSP) plays an important role in various routing and scheduling applications. However, the execution speed and solution quality significantly deteriorate using a solver with simulated annealing (SA) due to the quadratically increasing number of spins and strong constraints placed on the spins. The ballistic simulated bifurcation (bSB) algorithm utilizes the signs of Kerr-nonlinear parametric oscillators' positions as the spins' states. It can update the states in parallel to alleviate the time explosion problem. In this paper, we propose an efficient method for solving TSPs by using the Ising model with bSB. Firstly, the TSP is mapped to an Ising model without external magnetic fields by introducing a redundant spin. Secondly, various evolution strategies for the introduced position and different dynamic configurations of the time step are considered to improve the efficiency in solving TSPs. The effectiveness is specifically discussed and evaluated by comparing the solution quality to SA. Experiments on benchmark datasets show that the proposed bSB-based TSP solvers offer superior performance in solution quality and achieve a significant speed up in runtime than recent SA-based ones.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"217 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123241838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pin Accessibility-driven Placement Optimization with Accurate and Comprehensive Prediction Model","authors":"Suwan Kim, Taewhan Kim","doi":"10.23919/DATE54114.2022.9774753","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774753","url":null,"abstract":"The significantly increased density of pins of stan-dard cells and the reduced number of routing tracks at sub-10nm nodes have made the pin access problem in detailed routing very difficult. To alleviate this pin accessibility problem in detailed routing, recent works have proposed to make a small perturbation of cell shifting, cell flipping, and adjacent cells swapping in the detailed placement stage. Here, an essential element for the success of pin accessibility aware detailed placement is the installed cost function, which should be sufficiently accurate in predicting the degree of routing difficulty in accessing pins. In this work, we propose a new model of cost function that is comprehensively devised to overcome the limitations of the prior ones. Precisely, unlike the conventional cost functions, our proposed cost function model is based on the empirical routing data in order to fully reflect the potential outcomes of detailed routing. Through experiments with benchmark circuits, it is shown that using our proposed cost function in detailed placement is able to reduce the routing errors by 44 % on average while using the existing cost functions reduce the routing errors on average by at most 15 %.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123369837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Syed Aftab Rashid, Muhammad Ali Awan, P. Souto, K. Bletsas, E. Tovar
{"title":"Cache-aware Schedulability Analysis of PREM Compliant Tasks","authors":"Syed Aftab Rashid, Muhammad Ali Awan, P. Souto, K. Bletsas, E. Tovar","doi":"10.23919/DATE54114.2022.9774670","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774670","url":null,"abstract":"The Predictable Execution Model (PREM) is useful for mitigating inter-core interference due to shared resources such as the main memory. However, it is cache-agnostic, which makes schedulabulity analysis pessimistic, via overestimation of prefetches and write-backs. In response, we present cache-aware schedulability analysis for PREM tasks on fixed-task-priority partitioned multicores, that bounds the number of cache prefetches and write-backs. Our approach identifies memory blocks loaded in the execution of a previous scheduling interval of each task, that remain in the cache until its next scheduling interval. Doing so, greatly reduces the estimated prefetches and write backs. In experimental evaluations, our analysis improves the schedulability of PREM tasks by up to 55 percentage points.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121903385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Golden Model-Free Hardware Trojan Detection by Classification of Netlist Module Graphs","authors":"Alexander Hepp, Johanna Baehr, G. Sigl","doi":"10.23919/DATE54114.2022.9774760","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774760","url":null,"abstract":"In a world where increasingly complex integrated circuits are manufactured in supply chains across the globe, hardware Trojans are an omnipresent threat. State-of-the-art methods for Trojan detection often require a golden model of the device under test. Other methods that operate on the netlist without a golden model cannot handle complex designs and operate on Trojan-specific sets of netlist graph features. In this work, we propose a novel machine-learning-based method for hardware Trojan detection. Our method first uses a library of known malicious and benign modules in hierarchical designs to train an eXtreme Gradient Boosted Tree Classifier (XGBClassifier). For training, we generate netlist graphs of each hierarchical module and calculate feature vectors comprising structural characteristics of these graphs. After the training phase, we can analyze the synthesized hierarchical modules of an unknown design under test. The method calculates a feature vector for each module. With this feature vector, each module can be classified into either benign or malicious by the previously trained XGBClassifier. After classifying all modules, we derive a classification for all standard cells in the design under test. This technique allows the identification of hardware Trojan cells in a design and highlights regions of interest to direct further reverse engineering efforts. Experiments show that this approach performs with >97 % Sensitivity and Specificity across available and newly generated hardware Trojan benchmarks and can be applied to more complex designs than previous netlist-based methods while maintaining similar computational complexity.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122177358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Energy-Efficient CGRAs via Stochastic Computing","authors":"Bo Wang, Rong Zhu, Jiaxing Shang, Dajiang Liu","doi":"10.23919/DATE54114.2022.9774585","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774585","url":null,"abstract":"Stochastic computing (SC) is a promising computing paradigm for low-power and low-cost applications with the added benefit of high error tolerance. Meanwhile, Coarse-Grained Re-configurable Architecture (CGRA) is also a promising platform for domain-specific applications for its combination of energy efficiency and flexibility. Intuitively, introducing SC to CGRA would synergistically reinforce the strengths of both paradigms. Accordingly, this paper proposes an SC-based CGRA by replacing the exact multiplication in traditional CGRA with an SC-based multiplication, where the problem of accuracy and latency are both improved using parallel stochastic sequence generators and leading zero shifters. In addition, with the flexible connections among PEs, the high-accuracy operation can be easily achieved by combing neighbor PEs without switching costs like power-gating. Compared to the state-of-the-art approximate computing design of CGRA, our proposed CGRA has 16% more energy reduction and 34% energy efficiency improvement while keeping high configuration flexibility.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124269930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chenlin Ma, Zhuokai Zhou, Yingping Wang, Yi Wang, Rui Mao
{"title":"MU-RMW: Minimizing Unnecessary RMW Operations in the Embedded Flash with SMR Disk","authors":"Chenlin Ma, Zhuokai Zhou, Yingping Wang, Yi Wang, Rui Mao","doi":"10.23919/DATE54114.2022.9774638","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774638","url":null,"abstract":"Emerging Shingled Magnetic Recording (SMR) Disk can improve the storage capacity significantly by overlapping multiple tracks with the shingled direction. However, the shingled-like structure leads to severe write amplification caused by RMW operations inner SMR disks. As the mainstream solid-state storage technology, NAND flash has the advantages of tiny size, cost-effective, high performance, making it suitable and promising to be incorporated into SMR disks to boost the system performance. In this hybrid embedded storage system (i.e., the Embedded Flash with SMR disk (EF-SMR) system), we observe that physical flash blocks can contain a mixture of data associated with different SMR data bands; when garbage collecting such flash blocks, multiple RMW operations are triggered to rewrite the involved SMR bands and the performance is further exacerbated. Therefore, in this paper, we for the first time present MU-RMW to guarantee data from different SMR bands will not be mixed up within the flash blocks with an aim at minimizing unnecessary RMW operations. The effectiveness of MU-RMW was evaluated with realistic and intensive I/O workloads and the results are encouraging.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128073639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ADD-based Spectral Analysis of Probing Security","authors":"M. Molteni, V. Zaccaria, V. Ciriani","doi":"10.23919/DATE54114.2022.9774700","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774700","url":null,"abstract":"In this paper, we introduce a novel exact verification methodology for non-interference properties of cryptographic circuits. The methodology exploits the Algebraic Decision Diagram representation of the Walsh spectrum to overcome the potential slow down associated with its exact verification against non-interference constraints. Benchmarked against a standard set of use cases, the methodology speeds-up 1.88x the median verification time over the existing state-of-the art tools for exact verification.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125006802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Value-aware Parity Insertion ECC for Fault-tolerant Deep Neural Network","authors":"Seonmin Lee, Joon-Sung Yang","doi":"10.23919/DATE54114.2022.9774543","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774543","url":null,"abstract":"Deep neural networks (DNNs) are deployed on hardware devices and are widely used in various fields to perform inference from inputs. Unfortunately, hardware devices can become unreliable by incidents such as unintended process, voltage and temperature variations, and this can introduce the occurrence of erroneous weights. Prior study reports that the erroneous weights can cause a significant accuracy degradation. In safety-critical applications such as autonomous driving, it can bring catastrophic results. Retraining or fine-tuning can be used to adjust corrupted weights to prevent the accuracy degradation. However, training-based approaches would incur a significant computational overhead due to a massive size of training datasets and intensive training operations. Thus, this paper proposes a value-aware parity insertion error correction code (ECC) to recover erroneous weights with a reduced parity storage overhead and no additional training processes. Previous ECC-based reliability improvement methods, Weight Nulling and In-place Zero-space ECC, are compared with the proposed method. Experimental results demonstrate that DNNs with the value-aware parity insertion ECC can perform inference without the accuracy degradation, on average, in 122.5× and 15.1× higher bit error rate conditions over Weight Nulling and In-place Zero-space ECC, respectively.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131594788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}