{"title":"Reliable Distributed Systems","authors":"Philipp Mundhenk, Arne Hamann, Andreas Heyl, D. Ziegenbein","doi":"10.23919/DATE54114.2022.9774734","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774734","url":null,"abstract":"The domains of Cyber-Physical Systems (CPSs) and Information Technology (IT) are converging. Driven by the need for increased compute performance, as well as the need for increased connectivity and runtime flexibility, IT hardware, such as microprocessors and Graphics Processing Units (GPUs), as well as software abstraction layers are introduced to CPS. These systems and components are being enhanced for the execution of hard real-time applications. This enables the convergence of embedded and IT: Embedded workloads can be executed reliably on top of IT infrastructure. This is the dawn of Reliable Distributed Systems (RDSs), a technology that combines the performance and cost of IT systems with the reliability of CPSs. The Fabric is a global RDS runtime environment, weaving the interconnections between devices and enabling abstractions for compute, communication, storage, sensing & actuation. This paper outlines the vision of RDS, introduces the aspects required for implementing RDSs and the Fabric, relates existing technologies, and outlines open research challenges.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"24 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116633606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gibbon: Efficient Co-Exploration of NN Model and Processing-In-Memory Architecture","authors":"Hanbo Sun, Chenyu Wang, Zhenhua Zhu, Xuefei Ning, Guohao Dai, Huazhong Yang, Yu Wang","doi":"10.23919/DATE54114.2022.9774605","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774605","url":null,"abstract":"Memristor-based Processing-In-Memory (PIM) architectures have shown great potential to boost the computing energy efficiency of Neural Networks (NNs). Existing work concentrates on hardware architecture design and algorithm-hardware co-optimization, but neglects the significant impact of the correlation between NN models and PIM architectures. To ensure high accuracy and energy efficiency, it is important to co-design the NN model and PIM architecture. However, on the one hand, the co-exploration space of NN model and PIM architecture is extremely large, making searching for the optimal results difficult. On the other hand, during the co-exploration process, PIM simulators pose a heavy computational burden and runtime overhead for evaluation. To address these problems, in this paper, we propose an efficient co-exploration framework for the NN model and PIM architecture, named Gibbon. In Gibbon, we propose an evolutionary search algorithm with adaptive parameter priority, which focuses on the subspace of high-priority parameters and alleviates the problem of vast co-design space. Besides, we design a Recurrent Neural Network (RNN) based predictor for accuracy and hardware performance. It substitutes for a large part of the PIM simulator workload and reduces the long simulation time. Experimental results show that the proposed co-exploration framework can find better NN models and PIM architectures than existing studies in only seven GPU hours (8.4~41.3× speedup). At the same time, Gibbon can improve the accuracy of co-design results by 10.7% and reduce the energy-delay-product by 6.48× compared with existing work.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"83 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120919687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Health Monitoring of Milling Tools under Distinct Operating Conditions by a Deep Convolutional Neural Network model","authors":"Priscile Suawa Fogou, Michael Hübner","doi":"10.23919/DATE54114.2022.9774570","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774570","url":null,"abstract":"One of the most popular manufacturing techniques is milling. It can be used to make a variety of geometric components, such as flat grooves, surfaces, etc. The condition of the milling tool has a major impact on the quality of milling processes, hence the importance of monitoring it. When working on monitoring solutions, it is crucial to take into account different operating variables, such as rotational speed, especially in real-world experiments. This work addresses the topic of predictive maintenance by exploiting the fusion of sensor data and the artificial intelligence-based analysis of signals measured by sensors. With a set of data such as vibration and sound reflection from the sensors, we focus on finding solutions for the task of detecting the health condition of machines. A Deep Convolutional Neural Network (DCNN) model is provided with fusion at the sensor data level to detect five consecutive health states of a milling tool, from a healthy state to a state of degradation. In addition, a demonstrator is built with Simulink to simulate and visualize the detection process. To examine the capacity of our model, the signal data was processed individually and subsequently merged. Experiments were carried out on three sets of data recorded during a real milling process. Results using the proposed DCNN architecture with raw data have reached an accuracy of more than 94% for all data sets.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121150527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Full-credit Flow Control: A Novel Technique to Implement Deadlock-free Adaptive Routing","authors":"Yi Dai, K. Lu, Sheng Ma, Junsheng Chang","doi":"10.23919/DATE54114.2022.9774519","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774519","url":null,"abstract":"Deadlock-free adaptive routing is extensively adopted in interconnection networks to improve communication bandwidth and reduce latency. However, existing deadlock-free flow control schemes either underutilize memory resources due to inefficient buffer management for simple hardware implementations, or rely on complicated coordination and synchronization mechanisms with high hardware complexity. In this work, we solve the deadlock problem from a different perspective by considering the deadlock as a lack of credit. With minor modifications of the credit accumulation procedure, our proposed full-credit flow control (FFC) ensures atomic buffer usage based only on local credit status while making full use of the buffer space. FFC can be easily integrated into an industrial router to achieve deadlock freedom with less area and power consumption, yet 112% higher throughput, compared to the critical bubble scheme (CBS). We further propose a credit reservation strategy to eliminate the escape virtual channel (VC) cost for fully adaptive routing implementation. Synthesis results demonstrate that FFC along with credit reservation (FFC-CR) can reduce the area by 29% and power consumption by 26% compared with CBS.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116447356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCI-FI: Control Signal, Code, and Control Flow Integrity against Fault Injection Attacks","authors":"Thomas Chamelot, Damien Couroussé, K. Heydemann","doi":"10.23919/DATE54114.2022.9774685","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774685","url":null,"abstract":"Fault injection attacks have become a serious threat against embedded systems. Recently, Laurent et al. have reported that some faults inside the microarchitecture escape all typical software fault models and so software counter-measures. Moreover, state-of-the-art counter-measures, hardware-only or with hardware support, do not consider the integrity of microarchitectural control signals that are the target of these faults. We present SCI-FI, a counter-measure for Control Signal, Code, and Control-Flow Integrity against Fault Injection attacks. SCI-FI combines the protection of pipeline control signals with a fine-grained code and control-flow integrity mechanism, and can additionally provide code authentication. We evaluate SCI-FI by extending a RISC-V core. The average hardware area overheads range from 6.5% to 23.8%, and the average code size and execution time increase by 25.4% and 17.5% respectively.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134088528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error Generation for 3D NAND Flash Memory","authors":"Weihua Liu, Fei Wu, Songmiao Meng, Xiang Chen, Changsheng Xie","doi":"10.23919/DATE54114.2022.9774514","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774514","url":null,"abstract":"Three-dimensional (3D) NAND flash memory is the preferred storage component of solid-state drives (SSDs) for its high capacity-to-cost ratio. Optimizing the reliability of modern SSDs requires testing and collecting a large amount of real-world error data from 3D NAND flash memory. However, the test costs have surged dozens of times as its capacity increases. It is imperative to reduce the costs of testing denser and higher-capacity flash memory. To facilitate this, in this paper, we aim to enable reproducing error data efficiently for 3D NAND flash memory. We use a conditional generative adversarial network (cGAN) to learn the error distribution with multiple interferences and generate diverse error data comparable to real-world data. Evaluation results demonstrate that error generation with cGAN is feasible and efficient.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132640950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A RDMA Interface for Ultra-Fast Ultrasound Data-Streaming over an Optical Link","authors":"A. Cossettini, Konstantin Taranov, Christian Vogt, M. Magno, T. Hoefler, L. Benini","doi":"10.23919/DATE54114.2022.9774599","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774599","url":null,"abstract":"Digital ultrasound (US) probes integrate the analog-to-digital conversion directly on the probe and can be conveniently connected to commodity devices. Existing digital probes are, however, limited to a relatively small number of channels, do not guarantee access to the raw US data, or cannot operate at very high frame rates (e.g., due to exhaustion of computing and storage units on the receiving device). In this work, we present an open, compact, power-efficient, 192-channel digital US data acquisition system capable of streaming US data at transfer rates greater than 80 Gbps towards a host PC for ultra-high frame rate imaging (in the multi-kHz range). Our US probe is equipped with two power-efficient Field Programmable Gate Arrays (FPGAs) and is interfaced to the host PC with two optical-link 100G Ethernet connections. The high-speed performance is enabled by implementing a Remote Direct Memory Access (RDMA) communication protocol between the probe and the controlling PC, which utilizes a high-performance Non-Volatile Memory Express (NVMe) interface to store the streamed data. To the best of our knowledge, thanks to the achieved data rates, this is the first high-channel-count compact digital US platform capable of raw data streaming at frame rates of 20 kHz (for imaging at 3.5 cm depths), without the need for sparse sampling, consuming less than 40 W.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132753308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PoisonHD: Poison Attack on Brain-Inspired Hyperdimensional Computing","authors":"Ruixuan Wang, Xun Jiao","doi":"10.23919/DATE54114.2022.9774641","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774641","url":null,"abstract":"While machine learning (ML) methods, especially deep neural networks (DNNs), promise enormous societal and economic benefits, their deployments present daunting challenges due to intensive computational demands and high storage requirements. Brain-inspired hyperdimensional computing (HDC) has recently been introduced as an alternative computational model that mimics the “human brain” at the functionality level. HDC has already demonstrated promising accuracy and efficiency in multiple application domains including healthcare and robotics. However, the robustness and security aspects of HDC have not been systematically investigated and sufficiently examined. The poison attack is a commonly seen attack on various ML models including DNNs. It injects noise into the labels of training data to induce classification errors in ML models. This paper presents PoisonHD, an HDC-specific poison attack framework that maximizes its effectiveness in degrading the classification accuracy by leveraging the internal structural information of HDC models. By applying PoisonHD to three datasets, we show that PoisonHD can cause a significantly greater accuracy drop on HDC models than a random label-flipping approach. We further develop a defense mechanism by designing an HDC-based data sanitization that can significantly recover the accuracy loss caused by the poison attack. To the best of our knowledge, this is the first paper that studies the poison attack on HDC models.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132773382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AdaFlow: A Framework for Adaptive Dataflow CNN Acceleration on FPGAs","authors":"Guilherme Korol, M. Jordan, M. B. Rutzig, A. C. S. Beck","doi":"10.23919/DATE54114.2022.9774727","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774727","url":null,"abstract":"To meet latency and privacy requirements, resource-hungry deep learning applications have been migrating to the Edge, where IoT devices can offload the inference processing to local Edge servers. Since FPGAs have successfully accelerated an increasing number of deep learning applications (especially CNN-based ones), they emerge as an effective alternative for Edge platforms. However, Edge applications may present highly unpredictable workloads, requiring runtime adaptability in the inference processing. Although some works apply model switching on CPU and GPU platforms by exploiting different pruning rates at runtime, so the inference can adapt according to some quality-performance trade-off, FPGA-based accelerators refrain from this approach since they are synthesized to specific CNN models. In this context, this work enables model switching on FPGAs by adding to the well-known FINN accelerator an extra level of adaptability (i.e., flexibility) and support for the dynamic use of pruning via fast model switch on flexible accelerators, at the cost of some extra logic, or via FPGA reconfigurations of fixed accelerators. From that, we developed AdaFlow: a framework that automatically builds, at design time, a library from these newly available versions (flexible and fixed, pruned or not) that will be used, at runtime, to dynamically select a given version according to a user-configurable accuracy threshold and current workload conditions. We have evaluated AdaFlow under a smart Edge surveillance application with two CNN models and two datasets, showing that AdaFlow processes, on average, 1.3× more inferences and increases, on average, 1.4× the power efficiency over state-of-the-art statically deployed dataflow accelerators.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132504401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards ADC-Less Compute-In-Memory Accelerators for Energy Efficient Deep Learning","authors":"Utkarsh Saxena, I. Chakraborty, K. Roy","doi":"10.23919/DATE54114.2022.9774573","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774573","url":null,"abstract":"Compute-in-Memory (CiM) hardware has shown great potential in accelerating Deep Neural Networks (DNNs). However, most CiM accelerators for matrix vector multiplication rely on costly analog-to-digital converters (ADCs), which become a bottleneck in achieving high energy efficiency. In this work, we propose a hardware-software co-design approach to reduce the aforementioned ADC costs through partial-sum quantization. Specifically, we replace ADCs with 1-bit sense amplifiers and develop a quantization-aware training methodology to compensate for the loss in representation ability. We show that the proposed ADC-less DNN model achieves 1.1x-9.6x reduction in energy consumption while maintaining accuracy within 1% of the DNN model without partial-sum quantization.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"616 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133825308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}