Yon Vanommeslaeghe, J. Denil, Jasper De Viaene, D. Ceulemans, S. Derammelaere, P. D. Meulenaere
{"title":"Leveraging Domain Knowledge for the Efficient Design-Space Exploration of Advanced Cyber-Physical Systems","authors":"Yon Vanommeslaeghe, J. Denil, Jasper De Viaene, D. Ceulemans, S. Derammelaere, P. D. Meulenaere","doi":"10.1109/DSD.2019.00058","DOIUrl":"https://doi.org/10.1109/DSD.2019.00058","url":null,"abstract":"Cyber-physical systems are becoming increasingly complex. In these advanced systems, the different engineering domains involved in the design process become more and more intertwined. In these situations, a traditional (sequential) design process becomes inefficient at finding good design options. Instead, an integrated approach is needed where parameters in both the control and embedded domains can be chosen, evaluated and optimized to have a good solution in both domains. However, in such an approach, the combined design space becomes vast. As such, methods are needed to mitigate this problem. In this paper, we show how domain knowledge can be used to guide the design-space exploration process for an advanced control system and its deployment on embedded hardware. We use domain knowledge, captured in an ontology, to reason about the relationships between parameters in the different domains. This leads to a stepwise design-space exploration process where this domain knowledge is used to quickly reduce the design space to a subset of likely good candidates. In this process, we make use of cross-domain evaluation to find feasible design options with good system-level performance.","PeriodicalId":217233,"journal":{"name":"2019 22nd Euromicro Conference on Digital System Design (DSD)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124456945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Ustaoğlu, S. Huhn, F. Sill, Daniel Große, R. Drechsler
{"title":"SAT-Hard: A Learning-Based Hardware SAT-Solver","authors":"B. Ustaoğlu, S. Huhn, F. Sill, Daniel Große, R. Drechsler","doi":"10.1109/DSD.2019.00021","DOIUrl":"https://doi.org/10.1109/DSD.2019.00021","url":null,"abstract":"Within the last decades, tremendous research work has been carried out on the development of software-based algorithms to solve the Boolean Satisfiability Problem. These SAT-solvers have then been heavily orchestrated for addressing complex computational tasks like the verification of circuits. In this field, most of the applied techniques focused only on the design phase of the circuit. Due to this fact, new approaches have been published in the literature solely focusing on online verification as well as self-verification. These kinds of solutions strictly require Hardware (HW) SAT-solvers that can be integrated into a system while introducing only low hardware overhead and still providing high flexibility. By following these observations, this work presents SAT-Hard: In contrast to the state-of-the-art, SAT-Hard takes advantage of learning techniques to support features like clause learning and non-chronological backtracking, and combines them within a lightweight and standalone HW device. By this, a run-time speed-up of 2,000x can be achieved. Furthermore, the experimental evaluation clearly demonstrates that those complex problems can be solved in less than 20 seconds. Particularly due to its compactness, SAT-Hard is suitable for self-verification that enables the continuous verification of an integrated system during its lifetime.","PeriodicalId":217233,"journal":{"name":"2019 22nd Euromicro Conference on Digital System Design (DSD)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127364311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sebastian Vogel, R. Raghunath, A. Guntoro, Kristof Van Laerhoven, G. Ascheid
{"title":"Bit-Shift-Based Accelerator for CNNs with Selectable Accuracy and Throughput","authors":"Sebastian Vogel, R. Raghunath, A. Guntoro, Kristof Van Laerhoven, G. Ascheid","doi":"10.1109/DSD.2019.00106","DOIUrl":"https://doi.org/10.1109/DSD.2019.00106","url":null,"abstract":"Hardware accelerators for compute-intensive algorithms such as convolutional neural networks benefit from number representations with reduced precision. In this paper, we evaluate and extend a number representation based on power-of-two quantization enabling bit-shift-based processing of multiplications. We found that weights of a neural network can either be represented by a single 4 bit power-of-two value or with two 4 bit values depending on accuracy requirements. We evaluate the classification accuracy of VGG-16 and ResNet50 on the ImageNet dataset with weights represented in our novel number format. To include a more complex task, we additionally evaluate the format on two networks for semantic segmentation. In addition, we design a novel processing element based on bit-shifts which is configurable in terms of throughput (4 bit mode) and accuracy (8 bit mode). We evaluate this processing element in an FPGA implementation of a dedicated accelerator for neural networks incorporating a 32-by-64 processing array running at 250 MHz with 1 TOp/s peak throughput in 8 bit mode. The accelerator is capable of processing regular convolutional layers and dilated convolutions in combination with pooling and upsampling. For a semantic segmentation network with 108.5 GOp/frame, our FPGA implementation achieves a throughput of 7.0 FPS in the 8 bit accurate mode and up to 11.2 FPS in the 4 bit mode corresponding to 760.1 GOp/s and 1,218 GOp/s effective throughput, respectively. Finally, we compare the novel design to classical multiplier-based approaches in terms of FPGA utilization and power consumption. Our novel multiply-accumulate engines designed for the optimized number representation use 9 % fewer logical elements while allowing double throughput compared to a classical implementation. Moreover, a measurement shows a 25 % reduction of power consumption at the same throughput. Therefore, our flexible design offers a solution to the trade-off between energy efficiency, accuracy, and high throughput.","PeriodicalId":217233,"journal":{"name":"2019 22nd Euromicro Conference on Digital System Design (DSD)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121404223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David Trilla, Carles Hernández, J. Abella, F. Cazorla
{"title":"Modeling the Impact of Process Variations in Worst-Case Energy Consumption Estimation","authors":"David Trilla, Carles Hernández, J. Abella, F. Cazorla","doi":"10.1109/DSD.2019.00092","DOIUrl":"https://doi.org/10.1109/DSD.2019.00092","url":null,"abstract":"The advent of autonomous power-limited systems poses a new challenge for system verification. Powerful processors needed to enable autonomous operation are typically power-hungry, jeopardizing battery duration. Therefore, guaranteeing a given battery duration requires worst-case energy consumption (WCEC) estimation for tasks running on those systems. Unfortunately, processor energy and power can suffer significant variation across different units due to process variation (PV), i.e. variability in the electrical properties of transistors and wires due to imperfect manufacturing, which challenges existing WCEC estimation methods for applications. In this paper, we propose a statistical modeling approach to capture PV impact on applications' energy and a methodology to compute their WCEC capturing PV, as required to deploy portable critical devices.","PeriodicalId":217233,"journal":{"name":"2019 22nd Euromicro Conference on Digital System Design (DSD)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121632653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliability Assessment of Flooded Min-Sum LDPC Decoders Based on Sub-Threshold Processing Units","authors":"S. Nimara","doi":"10.1109/DSD.2019.00096","DOIUrl":"https://doi.org/10.1109/DSD.2019.00096","url":null,"abstract":"This paper aims to evaluate the performance degradation of faulty flooded Min-Sum LDPC decoder architectures based on sub-threshold processing units, by performing hierarchical decomposition of combinational and sequential sub-blocks of processing units described at RTL level. Logic synthesis of the combinational sub-blocks is performed and faults are injected for each logic gate according to a delay-dependent fault model for critical and non-critical paths of the design. The impact of the probabilistic behavior of sub-threshold gates on the error-correction performance of the decoder is analyzed in terms of bit error rate (BER) metrics for the Binary Additive White Gaussian Noise (BiAWGN) communication channel model.","PeriodicalId":217233,"journal":{"name":"2019 22nd Euromicro Conference on Digital System Design (DSD)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121801732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combinational Decompressors with Nonlinear Codes","authors":"O. Novák, M. Rozkovec, Jan Plíva","doi":"10.1109/DSD.2019.00078","DOIUrl":"https://doi.org/10.1109/DSD.2019.00078","url":null,"abstract":"Test patterns are transferred from the tester to the circuit under test in a compressed form as it minimizes test access mechanism bandwidth and transfer time. It was found that nonlinear binary codes could be useful for encoding test patterns in a similar way as linear ones and the compression efficiency may be higher. The key characteristic of the nonlinear codes is that the number of codeword bits may be higher than it is for the linear codewords while the number of specified bits is preserved. The nonlinear binary codes can be used in test pattern decompressors. As a result, better encoding characteristics can be obtained for a higher number of parallel scan chains compared with linear codes. In this paper, we propose a relatively fast heuristic that can be used for finding the nonlinear function truth tables guaranteeing the required number of specified bits and that enables hardware overhead minimization. We quantify the benefits and costs of decompressors with nonlinear codes and verify the benchmark circuit test pattern encoding efficiency.","PeriodicalId":217233,"journal":{"name":"2019 22nd Euromicro Conference on Digital System Design (DSD)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131880789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lorenzo Servadei, Zhao Han, Michael Werner, W. Ecker, Keerthikumara Devarajegowda
{"title":"Formal Verification Methodology in an Industrial Setup","authors":"Lorenzo Servadei, Zhao Han, Michael Werner, W. Ecker, Keerthikumara Devarajegowda","doi":"10.1109/DSD.2019.00094","DOIUrl":"https://doi.org/10.1109/DSD.2019.00094","url":null,"abstract":"This paper presents a practical methodology for applying formal verification to industrial designs. The methodology is developed considering the quality, efficiency and productivity required in an industrial verification setup. The flow proposes a systematic approach addressing various aspects of the formal verification. First, the design implementation (RTL) is analyzed for its formal friendliness based on several predefined criteria. Next, a property automation flow is adapted for an efficient property development. Later, a series of verification tasks, grouped into a formal test plan and a formal execution plan, are carried out to reach the formal sign-off stage. To demonstrate the applicability and effectiveness of the methodology, the proposed flow has been successfully applied on several industrial designs. In this paper, we consider the formal verification of Error Correction Codes, generally implemented in program and data flash memory interfaces, to benchmark the proposed flow. An automatic property generation flow is used to generate an optimal property set with varying abstraction levels. The property proof runtimes are drastically reduced and better coverage compared to the previous hand-written properties has been achieved. New RTL bugs and specification errors have been found that were previously missed during simulation.","PeriodicalId":217233,"journal":{"name":"2019 22nd Euromicro Conference on Digital System Design (DSD)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129828011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Efficient Parallel Code from the RVC-CAL Dataflow Language","authors":"Omair Rafique, Florian Krebs, K. Schneider","doi":"10.1109/DSD.2019.00035","DOIUrl":"https://doi.org/10.1109/DSD.2019.00035","url":null,"abstract":"The RVC-CAL language is used for implementing dataflow process networks (DPNs), i.e., distributed systems of actors. The behavior of an actor is defined by a set of actions which can consume input tokens and produce output tokens. RVC-CAL DPNs can offer parallelism both at the level of actors and at the level of actions. To efficiently execute these models on target hardware, it is important to generate parallel code based on the entire parallelism provided by these two levels. In this paper, we discuss criteria for the generation of parallel software from RVC-CAL models based on the potential parallelism of modeled behaviors. The approach considers both the coarse-grained (task-parallel) execution of actors using multithreading and the fine-grained (data-parallel) execution of their actions using the open computing language (OpenCL) or even a higher-level layer of OpenCL, namely SYCL. The methodology is validated by benchmarks on OpenCL abstracted hardware platforms. Based on the experimental results, the methodology is evaluated for efficiency (performance) in comparison with a pure multithreaded C++ approach and a well-known reference framework.","PeriodicalId":217233,"journal":{"name":"2019 22nd Euromicro Conference on Digital System Design (DSD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129403011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anastasios Fanariotis, T. Orphanoudakis, V. Fotopoulos, P. Kitsos
{"title":"DSD-i1: A Mixed Functionality Development Board Geared Towards Digital Systems Design Education","authors":"Anastasios Fanariotis, T. Orphanoudakis, V. Fotopoulos, P. Kitsos","doi":"10.1109/DSD.2019.00032","DOIUrl":"https://doi.org/10.1109/DSD.2019.00032","url":null,"abstract":"The present paper describes the design and implementation of a development board, designed in the Digital Systems and Media Computing Laboratory of the Hellenic Open University, which is very active in the field of digital systems design. The board hosts an MCU and an FPGA on the same PCB, cooperating with tight interconnection between them and supported by a set of basic on-board peripherals that cover some of the most essential educational examples in the field, minimizing the need for external devices. The design is geared i) towards ease of use in order to alleviate any initial setup stress by students or inexperienced designers and ii) towards low-cost fabrication in order to facilitate educational institutions who provide distance education to offer this board to every student for out-of-laboratory usage. The design process consists of three main stages: the PCB design, the firmware development and, lastly, the host computer software development. The architectural and design choices made for each stage are described fully later in the paper, with each decision balancing between ease-of-use, cost and functionality, in the form of offered services. The board functions as a very low-cost laboratory educational platform for both low-level HDL training and higher-level MCU firmware programming, supporting even more complex scenarios of FPGA softcore usage and programming or concurrent usage of FPGA and MCU in complete System-on-Chip (SoC) designs.","PeriodicalId":217233,"journal":{"name":"2019 22nd Euromicro Conference on Digital System Design (DSD)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124528516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Application of Hyper-Heuristics to Flexible Manufacturing Systems","authors":"Alexis Linard, Joost van Pinxten","doi":"10.1109/DSD.2019.00057","DOIUrl":"https://doi.org/10.1109/DSD.2019.00057","url":null,"abstract":"Optimizing the productivity of Flexible Manufacturing Systems requires online scheduling to ensure that the timing constraints due to complex interactions between modules are satisfied. This work focuses on optimizing a ranking metric such that the online scheduler locally (i.e., per product) chooses an option that yields the highest productivity in the long term. In this paper, we focus on the scheduling of a re-entrant Flexible Manufacturing System, more specifically a Large Scale Printer capable of printing hundreds of sheets per minute. The system requires an online scheduler that determines for each sheet when it should enter the system, be printed for the first time, and when it should return for its second print. We have applied genetic programming, a hyper-heuristic, to heuristically find good ranking metrics that can be used in an online scheduling heuristic. The results show that metrics can be tuned for different job types, to increase the productivity of such systems. Our methods achieved a significant reduction in the jobs' makespan.","PeriodicalId":217233,"journal":{"name":"2019 22nd Euromicro Conference on Digital System Design (DSD)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124595571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}