Model predictive-based DNN control model for automated steering deployed on FPGA using an automatic IP generator tool
Ahmad Reda, Afulay Ahmed Bouzid, Alhasan Zghaibe, Daniel Drótos, Vásárhelyi József
Design Automation for Embedded Systems, published 2024-07-25. DOI: 10.1007/s10617-024-09287-x

Abstract: With the increase in the non-linearity and complexity of the driving system's environment, developing and optimizing related applications is becoming more crucial and remains an open challenge for researchers and automotive companies alike. Model predictive control (MPC) is a well-known classic control strategy used to solve online optimization problems, but it is computationally expensive and resource-consuming. Recently, machine learning has become an effective alternative to classical control systems. This paper presents a deep neural network (DNN)-based control strategy for automated steering deployed on FPGA. The DNN model was designed and trained on the behavior of a traditional MPC controller, and its performance is evaluated against the designed MPC, which has already proved its merit in automated driving tasks. A new automatic intellectual property generator based on the Xilinx System Generator (XSG) has been developed, not only to perform the deployment but also to optimize it. Performance was evaluated based on the ability of the controllers to drive the lateral deviation and yaw angle of the vehicle as close as possible to zero. The DNN model was implemented on FPGA using two data types, fixed-point and floating-point, in order to evaluate the efficiency in terms of performance and resource consumption. The obtained results show that the proposed DNN model provided satisfactory performance and successfully imitated the behavior of the traditional MPC with a very small root mean square error (RMSE = 0.011228 rad). Additionally, the results show that the fixed-point deployment greatly reduced resource consumption compared to the floating-point one while maintaining satisfactory performance and meeting the safety conditions.

Design and analysis of an adaptive radiation resilient RRAM subsystem for processing systems in satellites
Daniel Reiser, Junchao Chen, Johannes Knödtel, Andrea Baroni, Miloš Krstić, Marc Reichenbach
Design Automation for Embedded Systems, published 2024-04-10. DOI: 10.1007/s10617-024-09285-z

Abstract: Among the numerous benefits that novel RRAM devices offer over conventional memory technologies is an inherent resilience to the effects of radiation. Hence, they appear suitable for use as a memory subsystem in a computer architecture for satellites. In addition to radiation-resistant memory devices, the concept of applying protective measures dynamically promises a system with low susceptibility to errors during radiation events, while also ensuring efficient operation in their absence. This paper presents the first RRAM-based memory subsystem for satellites with a dynamic response to radiation events. We integrate this subsystem into a computing platform that employs the same dynamic principles for its processing system and implements modules for timely detection and even prediction of radiation events. To determine which protection mechanism is optimal, we examine various approaches and simulate the probability of errors in memory. Additionally, we study the impact on the overall system by investigating different software algorithms and their radiation robustness requirements using a fault injection simulation. Finally, we propose a potential implementation of the dynamic RRAM-based memory subsystem that includes different levels of protection and can be used for real applications in satellites.

{"title":"Improving edge AI for industrial IoT applications with distributed learning using consensus","authors":"Samuel Fidelis, Márcio Castro, Frank Siqueira","doi":"10.1007/s10617-024-09284-0","DOIUrl":"https://doi.org/10.1007/s10617-024-09284-0","url":null,"abstract":"<p>Internet of Things (IoT) devices produce massive amounts of data in a very short time. Transferring these data to the cloud to be analyzed may be prohibitive for applications that require near real-time processing. One solution to meet such timing requirements is to bring most data processing closer to IoT devices (i.e., to the edge). In this context, the present work proposes a distributed architecture that meets the timing requirements imposed by Industrial IoT (IIoT) applications that need to apply Machine Learning (ML) models with high accuracy and low latency. This is done by dividing the tasks of storing and processing data into different layers—mist, fog, and cloud—using the cloud layer only for the tasks related to long-term storage of summarized data and hosting of necessary reports and dashboards. The proposed architecture employs ML inferences in the edge layer in a distributed fashion, where each edge node is either responsible for applying a different ML technique or the same technique but with a different training data set. Then, a consensus algorithm takes the ML inference results from the edge nodes to decide the result of the inference, thus improving the system’s overall accuracy. Results obtained with two different data sets show that the proposed approach can improve the accuracy of the ML models without significantly compromising the response time.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"43 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140578540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Profiling with trust: system monitoring from trusted execution environments
Christian Eichler, Jonas Röckl, Benedikt Jung, Ralph Schlenk, Tilo Müller, Timo Hönig
Design Automation for Embedded Systems, published 2024-02-16. DOI: 10.1007/s10617-024-09283-1

Abstract: Large-scale attacks on IoT and edge computing devices pose a significant threat. As a prominent example, Mirai is an IoT botnet with 600,000 infected devices around the globe, capable of conducting effective and targeted DDoS attacks on (critical) infrastructure. Driven by the substantial impact of such attacks, manufacturers and system integrators have turned to Trusted Execution Environments (TEEs), which have recently gained significant importance. TEEs offer an execution environment for running small portions of code isolated from the rest of the system, even if the operating system is compromised. In this publication, we examine TEEs in the context of system monitoring and introduce the Trusted Monitor (TM), a novel anomaly detection system that runs within a TEE. The TM continuously profiles the system using hardware performance counters and utilizes an application-specific machine-learning model for anomaly detection. In our evaluation, we demonstrate that the TM accurately classifies 86% of 183 tested workloads, with an overhead of less than 2%. Notably, we show that a real-world kernel-level rootkit has observable effects on performance counters, allowing the TM to detect it. Major parts of the TM are implemented in the Rust programming language, eliminating common security-critical programming errors.

Novel adaptive quantization methodology for 8-bit floating-point DNN training
Mohammad Hassani Sadi, Chirag Sudarshan, Norbert Wehn
Design Automation for Embedded Systems, published 2024-02-16. DOI: 10.1007/s10617-024-09282-2

Abstract: There is a high energy cost associated with training deep neural networks (DNNs), and off-chip memory access contributes a major portion of the overall energy consumption. The number of off-chip memory transactions can be reduced by quantizing the data words to a low bit-width (e.g., 8 bits). However, low-bit-width data formats suffer from a limited dynamic range, resulting in reduced accuracy. In this paper, a novel 8-bit floating-point (FP8) quantized DNN training methodology is presented, which adapts to the required dynamic range on the fly. Our methodology relies on varying the bias values of the FP8 format to fit its dynamic range to the range of the DNN parameters and input feature maps. The range fitting during training is performed adaptively by an online statistical analysis hardware unit without stalling the compute units or their data accesses. Our approach is compatible with any DNN compute core without major modifications to the architecture. We propose to integrate the new FP8 quantization unit in the memory controller: FP32 data from the compute core are converted to FP8 in the memory controller before being written to DRAM and converted back after the data are read from DRAM. Our results show that DRAM access energy is reduced by 3.07× when using the 8-bit data format instead of 32-bit. The accuracy loss of the proposed methodology with 8-bit quantized training is approximately 1% for various networks with image and natural language processing datasets.

Transparent integration of autonomous vehicles simulation tools with a data-centric middleware
José Luis Conradi Hoffmann, Leonardo Passig Horstmann, Antônio Augusto Fröhlich
Design Automation for Embedded Systems, published 2024-01-06. DOI: 10.1007/s10617-023-09280-w

Abstract: Simulation is a key step in the design, implementation, and verification of autonomous vehicles (AVs). At the same time, typical simulation tools fail to cover the full complexity of AV applications, including data communication delays, security, and the integration of software/hardware-in-the-loop and other simulation tools. This work proposes a SmartData-based middleware to integrate AV simulators and external tools. The interface models the data used in a simulator and creates an intermediary layer between the simulator and the external tools by defining the inputs and outputs as SmartData. A message bus is used for communication between SmartData according to their Interest relations, with messages exchanged following a specific protocol; nevertheless, the presented architecture is protocol-agnostic. Moreover, we present a data-centric AV design integrated into the middleware. The design standardizes the data interfaces between AV components, including sensing, perception, planning, decision, and actuation. It therefore enables a transparent integration of the AV simulation with other simulators (e.g., network simulators), cloud services, fault-injection mechanisms, digital twins, and hardware-in-the-loop scenarios, and it allows transparent runtime component replacement and time synchronization, the modularization of vehicle components, and the addition of security aspects to the simulation. We present a case-study application with an AV simulation using CARLA and measure the end-to-end delay and the overhead incurred in the simulation by our middleware. An increase in end-to-end delay was measured, since in the original scenario data communication was not accounted for and data was assumed to be ready for processing with no communication delay between the sensors, decision-making, and actuation units.

{"title":"On the impact of hardware-related events on the execution of real-time programs","authors":"","doi":"10.1007/s10617-023-09281-9","DOIUrl":"https://doi.org/10.1007/s10617-023-09281-9","url":null,"abstract":"<h3>Abstract</h3> <p>Estimating safe upper bounds on execution times of programs is required in the design of predictable real-time systems. When multi-core, instruction pipeline, branch prediction, or cache memory are in place, due to the considerable complexity traditional static timing analysis faces, measurement-based timing analysis (MBTA) is a more tractable option. MBTA estimates upper bounds on execution times using data measured under the execution of representative execution scenarios. In this context, understanding how hardware-related events affect the executing program under analysis brings about useful information for MBTA. This paper contributes to this need by modeling the execution behavior of programs in function of hardware-related events. More specifically, for a program under analysis, we show that the number of cycles per executed instruction can be correlated to hardware-related event occurrences. We apply our modeling methodology to two architectures, ARMv7 Cortex-M4 and Cortex-A53. While all hardware events can be monitored at once in the former, the latter allows simultaneous monitoring of up to 6 out of 59 events. We then describe a method to select the most relevant hardware events that affect the execution of a program under analysis. These events are then used to model the program behavior via machine learning techniques under different execution scenarios. The effectiveness of this method is evaluated by extensive experiments. Obtained results revealed prediction errors below 20%, showing that the chosen events can largely explain the execution behavior of programs.</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"119 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139066478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiprovision: a Design Space Exploration tool for multi-tenant resource provisioning in CPU–GPU environments
M. Jordan, J. Vicenzi, Tiago Knorst, Guilherme Korol, Antonio Carlos Schneider Beck, M. B. Rutzig
Design Automation for Embedded Systems, published 2023-12-21. DOI: 10.1007/s10617-023-09279-3

Monitoring the performance of multicore embedded systems without disrupting its timing requirements
Leonardo Passig Horstmann, José Luis Conradi Hoffmann, Antônio Augusto Fröhlich
Design Automation for Embedded Systems, published 2023-12-16. DOI: 10.1007/s10617-023-09278-4

Abstract: Monitoring the performance of multicore embedded systems is crucial to ensuring their timing requirements, and the collected performance data is also very relevant for optimization and validation efforts. However, the strategies used to monitor and capture data in such systems are complex to design and implement, since they must not interfere with the running system to the point where its timing and performance characteristics are affected by the monitoring itself. In this paper, we extend a monitoring framework developed in previous work to encompass three monitoring strategies: Active Periodic monitoring, Passive Periodic monitoring, and Job-based monitoring. Periodic monitoring follows a given sampling rate: Active Periodic relies on periodic timer interrupts to guarantee deterministic sampling, while Passive Periodic trades determinism for a less invasive strategy, sampling data only when ordinary system events are handled. Job-based monitoring follows an event-driven approach that samples data whenever a job leaves the CPU, thus building isolated traces for each job. We evaluate the strategies in terms of overhead, latency, and jitter; none of them presented an average impact on system execution time higher than 0.3%. Moreover, a qualitative analysis is conducted in terms of data quality. On the one hand, while Periodic monitoring allows for configurable sampling rates, it does not account for the rescheduling of jobs and may capture mixed traces. On the other hand, Job-based monitoring provides data samples tied to the execution of each job, but disregards sampling-rate configuration and may lose track of instantaneous measures.

On vulnerabilities in EVT-based timing analysis: an experimental investigation on a multi-core architecture
Jamile Vasconcelos, George Lima, Marwan Wehaiba El Khazen, Adriana Gogonel, Liliana Cucu-Grosjean
Design Automation for Embedded Systems, published 2023-10-17. DOI: 10.1007/s10617-023-09277-5