{"title":"Safety-Driven DNN Sizing for Vehicular CPS","authors":"Tingan Zhu;Mier Li;Bineet Ghosh;Samarjit Chakraborty;Parasara Sridhar Duggirala","doi":"10.1109/LES.2025.3595839","DOIUrl":"https://doi.org/10.1109/LES.2025.3595839","url":null,"abstract":"Perception processing in cyber–physical systems (CPSs) is now almost exclusively done using deep neural networks (DNNs). Here, camera, radar, and LiDAR data—in autonomous vehicles or robots—is fed into DNNs that detect surrounding obstacles and distances to them. These results are used by controllers to compute appropriate actuation signals. But a CPS typically has multiple state components, where each of them might be estimated using a different camera, radar or LiDAR and an associated DNN. Hence, an emerging problem is to implement multiple DNNs on a resource-constrained graphics processing unit (GPU). While many GPUs from NVIDIA and AMD allow them to be split into multiple virtual GPUs, there is little work on how to partition them, and therefore size the corresponding DNNs, when they are a part of the same CPS. In contrast to the existing practice of focusing on the inference accuracy of individual DNNs in isolation, we propose a system-level safety-driven DNN sizing (and hence GPU partitioning) scheme for vehicular CPS. 
Our main technical contribution is a detailed experimental evaluation of this DNN sizing approach and an empirical validation of the formal technique behind it.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"18 2","pages":"164-167"},"PeriodicalIF":2.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147737036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Memory Representation of Random Forests Optimized for Resource-Limited Embedded Devices","authors":"Justin Beaurivage;Messaoud Ahmed Ouameur;Frédéric Domingue","doi":"10.1109/LES.2025.3574563","DOIUrl":"https://doi.org/10.1109/LES.2025.3574563","url":null,"abstract":"Random forests (RFs) are a versatile and effective machine learning technique widely applied across various tasks. With the increasing demand for deploying machine learning models on resource-constrained embedded devices, such as microcontrollers, challenges arise from the growing complexity of modern datasets. These challenges often result in models that are too large in memory and storage requirements to be feasibly implemented on small devices. In this letter, we propose a lossless memory representation of RFs that significantly limits the amount of random-access memory (RAM) required for prediction tasks, while also reducing the amount of nonvolatile memory needed to store the model. The approach achieves efficiency by embedding the data of leaf nodes within the decision nodes, thereby streamlining the tree structure. Additionally, it supports in-place prediction without requiring a decompression step. To evaluate our method, we implemented four RFs derived from real-world datasets onto four microcontroller platforms. Our results demonstrate that prediction tasks can be performed using at most 144 bytes of RAM for classification tasks, and at most 48 bytes for regression tasks, while memory accesses account for a maximum of 27.0% of the total CPU cycles. 
On the fastest platform, prediction times ranged between 59 and 75 μs, highlighting the suitability of this method for a variety of real-time applications.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"18 2","pages":"115-118"},"PeriodicalIF":2.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147737052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Padel: Priority-Based Real-Time Scheduling for GPUs","authors":"Atiyeh Gheibi-Fetrat;Sepideh Safari;Amirsaeed Ahmadi-Tonekaboni;Shaahin Hessabi;Hamid Sarbazi-Azad","doi":"10.1109/LES.2025.3589370","DOIUrl":"https://doi.org/10.1109/LES.2025.3589370","url":null,"abstract":"Graphics processing units (GPUs) have become increasingly prevalent in many platforms, including real-time systems, due to their massive architectural parallelism and significant performance. Power and energy management, reducing deadline miss rate (DMR), and efficient allocation of resources to tasks are important design challenges in exploiting GPUs within real-time platforms. One of the most considerable challenges in designing firm real-time systems is allocating GPU resources to tasks in a way that enables as many tasks as possible to be completed correctly within their deadlines while minimizing energy consumption. However, since multiple tasks can be handled concurrently across many cores, power consumption poses a serious limitation. In this work, we introduce Padel, a real-time GPU scheduler that employs spatial multitasking to enhance the utilization of resources, performance, and energy efficiency in GPUs. Our experimental results reveal that when the system is overloaded, Padel can reduce the DMR by 23%, and energy consumption by 8% compared to the state of the art.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"18 2","pages":"85-89"},"PeriodicalIF":2.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147737078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA-Based Real-Time Multi-Class Vehicle Classification Using mmWave Radar","authors":"Anand Mohan;Hemant Kumar Meena;Mohd Wajid;Abhishek Srivastava","doi":"10.1109/LES.2025.3586098","DOIUrl":"https://doi.org/10.1109/LES.2025.3586098","url":null,"abstract":"This study introduces field-programmable gate array (FPGA)-based real-time multiclass vehicle classification using millimeter-wave (mmWave) radar, which overcomes the limitations of conventional sensors such as LiDAR and cameras that are sensitive to adverse weather and lighting conditions. The implementation of multiclass vehicle classification on a hardware-software platform demonstrated its effectiveness. For multiclass vehicle classification applications, the FPGA-based PYNQ-ZU (Python Productivity for Zynq) serves as an efficient embedded architecture. The reliability and accuracy of this method are improved, rendering it a promising solution for autonomous vehicles and advanced driver assistance systems (ADASs) in a variety of driving scenarios. We employed 3-D point cloud data produced by mmWave radar via a PC, then transformed it into 2-D point cloud images by top-view filtration methods. This method demonstrated greater efficacy in feature extraction with VGG-16. 
Multiple machine learning models were employed for classification tasks on both hardware and software platforms, achieving 100% accuracy with the random forest (RF) algorithm.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"18 2","pages":"132-135"},"PeriodicalIF":2.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147737035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-Shot Fused Attention-Based GPT-2 Accelerator for Resource-Constrained Embedded Platform","authors":"Abhishek Yadav;Ayush Dixit;Binod Kumar","doi":"10.1109/LES.2025.3583743","DOIUrl":"https://doi.org/10.1109/LES.2025.3583743","url":null,"abstract":"This letter proposes a hardware-software co-design approach to accelerate inference of generative pretrained Transformer (GPT-2) for resource-constrained embedded applications. Essentially, a standard configuration of GPT-2 (Python-based software implementation) is redefined with high-level language (C++) to ultimately design a dedicated and optimized hardware logic of GPT-2 as an IP core, taking resources available on the ZCU104 field-programmable gate array (FPGA) board into account. The approach leverages a zero-shot learning setup, buffer tiling, and compiler directives for implementing a fused attention-based GPT-2 architecture with ZCU104, ensuring that maximum computational power is squeezed from the available resources and that the tradeoff among throughput (samples/second), power consumption (W), energy efficiency (mJ), and resource utilization is balanced. The proposed optimizations improve throughput by 25.3% (from 67.11 to 84.03 samples/sec) compared to the baseline. Moreover, a comprehensive investigation of the proposed optimization is done by examining the impact of layer fusion on latency, utilization, and throughput. Also, the generalizability of the proposed approach is validated by implementing various configurations of GPT-2. 
Codes and subsequent files are available at GitHub Repository.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"18 2","pages":"107-110"},"PeriodicalIF":2.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147737055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Quantitative Security Ranking Method of PUF Based on the Rademacher Complexity of PUFs","authors":"Xuexiang Deng;Xiaole Cui;Xing Zhang","doi":"10.1109/LES.2025.3586724","DOIUrl":"https://doi.org/10.1109/LES.2025.3586724","url":null,"abstract":"A physical unclonable function (PUF) is regarded as one of the promising hardware security primitives. However, modeling attacks have posed a real threat to the security of PUFs in recent years. Security has therefore become an important characteristic of a PUF, in addition to randomness, uniqueness, and reliability. Researchers have proposed some methods to evaluate the security of PUFs. However, quantitative security ranking of different PUFs is still an open issue. This work introduces the Rademacher complexity of a PUF, abbreviated as the R complexity, to evaluate the security of PUFs. A ranking method for PUF security is proposed based on the R complexity. The proposed method is robust against the noise in challenge-response pairs (CRPs). The security of twelve different types of PUFs with different sizes is ranked by the proposed method. The ranking results are in line with the results from the security improvement practices of these PUFs, which verifies the effectiveness of the proposed ranking method.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"18 2","pages":"119-122"},"PeriodicalIF":2.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147737075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compressing Runtime Memory Usage via Activation Remapping for Deploying Deep Neural Networks on MCUs","authors":"Jinyu Zhan;Xiang Wang;Wei Jiang;Suidi Peng","doi":"10.1109/LES.2025.3571799","DOIUrl":"https://doi.org/10.1109/LES.2025.3571799","url":null,"abstract":"Deploying deep neural networks (DNNs) on microcontroller units (MCUs) has received increasing attention. Most existing DNN compression algorithms focus on reducing the parameters of DNNs to fit the storage constraints of MCUs. However, the runtime memory of MCUs is more limited, and these methods are insufficiently optimized for memory usage, which lowers the inference efficiency of DNN models on MCUs. Therefore, we propose a runtime memory compression method based on activation remapping to optimize the runtime memory usage on MCUs. By analyzing the frequency distribution of activation values, we introduce Huffman encoding and remap activation values by dynamic range merging to compress the runtime memory usage of MCUs. In addition, a global frequency table based on activation distributions is designed to further reduce the computation and storage overheads on MCUs. Experimental results show that our method can improve the memory compression ratio of MobileNet by up to 26.9% with an accuracy loss of less than 1%, compared with three state-of-the-art methods.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"18 2","pages":"94-98"},"PeriodicalIF":2.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147736995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of Approximate Floating-Point Arithmetic Units Using Hardware-Efficient Rounding Schemes","authors":"Myeongjin Kwak;Seokhyeon Lee;Yongtae Kim","doi":"10.1109/LES.2025.3593921","DOIUrl":"https://doi.org/10.1109/LES.2025.3593921","url":null,"abstract":"This letter presents novel approximate floating-point arithmetic units that leverage hardware-efficient rounding schemes to enhance computational accuracy while significantly improving hardware efficiency. By eliminating the need for guard, round, and sticky bits, the proposed designs achieve substantial reductions in area, power, delay, and energy of up to 42.3%, 32.1%, 39.3%, and 58.8%, respectively, compared to conventional exact floating-point units implemented in a 28-nm CMOS technology. Experimental evaluations across multiple applications, including image processing, convolutional neural networks (CNNs), and spiking neural networks (SNNs), demonstrate that the proposed approach maintains high computational accuracy. Evaluations across image processing and neural network tasks show minimal quality and accuracy degradation, confirming the suitability of the proposed designs for energy-efficient applications.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"18 2","pages":"144-147"},"PeriodicalIF":2.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147737083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Implementation of RISC-V-Based SoC for Electric Vehicle Traction Application","authors":"G. Renjith;C. V. Raghu","doi":"10.1109/LES.2025.3594596","DOIUrl":"https://doi.org/10.1109/LES.2025.3594596","url":null,"abstract":"This letter presents the design and implementation of a RISC-V-based system-on-chip (SoC) on the Arty A7-100T FPGA platform, specifically optimized for electric vehicle (EV) traction control applications. By leveraging the open-source Shakti E-Class core, the proposed SoC integrates deterministic six-pulse commutation, hardware-accelerated pulse width modulation (PWM), and robust fault resilience to address critical real-time control challenges inherent in legacy microcontrollers. Empirical validation demonstrates a 35% reduction in mean RMS speed error, less than 2% speed oscillation amplitude under load perturbations, and a 39% improvement in worst-case response time compared to conventional systems. The architecture’s extensibility and security features underscore RISC-V’s potential as a scalable, cost-efficient foundation for next-generation EV controllers, despite current deployment as a standalone system with network integration envisioned for future development.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"18 2","pages":"148-151"},"PeriodicalIF":2.0,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147737115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}