IEEE Embedded Systems Letters: Latest Articles (Page 2)

FSMA: Fine-Grained Interlayer Scheduling and Mapping Co-Exploration Framework for Chiplet-Based DNN Accelerators

IF 2, Zone 4 (Computer Science)

IEEE Embedded Systems Letters Pub Date : 2026-04-01 Epub Date: 2025-07-18 DOI: 10.1109/LES.2025.3590369

Tao Lu;Yongchang Zhang;Peilin Wang;Haiqiu Huang;Zhirong Ye;Mingyu Wang

Abstract: In modern computing environments, chiplet-based deep neural network (DNN) accelerators have emerged as a promising technology for embedded systems. However, chiplet-based designs have led to complex topologies, nonuniform links, and interchiplet connections with lower bandwidth and higher power consumption, posing challenges to scheduling and mapping for enhancing system performance and energy efficiency. To effectively deploy DNN workloads, this letter introduces a fine-grained interlayer scheduling and mapping co-exploration framework, FSMA, which is well suited for chiplet-based DNN inference embedded systems. To maximize the optimization opportunities, an interlayer scheduling and mapping encoding scheme centered on fine-grained multidimensional cuts is proposed. To further explore the optimization space delimited by the encoding scheme, an exploration algorithm with specifically designed operators is developed to modify the scheduling and mapping collaboratively. Experimental results demonstrate the superiority of the proposed scheduling and mapping optimization for chiplet scenarios across diverse topologies and routing algorithms. Compared with the prior designs, FSMA achieves reductions of 47.05% and 28.9% in the energy-delay product (EDP) on average, respectively.

Volume 18, Issue 2, Pages 152-155

Citations: 0

An Agile Systolic Array-Based Hardware Accelerator for Scalable Multi-Head Self-Attention

IF 2, Zone 4 (Computer Science)

IEEE Embedded Systems Letters Pub Date : 2026-04-01 Epub Date: 2025-08-05 DOI: 10.1109/LES.2025.3595586

Xinyu Chen;Yu Li;Beining Zhao;Yintao Liu;Shan Cao;Zhiyuan Jiang

Abstract: As attention mechanisms find increasing application in computer vision, numerous neural networks are now integrating convolutional neural networks (CNNs) with attention layers to enhance performance. As a typical representative, the Bottleneck Transformer (BoT) network effectively balances the extraction of local features and modeling of global context, leading to significant performance improvements. However, the multihead self-attention (MHSA) layer of BoT deviates from conventional MHSA by incorporating a wider range of matrix operations, posing challenges in computational complexity and memory bandwidth. This letter introduces a customized accelerator for the MHSA layer of BoT. The accelerator employs a configurable systolic array architecture, designed to support various matrix operations, including matrix transposition, addition, multiplication, and a hardware-optimized Softmax, facilitating hybrid scalable MHSA layers. Implemented on an FPGA, the proposed accelerator achieves a performance of 143.98 GOPS at a frequency of 320 MHz. When compared to CPU implementations, our design offers a 7.18× reduction in inference time.

Volume 18, Issue 2, Pages 90-93

Citations: 0

Software-Hardware Exploration of Early-Exit Neural Networks on Edge Accelerators

IF 2, Zone 4 (Computer Science)

IEEE Embedded Systems Letters Pub Date : 2026-04-01 Epub Date: 2025-03-26 DOI: 10.1109/LES.2025.3573404

Qianying Gong;Biqing Duan;Shengfa Miao;Zhenli He;Di Liu

Abstract: In this article, we propose a Software–Hardware exploration framework for early-exit (EE) Neural Networks (SHEN), to optimally explore the configurations of an EE network on customizable edge accelerators. SHEN consists of two parts: 1) a newly proposed method to optimally determine the number and positions of intermediate classifiers for an EE network and 2) a genetic-based method to explore the optimal configuration of an accelerator for the network. We can utilize SHEN to generate a series of design points for an EE network and accelerators, allowing for the selection of a suitable design configuration for a specific requirement. Experimental results show that SHEN finds better EENN configurations, with accuracy increased by up to 5% compared to its backbone counterpart and the state of the art. Our exploration also shows its effectiveness in reducing energy-delay product (EDP), where SHEN reduces EDP by up to 85% and 41% compared to its backbone counterpart and the state of the art, respectively. Our code is available at https://github.com/qianying-gong/SHEN.

Volume 18, Issue 2, Pages 136-139

Citations: 0

SecCAN: An Extended CAN Controller With Embedded Intrusion Detection

IF 2, Zone 4 (Computer Science)

IEEE Embedded Systems Letters Pub Date : 2026-04-01 Epub Date: 2025-03-22 DOI: 10.1109/LES.2025.3572858

Shashwat Khandelwal;Shanker Shreejith

Abstract: Recent research has highlighted the vulnerability of in-vehicle network protocols, such as controller area networks (CANs), and proposed machine learning-based intrusion detection systems (IDSs) as an effective mitigation technique. However, their efficient integration into vehicular architecture is nontrivial, with existing methods relying on electronic control unit (ECU)-coupled IDS accelerators or dedicated ECUs as IDS accelerators. Here, initiating IDS requires complete reception of a CAN message from the controller, incurring data movement and software overheads. In this letter, we present SecCAN, a novel CAN controller architecture that embeds IDS capability within the datapath of the controller. This integration allows IDS to tap messages directly from within the CAN controller as they are received from the bus, removing overheads incurred by existing ML-based IDSs. A custom-quantised machine-learning accelerator is developed as the IDS engine and embedded into SecCAN's receive data path, with optimisations to overlap the IDS inference with the protocol's reception window. We implement SecCAN on an AMD XCZU7EV FPGA to quantify its performance and benefits in hardware, using multiple attack datasets. We show that SecCAN can completely hide the IDS latency within the CAN reception window for all CAN packet sizes and detect multiple attacks with state-of-the-art accuracy, with zero software overheads on the ECU and low energy overhead (73.7 µJ per message) for IDS inference. Also, SecCAN incurs limited resource overhead compared to a standard CAN controller (<30% LUT, <1% FF), making it ideally suited for automotive deployment.

Volume 18, Issue 2, Pages 123-127

Citations: 0

Weak PUF-Based Variable Latency Obfuscation Technique for ML-Attack Resilient Arbiter PUFs

IF 2, Zone 4 (Computer Science)

IEEE Embedded Systems Letters Pub Date : 2026-04-01 Epub Date: 2025-07-25 DOI: 10.1109/LES.2025.3592298

Aranya Gupta;Bishnu Prasad Das;Sanjeev Manhas;Rajat Sadhukhan

Abstract: The conventional linear feedback shift register (LFSR)-based arbiter physical unclonable function (APUF) suffers from machine learning (ML) attacks, which is a challenge for employment in real-world authentication scenarios. To mitigate this vulnerability, this article presents a weak PUF-assisted and challenge-dependent dynamic and nonlinear feedback shift register (DNLFSR)-based APUF. The proposed technique introduces dynamic and variable iteration-based LFSR to make the generated challenge-response pair (CRP) space more nonlinear, which effectively prevents attackers from modeling the APUF. The proposed DNLFSR-based APUF achieves ≈50% prediction accuracy (similar to random guessing) against various ML attacks carried out through exhaustive testing over 1 million CRPs. Additionally, the proposed technique reduces latency by ≈n/2 clock cycles for an n-stage LFSR compared to state-of-the-art techniques. Extensive performance evaluation of the proposed DNLFSR-based APUF on both Python simulation and FPGA hardware implementation shows near-ideal uniformity, uniqueness, and reliability for 64-bit and 128-bit DNLFSR-based APUF. Moreover, the proposed design incurs low hardware overhead, showing a minimum 43.2% reduction in LUT usage compared to the state-of-the-art techniques, making it a lightweight solution for IoT applications.

Volume 18, Issue 2, Pages 140-143

Citations: 0

LION: A Learned Index for On-Device Sensor Data Management

IF 2, Zone 4 (Computer Science)

IEEE Embedded Systems Letters Pub Date : 2026-04-01 Epub Date: 2025-06-19 DOI: 10.1109/LES.2025.3580080

Taeyoon Park;Haena Lee;Christine Euna Jung;Wook-Hee Kim;Hyun-Wook Jin

Citations: 0

Microcontroller-Based Real-Time Power Monitoring Module by Hybrid Power Factor Selection Algorithm

IF 2, Zone 4 (Computer Science)

IEEE Embedded Systems Letters Pub Date : 2026-04-01 Epub Date: 2025-08-27 DOI: 10.1109/LES.2025.3602829

Abhijit Dey;Supratik Mondal;Biswajit Chakraborty;Sovan Dalai;Kesab Bhattacharya

Citations: 0

Compatibility Analysis and Smooth Transition of Heterogeneous Controllers in Longitudinal Merging Platoons

IF 2, Zone 4 (Computer Science)

IEEE Embedded Systems Letters Pub Date : 2026-04-01 Epub Date: 2025-08-01 DOI: 10.1109/LES.2025.3594953

Pintusorn Suttiponpisarn;Chung-Wei Lin

Abstract: Vehicle platoon merging presents significant challenges due to varying speeds, gap estimation errors, and synchronization issues. Heterogeneous controllers, in particular, can cause mismatches in control dynamics, resulting in inconsistent acceleration responses, unstable gap spacing, and unpredictable behaviors that negatively impact user comfort and safety. This work investigates the merging performance of 25 controller combinations, pairing five controllers and testing them across three merging scenarios, with a focus on both user comfort and safety gaps. We identify key tradeoffs: mismatches between reactive and predictive controllers often lead to higher jerk, and some joining controllers experience elevated jerk during merging with braking disturbances, resulting in poor comfort. To address these issues, we propose an adaptive transitory controller, a simple assistance module for the joining platoon leader, that enhances compatibility with various preceding controllers and merging scenarios. Results show consistent performance and a 58.38% improvement in user comfort while maintaining gap safety. We also discuss the limitations of our approach and propose directions for future work to further improve robustness.

Volume 18, Issue 2, Pages 160-163

Citations: 0

Enhanced Prefetching via Dynamic Multistep SARSA-Based Reinforcement Learning

IF 2, Zone 4 (Computer Science)

IEEE Embedded Systems Letters Pub Date : 2026-04-01 Epub Date: 2025-03-29 DOI: 10.1109/LES.2025.3574850

Satyaswaroop Nayak;Anadi Goyal;Sheel Sindhu Manohar;Dip Sankar Banerjee;Palash Das

Citations: 0

Hardware/Software Co-Design of Multilevel Edge Detector on Low-Cost FPGA-Based Embedded Heterogeneous Architecture

IF 2, Zone 4 (Computer Science)

IEEE Embedded Systems Letters Pub Date : 2026-04-01 Epub Date: 2025-03-19 DOI: 10.1109/LES.2025.3571311

Ayoub Mamri;Abdelhafid El Hadri;Abdelaziz Benallegue

Abstract: Edge detection is a fundamental aspect of computer vision, facilitating the extraction of crucial features, such as corners, edges, and line segments, which are vital for Visual Odometry-based applications. While adding more stages to an edge detector can enhance feature precision, it also increases time consumption and resource requirements, necessitating hardware optimizations. To address these challenges, this letter presents a hardware/software co-design for an efficient pipeline implementation of a multilevel edge detector (MED) on a low-cost FPGA-based heterogeneous architecture. Leveraging this framework, we developed cascading duplicate shift registers based on the single-task (ST) mode of OpenCL, which achieves an efficient pipeline execution model for conventional cascade filters. Our ST implementation of the MED significantly outperforms the traditional NDRange implementation in both execution time and resource utilization on the targeted FPGA platform. This allows for effective deployment on the low-cost embedded DE1-SoC, with results validated through comparisons to the NVIDIA Jetson Nano GPU.

Volume 18, Issue 2, Pages 99-102

Citations: 0