Mahta Mayahinia;Tommaso Marinelli;Zhenlin Pei;Hsiao-Hsuan Liu;Chenyun Pan;Zsolt Tokei;Francky Catthoor;Mehdi B. Tahoori
{"title":"Dynamic Segmented Bus for Energy-Efficient Last-Level Cache in Advanced Interconnect-Dominant Nodes","authors":"Mahta Mayahinia;Tommaso Marinelli;Zhenlin Pei;Hsiao-Hsuan Liu;Chenyun Pan;Zsolt Tokei;Francky Catthoor;Mehdi B. Tahoori","doi":"10.1109/LES.2024.3444711","DOIUrl":"https://doi.org/10.1109/LES.2024.3444711","url":null,"abstract":"To deal with the stagnating performance and energy improvements from successive technology scaling, system-technology co-optimization (STCO) comes to the rescue; it involves co-optimizing the important system parameters from the high-level application all the way down to the low-level technology. This article addresses interconnect dominance in advanced nodes as a bottleneck for energy-efficient static RAM (SRAM)-based last-level cache (LLC) and aims to mitigate it through an STCO mechanism. Our main approach in this work is the use of a workload-aware controlled dynamic segmented bus (DSB) as the intramacro (interbank) interconnect. Based on our results, our approach can improve the energy efficiency of the SRAM-based LLC by an average of 35%.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"321-324"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142789112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPELL: An End-to-End Tool Flow for LLM-Guided Secure SoC Design for Embedded Systems","authors":"Sudipta Paria;Aritra Dasgupta;Swarup Bhunia","doi":"10.1109/LES.2024.3447691","DOIUrl":"https://doi.org/10.1109/LES.2024.3447691","url":null,"abstract":"Modern embedded systems and Internet of Things (IoT) devices contain system-on-chips (SoCs) as their hardware backbone, which increasingly contain many critical assets (secure communication keys, configuration bits, firmware, sensitive data, etc.). These critical assets must be protected against a wide array of potential vulnerabilities to uphold the system’s confidentiality, integrity, and availability. Today’s SoC designs contain diverse intellectual property (IP) blocks, often acquired from multiple 3rd-party IP vendors. Secure hardware design using them inevitably relies on the accrued domain knowledge of well-trained security experts. In this letter, we introduce SPELL, a novel end-to-end framework for the automated development of secure SoC designs. It leverages conversational large language models (LLMs) to automatically identify security vulnerabilities in a target SoC and map them to the evolving database of common weakness enumerations (CWEs); SPELL then filters the relevant CWEs, subsequently converting them to SystemVerilog assertions (SVAs) for verification; and finally, it addresses the vulnerabilities via centralized security policy enforcement. We have implemented the SPELL framework using popular LLMs, such as ChatGPT and GEMINI, to analyze their efficacy in generating appropriate CWEs from user-defined SoC specifications and implementing corresponding security policies for an open-source SoC benchmark. We have also explored the limitations of existing pretrained conversational LLMs in this context.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"365-368"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142789075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Konstantinos Balaskas;Heba Khdr;Mohammed Bakr Sikal;Fabian Kreß;Kostas Siozios;Jürgen Becker;Jörg Henkel
{"title":"Heterogeneous Accelerator Design for Multi-DNN Workloads via Heuristic Optimization","authors":"Konstantinos Balaskas;Heba Khdr;Mohammed Bakr Sikal;Fabian Kreß;Kostas Siozios;Jürgen Becker;Jörg Henkel","doi":"10.1109/LES.2024.3443628","DOIUrl":"https://doi.org/10.1109/LES.2024.3443628","url":null,"abstract":"The significant advancements of deep neural networks (DNNs) in a wide range of application domains have spawned the need for more specialized, sophisticated solutions in the form of multi-DNN workloads. Heterogeneous DNN accelerators have emerged as an elegant solution to tackle the workloads’ inherent diversity, achieving significant improvements compared to homogeneous solutions. However, utilizing off-the-shelf architectures provides suboptimal adaptability to given workloads, whereas custom design approaches offer limited heterogeneity, and thus reduced gains. In this letter, we combat these shortcomings and propose an exploration-based framework to holistically design heterogeneous accelerators, tailored for multi-DNN workloads. Our framework is workload-agnostic and leverages architectural heterogeneity to its full potential, by integrating low-precision arithmetic and custom structural parameters. We explore the resulting design space, aiming to minimize the system’s energy-delay product (EDP) via heuristic techniques. Our proposed accelerators achieve, on average, a significant 5.5× reduction in EDP compared to the state of the art across various multi-DNN workloads.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"317-320"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142789084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Precision-Aware Safe Neural-Controlled Cyber–Physical Systems","authors":"Harikishan Thevendhriya;Sumana Ghosh;Debasmita Lohar","doi":"10.1109/LES.2024.3444004","DOIUrl":"https://doi.org/10.1109/LES.2024.3444004","url":null,"abstract":"The safety of neural network (NN) controllers is crucial, specifically in the context of safety-critical Cyber-Physical System (CPS) applications. Current safety verification focuses on reachability analysis, considering the bounded errors from noisy environments or inaccurate implementations. However, it assumes real-valued arithmetic and does not account for the fixed-point quantization often used in embedded systems. Some recent efforts have focused on generating sound quantized NN implementations in fixed-point, ensuring specific target error bounds, but they assume the safety of NNs is already proven. To bridge this gap, we introduce Nexus, a novel two-phase framework combining reachability analysis with sound NN quantization. Nexus provides an end-to-end solution that ensures CPS safety within bounded errors while generating mixed-precision fixed-point implementations for the NN controllers. Additionally, we optimize these implementations for automated parallelization on FPGAs using a commercial HLS compiler, significantly reducing the machine cycles.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"397-400"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142789159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anthony Faure-Gignoux;Kevin Delmas;Adrien Gauffriau;Claire Pagetti
{"title":"Methodology for Formal Verification of Hardware Safety Strategies Using SMT","authors":"Anthony Faure-Gignoux;Kevin Delmas;Adrien Gauffriau;Claire Pagetti","doi":"10.1109/LES.2024.3439859","DOIUrl":"https://doi.org/10.1109/LES.2024.3439859","url":null,"abstract":"Safety-critical embedded systems must maintain their functionality even in the presence of a single permanent hardware failure. Naive hardware redundancy is often unaffordable and impractical; therefore, alternative strategies must be explored for minimal-cost fault tolerance. The objective of this article is to propose a methodology to formally evaluate safety strategies using satisfiability modulo theories (SMT) solvers. Practically, the approach consists of providing a bounded model checking demonstration applied to a formal model of the hardware. We show the capabilities of the approach on an efficient hardware accelerator designed to perform parallel computations of matrix multiplications and convolutions.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"381-384"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142789037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-Designing Perception-Based Autonomous Systems on CPU-GPU Platforms","authors":"Suraj Singh;Ashiqur Rahaman Molla;Arijit Mondal;Soumyajit Dey","doi":"10.1109/LES.2024.3443135","DOIUrl":"https://doi.org/10.1109/LES.2024.3443135","url":null,"abstract":"Perception-based autonomous system design methods are widely adopted in various domains like transportation, industrial robotics, etc. However, attaining safe and predictable execution in such systems depends on the platform-level integration of perception and control tasks. This letter presents a novel methodology to co-optimize these tasks, assuming a CPU-GPU-based real-time platform, a common choice of compute resource in this domain. Unlike traditional methods that separately address AI-based sensing and control concerns, we consider that the overall performance of the system depends on both the inferencing accuracy of the perception tasks and the performance of the control tasks iteratively executing in a feedback loop. We propose a design-space exploration methodology that addresses this concern and validate it on an autonomous driving use case using a novel simulation setup.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"357-360"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142789038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Florentia Afentaki;Paula Carolina Lozano Duarte;Georgios Zervakis;Mehdi B. Tahoori
{"title":"Reducing ADC Front-End Costs During Training of On-Sensor Printed Multilayer Perceptrons","authors":"Florentia Afentaki;Paula Carolina Lozano Duarte;Georgios Zervakis;Mehdi B. Tahoori","doi":"10.1109/LES.2024.3447412","DOIUrl":"https://doi.org/10.1109/LES.2024.3447412","url":null,"abstract":"Printed electronics (PEs) technology offers a cost-effective and fully-customizable solution to computational needs beyond the capabilities of traditional silicon technologies, offering advantages, such as on-demand manufacturing and conformal, low-cost hardware. However, the low-resolution fabrication of PEs, which results in large feature sizes, poses a challenge for integrating complex designs like those of machine learning (ML) classification systems. Current literature optimizes only the multilayer perceptron (MLP) circuit within the classification system, while the cost of analog-to-digital converters (ADCs) is overlooked. Printed applications frequently require on-sensor processing, yet while the digital classifier has been extensively optimized, the analog-to-digital interfacing, specifically the ADCs, dominates the total area and energy consumption. In this letter, we target digital printed MLP classifiers and propose the design of customized ADCs per MLP input, which involves minimizing the distinct represented numbers for each input, thus simplifying the ADC circuitry. Incorporating this ADC optimization into MLP training enables eliminating ADC levels and the respective comparators, while still maintaining high classification accuracy. Our approach achieves 11.2× lower ADC area with less than 5% accuracy drop across varying MLPs.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"353-356"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142789080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing CNN Throughput and Energy Under Multithreaded and Multiaccelerator Execution","authors":"M A Muneeb;Rajesh Kedia","doi":"10.1109/LES.2024.3446896","DOIUrl":"https://doi.org/10.1109/LES.2024.3446896","url":null,"abstract":"Emerging applications and batch processing convolutional neural network (CNN) workloads require executing multiple CNNs concurrently. A wide variety of CNN accelerators are available today, and we characterize the support for concurrency for CNNs in such accelerators. We use a commercial-off-the-shelf CNN accelerator in multithreading and multiaccelerator modes and identify that up to 3.98× improvement in throughput and 3.20× improvement in energy per inference can be obtained even with just a single accelerator. Our detailed characterization of 104 CNN models, for three different sizes of accelerator, reveals many insights that connect CNN characteristics to improvement in throughput and energy. We also present a design space and a low-error throughput estimation model to explore such a design space.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"369-372"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142789039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maryam Eslami;Yuhao Liu;Salim Ullah;Mostafa E. Salehi;Reshad Hosseini;Seyed Ahmad Mirsalari;Akash Kumar
{"title":"MONO: Enhancing Bit-Flip Resilience With Bit Homogeneity for Neural Networks","authors":"Maryam Eslami;Yuhao Liu;Salim Ullah;Mostafa E. Salehi;Reshad Hosseini;Seyed Ahmad Mirsalari;Akash Kumar","doi":"10.1109/LES.2024.3444921","DOIUrl":"https://doi.org/10.1109/LES.2024.3444921","url":null,"abstract":"Deep neural networks (DNNs) have been applied across diverse domains, including safety-critical applications. Past studies indicate that DNNs are very sensitive to changes in weights and activations due to the uneven bit-weight distribution in standard number formats like fixed point, which can cause significant output accuracy fluctuations. To address this issue, we introduce a new data type called MONO that enhances bit-flip resilience through uniformity at the bit level by employing symmetric weights for all bit positions. On average, MONO has improved error resilience more effectively than the fixed-point data type, even when utilizing triple modular redundancy (TMR) and most significant bit (MSB) protection, while maintaining low overhead.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"333-336"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142789076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing HLS Performance Prediction on FPGAs Through Multimodal Representation Learning","authors":"Longshan Shang;Teng Wang;Lei Gong;Chao Wang;Xuehai Zhou","doi":"10.1109/LES.2024.3446797","DOIUrl":"https://doi.org/10.1109/LES.2024.3446797","url":null,"abstract":"The emergence of design space exploration (DSE) technology has reduced the cost of searching for pragma configurations that yield optimal-performance microarchitectures. However, obtaining synthesis reports for a single design candidate can be time-consuming, sometimes taking several hours or even tens of hours, rendering this process prohibitively expensive. Researchers have proposed many solutions to address this issue. Previous studies have focused on extracting features from a single modality, making it difficult to comprehensively evaluate the quality of designs. To overcome this limitation, this letter introduces a novel modal-aware representation learning method for the evaluation of high-level synthesis (HLS) designs, named MORPH, which integrates information from three data modalities, namely code, graph, and code description (caption), to characterize HLS designs. Remarkably, our model outperforms the baseline, demonstrating a 6%–25% improvement in root mean squared error loss. Moreover, the transferability of our predictor has also been notably enhanced.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"385-388"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142789151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}