IEEE Embedded Systems Letters: Latest Articles

Heterogeneous Accelerator Design for Multi-DNN Workloads via Heuristic Optimization
IF 1.7, Zone 4, Computer Science
IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3443628
Konstantinos Balaskas; Heba Khdr; Mohammed Bakr Sikal; Fabian Kreß; Kostas Siozios; Jürgen Becker; Jörg Henkel
Abstract: The significant advancements of deep neural networks (DNNs) in a wide range of application domains have spawned the need for more specialized, sophisticated solutions in the form of multi-DNN workloads. Heterogeneous DNN accelerators have emerged as an elegant solution to tackle the workloads’ inherent diversity, achieving significant improvements compared to homogeneous solutions. However, utilizing off-the-shelf architectures provides suboptimal adaptability to given workloads, whereas custom design approaches offer limited heterogeneity, and thus reduced gains. In this letter, we combat these shortcomings and propose an exploration-based framework to holistically design heterogeneous accelerators, tailored for multi-DNN workloads. Our framework is workload-agnostic and leverages architectural heterogeneity to its full potential, by integrating low-precision arithmetic and custom structural parameters. We explore the formed design space, targeting to minimize the system’s energy-delay product (EDP) via heuristic techniques. Our proposed accelerators achieve, on average, a significant $5.5\times$ reduction in EDP compared to the state of the art across various multi-DNN workloads.
Vol. 16, No. 4, pp. 317-320
Citations: 0
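The abstract above uses the energy-delay product (EDP), the product of consumed energy and execution time, as its optimization target. The following is a minimal sketch of the metric and of selecting the lowest-EDP design from a candidate list. The `DesignPoint` type, the candidate names, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the letter, and the selection here is a plain minimum rather than the authors' heuristic search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    """A hypothetical accelerator configuration with measured cost."""
    name: str
    energy_j: float  # energy per workload run, in joules
    delay_s: float   # execution time per workload run, in seconds

    @property
    def edp(self) -> float:
        # Energy-delay product: lower is better.
        return self.energy_j * self.delay_s


def best_by_edp(points):
    """Return the design point with the smallest EDP."""
    return min(points, key=lambda p: p.edp)


# Made-up candidates; the first one plays the role of the baseline.
candidates = [
    DesignPoint("baseline", 2.0, 0.011),
    DesignPoint("hetero_a", 1.0, 0.004),
    DesignPoint("hetero_b", 1.5, 0.006),
]

best = best_by_edp(candidates)
factor = candidates[0].edp / best.edp  # EDP reduction relative to baseline
print(best.name, round(factor, 2))
```

Because EDP multiplies the two costs, a design that halves energy while also cutting delay improves the metric by more than either factor alone.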
Toward Precision-Aware Safe Neural-Controlled Cyber–Physical Systems
IF 1.7, Zone 4, Computer Science
IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3444004
Harikishan Thevendhriya; Sumana Ghosh; Debasmita Lohar
Abstract: The safety of neural network (NN) controllers is crucial, specifically in the context of safety-critical Cyber-Physical System (CPS) applications. Current safety verification focuses on the reachability analysis, considering the bounded errors from the noisy environments or inaccurate implementations. However, it assumes real-valued arithmetic and does not account for the fixed-point quantization often used in the embedded systems. Some recent efforts have focused on generating the sound quantized NN implementations in fixed-point, ensuring specific target error bounds, but they assume the safety of NNs is already proven. To bridge this gap, we introduce Nexus, a novel two-phase framework combining reachability analysis with sound NN quantization. Nexus provides an end-to-end solution that ensures CPS safety within bounded errors while generating mixed-precision fixed-point implementations for the NN controllers. Additionally, we optimize these implementations for the automated parallelization on the FPGAs using a commercial HLS compiler, reducing the machine cycles significantly.
Vol. 16, No. 4, pp. 397-400
Citations: 0
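The abstract above refers to fixed-point quantization with bounded error. As a minimal illustration of the underlying arithmetic (not of Nexus itself, whose analysis is not described in the abstract), the sketch below rounds a real value to a given number of fractional bits and checks the standard round-to-nearest bound of half a least significant bit, 2^-(f+1) for f fractional bits. The function names, the per-layer bit widths, and the sample values are hypothetical, and overflow of the integer part is deliberately ignored.

```python
def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (no saturation)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def rounding_bound(frac_bits: int) -> float:
    """Worst-case round-to-nearest error: half of one least significant bit."""
    return 2.0 ** -(frac_bits + 1)


# Hypothetical mixed-precision assignment: fractional bits per layer.
frac_bits_per_layer = {"layer1": 8, "layer2": 4}
samples = [0.1, -0.7, 3.14159]

for layer, bits in frac_bits_per_layer.items():
    for x in samples:
        err = abs(x - quantize(x, bits))
        # Every sample must respect the half-LSB bound for its precision.
        assert err <= rounding_bound(bits)
        print(f"{layer}: x={x} -> {quantize(x, bits)} (error {err:.6f})")
```

Fewer fractional bits give a cheaper datapath but a looser per-value bound, which is the trade-off a mixed-precision assignment has to balance.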
Methodology for Formal Verification of Hardware Safety Strategies Using SMT
IF 1.7, Zone 4, Computer Science
IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3439859
Anthony Faure-Gignoux; Kevin Delmas; Adrien Gauffriau; Claire Pagetti
Abstract: Safety-critical embedded systems must maintain their functionality even in the presence of single permanent hardware failure. Naive redundancy of hardware is often unaffordable and impractical, therefore alternative strategies must be explored for minimal cost fault tolerance. The objective of this article is to propose a methodology to evaluate formally safety strategies using satisfiability modulo theory solvers. Practically, the approach consists in providing a bounded model checking demonstration applied to the formal model of hardware. We show the capabilities of the approach on an efficient hardware accelerator designed to perform parallel computations of matrix multiplications and convolutions.
Vol. 16, No. 4, pp. 381-384
Citations: 0
Co-Designing Perception-Based Autonomous Systems on CPU-GPU Platforms
IF 1.7, Zone 4, Computer Science
IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3443135
Suraj Singh; Ashiqur Rahaman Molla; Arijit Mondal; Soumyajit Dey
Abstract: Perception-based autonomous system design methods are widely adopted in various domains like transportation, industrial robotics, etc. However, attaining safe and predictable execution in such systems depends on the platform-level integration of perception and control tasks. This letter presents a novel methodology to co-optimize these tasks, assuming a CPU-GPU-based real-time platform, a common choice of compute resource in this domain. Unlike the traditional methods that separately address AI-based sensing and control concerns, we consider that the overall performance of the system depends on the inferencing accuracy of the perception tasks and the performance of the control tasks iteratively executing in a feedback loop. We propose a design-space exploration methodology that considers the above concern and validates the same on an autonomous driving use case using a novel simulation setup.
Vol. 16, No. 4, pp. 357-360
Citations: 0
Reducing ADC Front-End Costs During Training of On-Sensor Printed Multilayer Perceptrons
IF 1.7, Zone 4, Computer Science
IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3447412
Florentia Afentaki; Paula Carolina Lozano Duarte; Georgios Zervakis; Mehdi B. Tahoori
Abstract: Printed electronics (PEs) technology offers a cost-effective and fully-customizable solution to computational needs beyond the capabilities of traditional silicon technologies, offering advantages, such as on-demand manufacturing and conformal, low-cost hardware. However, the low-resolution fabrication of PEs, which results in large feature sizes, poses a challenge for integrating complex designs like those of machine learning (ML) classification systems. Current literature optimizes only the multilayer perceptron (MLP) circuit within the classification system, while the cost of analog-to-digital converters (ADCs) is overlooked. Printed applications frequently require on-sensor processing, yet while the digital classifier has been extensively optimized, the analog-to-digital interfacing, specifically the ADCs, dominates the total area and energy consumption. In this letter, we target digital printed MLP classifiers and we propose the design of customized ADCs per MLP’s input which involves minimizing the distinct represented numbers for each input, simplifying thus the ADC’s circuitry. Incorporating this ADC optimization in the MLP training, enables eliminating ADC levels and the respective comparators, while still maintaining high classification accuracy. Our approach achieves $11.2\times$ lower ADC area for less than 5% accuracy drop across varying MLPs.
Vol. 16, No. 4, pp. 353-356
Citations: 0
Characterizing CNN Throughput and Energy Under Multithreaded and Multiaccelerator Execution
IF 1.7, Zone 4, Computer Science
IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3446896
M A Muneeb; Rajesh Kedia
Abstract: Emerging applications and batch processing convolutional neural network (CNN) workloads require executing multiple CNNs concurrently. A wide variety of CNN accelerators are available today and we characterize the support for concurrency for CNNs in such accelerators. We use a commercial-off-the-shelf CNN accelerator in multithreading and multiaccelerator modes and identify that up to $3.98\times$ improvement in throughput and $3.20\times$ improvement in energy per inference can be obtained even with just a single accelerator. Our detailed characterization of 104 CNN models, for three different sizes of accelerator, reveals many insights that connect CNN characteristics to improvement in throughput and energy. We also present a design space and a low error throughput estimation model to explore such a design space.
Vol. 16, No. 4, pp. 369-372
Citations: 0
MONO: Enhancing Bit-Flip Resilience With Bit Homogeneity for Neural Networks
IF 1.7, Zone 4, Computer Science
IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3444921
Maryam Eslami; Yuhao Liu; Salim Ullah; Mostafa E. Salehi; Reshad Hosseini; Seyed Ahmad Mirsalari; Akash Kumar
Abstract: Deep neural networks (DNNs) have been applied across diverse domains, including safety-critical applications. Past studies indicate that DNNs are very sensitive to changes in weights and activations due to uneven bit-weight distribution in standard number formats like fixed points, which can cause significant output accuracy fluctuations. To address this issue, we introduce a new data type called MONO to enhance bit-flip resilience using uniformity at the bit level by employing symmetric weights for all bit positions. On average, MONO has improved error resilience more effectively than the fixed-point data type, even when utilizing triple modular redundancy (TMR) and most significant bit (MSB) protection, while maintaining low overhead.
Vol. 16, No. 4, pp. 333-336
Citations: 0
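The abstract above attributes bit-flip sensitivity to the uneven bit weights of fixed-point formats. The sketch below illustrates only that motivating problem, using an ordinary 8-bit two's-complement fixed-point word with 4 fractional bits; it does not implement MONO, whose encoding is not given in the abstract. The helper names, the word layout, and the example weight 1.5 are assumptions made for illustration.

```python
def to_fixed(x: float, total_bits: int = 8, frac_bits: int = 4) -> int:
    """Encode x as a two's-complement fixed-point word, saturating on overflow."""
    raw = round(x * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    raw = max(lo, min(hi, raw))
    return raw & ((1 << total_bits) - 1)  # keep only the low total_bits bits


def from_fixed(word: int, total_bits: int = 8, frac_bits: int = 4) -> float:
    """Decode a two's-complement fixed-point word back to a float."""
    if word & (1 << (total_bits - 1)):  # sign bit set: negative value
        word -= 1 << total_bits
    return word / (1 << frac_bits)


def flip_bit(word: int, bit: int) -> int:
    """Return the word with a single bit inverted (a modeled soft error)."""
    return word ^ (1 << bit)


weight = 1.5  # hypothetical stored NN weight, exactly representable here
word = to_fixed(weight)

# Absolute error caused by flipping each bit position in turn.
errors = [abs(from_fixed(flip_bit(word, b)) - weight) for b in range(8)]
for b, e in enumerate(errors):
    print(f"bit {b}: error {e}")
```

In this format the error doubles with every bit position, so a flip in the top bit is 128 times more damaging than one in the lowest bit, which is why protection schemes often focus on the MSB.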
Enhancing HLS Performance Prediction on FPGAs Through Multimodal Representation Learning
IF 1.7, Zone 4, Computer Science
IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3446797
Longshan Shang; Teng Wang; Lei Gong; Chao Wang; Xuehai Zhou
Abstract: The emergence of design space exploration (DSE) technology has reduced the cost of searching for pragma configurations that lead to optimal performance microarchitecture. However, obtaining synthesis reports for a single design candidate can be time-consuming, sometimes taking several hours or even tens of hours, rendering this process prohibitively expensive. Researchers have proposed many solutions to address this issue. Previous studies have focused on extracting features from a single modality, leading to challenges in comprehensively evaluating the quality of designs. To overcome this limitation, this letter introduces a novel modal-aware representation learning method for the evaluation of high-level synthesis (HLS) design, named MORPH, which integrates information from three data modalities to characterize HLS designs, including code, graph, and code description (caption) modality. Remarkably, our model outperforms the baseline, demonstrating a 6%–25% improvement in root mean squared error loss. Moreover, the transferability of our predictor has also been notably enhanced.
Vol. 16, No. 4, pp. 385-388
Citations: 0
FPonAP: Implementation of Floating Point Operations on Associative Processors
IF 1.7, Zone 4, Computer Science
IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3446912
Walaa Amer; Mariam Rakka; Fadi Kurdahi
Abstract: The associative processor (AP) is a processing in-memory (PIM) platform that avoids data movement between the memory and the processor by running computations directly in the memory. It is a parallel architecture based on content addressable memory (CAM), allowing it to address data by its content and thus accelerating search and pattern recognition tasks. APs are suggested as a promising solution to the memory wall caused by the data movement bottleneck in traditional Von-Neumann architectures for data-driven applications, such as machine learning. However, modern implementations of the AP still lack support for floating point (FP) operations that are heavily used in the target applications. In this letter, we present a novel implementation of FP operations on the AP and evaluate its performance on the levels of latency and energy, showing that the proposed solution outperforms parallel FP execution on central processing unit and even GPU for large vector sizes.
Vol. 16, No. 4, pp. 389-392
Citations: 0
Novel Toolset for Efficient Hardwired Micro-Op Translation in Embedded Microarchitectures
IF 1.7, Zone 4, Computer Science
IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3447695
Kevin J. Phillipson; Michael G. Rywalt; Baibhab Chatterjee; Eric M. Schwartz; Greg Stitt
Abstract: Modern SoCs require increasingly complex embedded control deep within their numerous sub-blocks without adding significant die area. This motivated the creation of $\mu$RTL, a novel toolset for systematically designing efficient pipelined implementations of embedded instruction sets originally intended for multicycle execution. $\mu$RTL utilizes hardwired micro-op translation, a technique commonly used in the instruction decoders of large super-scalar microprocessors, however this technique has been overlooked for designing smaller, more efficient embedded microprocessors. Furthermore, the tools to develop instruction decoders with micro-op translation are proprietary and the techniques are trade secrets. The $\mu$RTL toolset is open-source and this letter clearly presents the methodology. The methodology emphasizes direct opcode decoding from multiple synthesized Verilog blocks versus traditional microprogramming which uses sequential decoding from a ROM. Our results show that a pipelined $\mu$RTL microarchitecture achieves a 21.8% reduction in size compared to a hardwired multicycle implementation of the same instruction set. Additionally, the performance of 0.75 DMIPS/MHz surpasses the RISC-V PicoRV32 by 44.2% and the AVR RISC by 82.9%. These improvements in performance, power, and area are of interest to embedded system architects.
Vol. 16, No. 4, pp. 373-376
Citations: 0