IEEE Embedded Systems Letters: Latest Articles, Page 3

Heterogeneous Accelerator Design for Multi-DNN Workloads via Heuristic Optimization

IF 1.7, Zone 4, Computer Science

IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3443628

Konstantinos Balaskas;Heba Khdr;Mohammed Bakr Sikal;Fabian Kreß;Kostas Siozios;Jürgen Becker;Jörg Henkel

Abstract: The significant advancements of deep neural networks (DNNs) in a wide range of application domains have spawned the need for more specialized, sophisticated solutions in the form of multi-DNN workloads. Heterogeneous DNN accelerators have emerged as an elegant solution to tackle the workloads’ inherent diversity, achieving significant improvements compared to homogeneous solutions. However, utilizing off-the-shelf architectures provides suboptimal adaptability to given workloads, whereas custom design approaches offer limited heterogeneity, and thus reduced gains. In this letter, we combat these shortcomings and propose an exploration-based framework to holistically design heterogeneous accelerators, tailored for multi-DNN workloads. Our framework is workload-agnostic and leverages architectural heterogeneity to its full potential, by integrating low-precision arithmetic and custom structural parameters. We explore the formed design space, targeting to minimize the system’s energy-delay product (EDP) via heuristic techniques. Our proposed accelerators achieve, on average, a significant 5.5× reduction in EDP compared to the state of the art across various multi-DNN workloads.

Vol. 16, No. 4, pp. 317-320
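
For readers unfamiliar with the metric, the energy-delay product is energy multiplied by execution time, so a 5.5× reduction can come from any mix of energy and latency savings. The Python sketch below only illustrates that arithmetic; the function name and every number in it are hypothetical and are not taken from the letter.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Return the energy-delay product (EDP); lower is better."""
    return energy_joules * delay_seconds


# Hypothetical figures for illustration only (not results from the letter).
baseline_edp = energy_delay_product(energy_joules=2.2, delay_seconds=0.050)
proposed_edp = energy_delay_product(energy_joules=1.0, delay_seconds=0.020)

# Ratio of baseline EDP to proposed EDP, i.e. the reduction factor.
reduction = baseline_edp / proposed_edp
print(f"EDP reduction: {reduction:.1f}x")
```

Because EDP is a product, a design that is slower but far more energy-efficient can still score better than a faster, power-hungry one.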

Citations: 0

Toward Precision-Aware Safe Neural-Controlled Cyber–Physical Systems

IF 1.7, Zone 4, Computer Science

IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3444004

Harikishan Thevendhriya;Sumana Ghosh;Debasmita Lohar

Citations: 0

Methodology for Formal Verification of Hardware Safety Strategies Using SMT

IF 1.7, Zone 4, Computer Science

IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3439859

Anthony Faure-Gignoux;Kevin Delmas;Adrien Gauffriau;Claire Pagetti

Citations: 0

Co-Designing Perception-Based Autonomous Systems on CPU-GPU Platforms

IF 1.7, Zone 4, Computer Science

IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3443135

Suraj Singh;Ashiqur Rahaman Molla;Arijit Mondal;Soumyajit Dey

Citations: 0

Reducing ADC Front-End Costs During Training of On-Sensor Printed Multilayer Perceptrons

IF 1.7, Zone 4, Computer Science

IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3447412

Florentia Afentaki;Paula Carolina Lozano Duarte;Georgios Zervakis;Mehdi B. Tahoori

Abstract: Printed electronics (PEs) technology offers a cost-effective and fully-customizable solution to computational needs beyond the capabilities of traditional silicon technologies, offering advantages, such as on-demand manufacturing and conformal, low-cost hardware. However, the low-resolution fabrication of PEs, which results in large feature sizes, poses a challenge for integrating complex designs like those of machine learning (ML) classification systems. Current literature optimizes only the multilayer perceptron (MLP) circuit within the classification system, while the cost of analog-to-digital converters (ADCs) is overlooked. Printed applications frequently require on-sensor processing, yet while the digital classifier has been extensively optimized, the analog-to-digital interfacing, specifically the ADCs, dominates the total area and energy consumption. In this letter, we target digital printed MLP classifiers and we propose the design of customized ADCs per MLP’s input which involves minimizing the distinct represented numbers for each input, simplifying thus the ADC’s circuitry. Incorporating this ADC optimization in the MLP training, enables eliminating ADC levels and the respective comparators, while still maintaining high classification accuracy. Our approach achieves 11.2× lower ADC area for less than 5% accuracy drop across varying MLPs.

Vol. 16, No. 4, pp. 353-356
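
The idea of removing ADC levels together with their comparators can be pictured with a small Python sketch. It assumes a flash-style ADC in which every retained threshold costs one comparator; that ADC model, the names `quantize`, `full_thresholds`, and `pruned_thresholds`, and the threshold values are all hypothetical and are not the authors' implementation or training procedure.

```python
from bisect import bisect_right


def quantize(x: float, thresholds: list[float]) -> int:
    """Return the output code for x: the number of sorted thresholds <= x."""
    return bisect_right(thresholds, x)


# Assumed flash-style model: one comparator per threshold.
# A uniform 4-bit converter over [0, 1) needs 15 thresholds.
full_thresholds = [i / 16 for i in range(1, 16)]

# Hypothetical per-input subset kept after training-time pruning.
pruned_thresholds = [0.25, 0.5, 0.75]

print("comparators:", len(full_thresholds), "->", len(pruned_thresholds))
print("codes for 0.6:", quantize(0.6, full_thresholds), quantize(0.6, pruned_thresholds))
```

Under this model the pruned converter distinguishes only four values for that input, which is the kind of reduction in distinct represented numbers the abstract describes.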

Citations: 0

Characterizing CNN Throughput and Energy Under Multithreaded and Multiaccelerator Execution

IF 1.7, Zone 4, Computer Science

IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3446896

M A Muneeb;Rajesh Kedia

Citations: 0

MONO: Enhancing Bit-Flip Resilience With Bit Homogeneity for Neural Networks

IF 1.7, Zone 4, Computer Science

IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3444921

Maryam Eslami;Yuhao Liu;Salim Ullah;Mostafa E. Salehi;Reshad Hosseini;Seyed Ahmad Mirsalari;Akash Kumar

Citations: 0

Enhancing HLS Performance Prediction on FPGAs Through Multimodal Representation Learning

IF 1.7, Zone 4, Computer Science

IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3446797

Longshan Shang;Teng Wang;Lei Gong;Chao Wang;Xuehai Zhou

Citations: 0

FPonAP: Implementation of Floating Point Operations on Associative Processors

IF 1.7, Zone 4, Computer Science

IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3446912

Walaa Amer;Mariam Rakka;Fadi Kurdahi

Citations: 0

Novel Toolset for Efficient Hardwired Micro-Op Translation in Embedded Microarchitectures

IF 1.7, Zone 4, Computer Science

IEEE Embedded Systems Letters Pub Date : 2024-12-05 DOI: 10.1109/LES.2024.3447695

Kevin J. Phillipson;Michael G. Rywalt;Baibhab Chatterjee;Eric M. Schwartz;Greg Stitt

Abstract: Modern SoCs require increasingly complex embedded control deep within their numerous sub-blocks without adding significant die area. This motivated the creation of μRTL, a novel toolset for systematically designing efficient pipelined implementations of embedded instruction sets originally intended for multicycle execution. μRTL utilizes hardwired micro-op translation, a technique commonly used in the instruction decoders of large super-scalar microprocessors, however this technique has been overlooked for designing smaller, more efficient embedded microprocessors. Furthermore, the tools to develop instruction decoders with micro-op translation are proprietary and the techniques are trade secrets. The μRTL toolset is open-source and this letter clearly presents the methodology. The methodology emphasizes direct opcode decoding from multiple synthesized Verilog blocks versus traditional microprogramming which uses sequential decoding from a ROM. Our results show that a pipelined μRTL microarchitecture achieves a 21.8% reduction in size compared to a hardwired multicycle implementation of the same instruction set. Additionally, the performance of 0.75 DMIPS/MHz surpasses the RISC-V PicoRV32 by 44.2% and the AVR RISC by 82.9%. These improvements in performance, power, and area are of interest to embedded system architects.

Vol. 16, No. 4, pp. 373-376
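
The abstract gives the relative gains but not the baseline scores. A rough Python sketch can back them out, assuming "surpasses by p%" means the reported score equals the baseline times (1 + p/100). The helper name `implied_baseline` is hypothetical, and the derived values are estimates under that assumption, not figures quoted from the letter.

```python
def implied_baseline(score: float, percent_gain: float) -> float:
    """Baseline score implied by `score` being `percent_gain` percent higher."""
    return score / (1 + percent_gain / 100)


reported = 0.75  # DMIPS/MHz, as stated in the abstract

# Estimated baselines under the stated assumption about "surpasses by".
picorv32_estimate = implied_baseline(reported, 44.2)
avr_estimate = implied_baseline(reported, 82.9)

print(f"Implied PicoRV32 score: {picorv32_estimate:.3f} DMIPS/MHz")
print(f"Implied AVR score: {avr_estimate:.3f} DMIPS/MHz")
```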

Citations: 0