Microprocessors and Microsystems: Latest Articles, Page 2

Hardware and software design of APEnetX: A custom high-speed interconnect for scientific computing

IF 2.6, Zone 4 (Computer Science)

Microprocessors and Microsystems | Pub Date: 2026-02-01 | Epub Date: 2025-11-21 | DOI: 10.1016/j.micpro.2025.105224

Roberto Ammendola , Andrea Biagioni , Carlotta Chiarini , Paolo Cretaro , Ottorino Frezza , Francesca Lo Cicero , Alessandro Lonardo , Michele Martinelli , Pier Stanislao Paolucci , Elena Pastorelli , Pierpaolo Perticaroli , Luca Pontisso , Cristian Rossi , Francesco Simula , Piero Vicini

Volume 120, Article 105224.

Abstract: High-speed interconnects are critical to providing robust and highly efficient services to every user in a cluster. Several commercial offerings, many of which are now firmly established in the market, have arisen over the years, spanning the many possible trade-offs between cost, reconfigurability, performance, resiliency and support for a variety of processing architectures. On the other hand, custom interconnects may represent an appealing solution for applications requiring cost-effectiveness, customizability and flexibility.

In this regard, the APEnet project was started in 2003, focusing on the design of PCIe FPGA-based custom Network Interface Cards (NICs) for cluster interconnects with a 3D torus topology. In this work, we highlight the main features of APEnetX, the latest version of the APEnet NIC. Designed on the Xilinx Alveo U200 card, it implements Remote Direct Memory Access (RDMA) transactions using both Xilinx UltraScale+ IPs and custom hardware and software components to ensure efficient data transfer without involving the host operating system. The software stack lets the user interface with the NIC either directly via a low-level driver or through a plug-in for the OpenMPI stack, aligning our NIC with the application-layer standards of the HPC community. The APEnetX architecture integrates a Quality-of-Service (QoS) scheme to enforce a level of performance during network congestion events. Finally, APEnetX is accompanied by an OMNeT++-based simulator that enables probing network performance at node counts otherwise unattainable for cost and/or practicality reasons.
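The abstract states that APEnet NICs target a 3D torus topology but does not describe how APEnetX routes packets. As a generic illustration under that assumption only, the sketch below computes a node's six torus neighbours (including wrap-around links) and a minimal dimension-order path (X, then Y, then Z). The function names and the dimension-order policy are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Six nearest neighbours of `node` in a 3D torus of size `dims` (wrap-around links)."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(node)
            n[axis] = (n[axis] + step) % dims[axis]  # modulo gives the torus wrap-around
            neighbors.append(tuple(n))
    return neighbors


def dimension_order_route(src: Coord, dst: Coord, dims: Coord) -> List[Coord]:
    """Hypothetical routing policy: resolve X, then Y, then Z, taking the shorter way round each ring."""
    path = [src]
    cur = list(src)
    for axis in range(3):
        size = dims[axis]
        forward = (dst[axis] - cur[axis]) % size  # hops needed in the +1 direction
        step = 1 if forward <= size - forward else -1
        while cur[axis] != dst[axis]:
            cur[axis] = (cur[axis] + step) % size
            path.append(tuple(cur))
    return path
```

On a 4x4x4 torus, `dimension_order_route((0, 0, 0), (3, 0, 0), (4, 4, 4))` takes the single wrap-around hop instead of three forward hops, which is the main latency benefit of a torus over a plain mesh.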

Citations: 0

Automatic linux malware detection using binary inspection and runtime opcode tracing

IF 2.6, Zone 4 (Computer Science)

Microprocessors and Microsystems | Pub Date: 2026-02-01 | Epub Date: 2025-12-12 | DOI: 10.1016/j.micpro.2025.105237

Martí Alonso , Andreu Gironés , Juan-José Costa , Enric Morancho , Stefano Di Carlo , Ramon Canal

Volume 120, Article 105237.

Abstract: The fast-paced evolution of cyberattacks on digital infrastructure requires new protection mechanisms to counter them. Malware attacks, a type of cyberattack ranging from viruses and worms to ransomware and spyware, have traditionally been detected using signature-based methods. However, this approach is not effective enough against new versions of malware, and new machine learning tools look promising. In this paper we present two methods to detect Linux malware using machine learning models: (1) a dynamic approach that tracks the instructions (opcodes) an application executes while it runs; and (2) a static approach that inspects the binary application files before execution. We evaluate (1) five machine learning models (Support Vector Machine, k-Nearest Neighbor, Naive Bayes, Decision Tree and Random Forest) and (2) a deep neural network using a Long Short-Term Memory architecture with word embedding. We describe the methodology, the initial dataset preparation, the infrastructure used to obtain the traces of executed instructions, and the evaluation of the results for the different models. The results show that the dynamic approach with a Random Forest classifier achieves 90% accuracy or higher, while the static approach achieves 98% accuracy.
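The abstract does not specify how an opcode trace is turned into input for the classical classifiers. A common choice, assumed here and not confirmed by the paper, is a vector of relative opcode n-gram frequencies over a fixed vocabulary. The sketch below builds such features from a trace; the function names and the bigram default are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def opcode_ngrams(trace: Sequence[str], n: int = 2) -> Counter:
    """Count opcode n-grams in an executed-instruction trace (assumed feature type)."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def to_feature_vector(counts: Counter, vocab: List[Tuple[str, ...]]) -> List[float]:
    """Relative frequency of each vocabulary n-gram; unseen n-grams map to 0.0."""
    total = sum(counts.values()) or 1  # avoid division by zero for empty traces
    return [counts.get(gram, 0) / total for gram in vocab]


# Usage: a short trace with one repeated bigram.
trace = ["mov", "push", "call", "mov", "push"]
features = to_feature_vector(opcode_ngrams(trace), [("mov", "push"), ("xor", "xor")])
```

Normalising by the total count keeps feature vectors comparable between short and long traces, which matters because traced programs run for very different numbers of instructions.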

Citations: 0

SHAX: Evaluation of SVM hardware accelerator for detecting and preventing ROP on Xtensa

IF 2.6, Zone 4 (Computer Science)

Microprocessors and Microsystems | Pub Date: 2026-02-01 | Epub Date: 2025-12-04 | DOI: 10.1016/j.micpro.2025.105236

Adebayo Omotosho , Sirine Ilahi , Ernesto Cristopher Villegas Castillo , Christian Hammer , Hans-Martin Bluethgen

Volume 120, Article 105236.

Abstract: Return-oriented programming (ROP) chains together sequences of instructions residing in executable memory pages to compromise a program's control flow. On embedded systems, ROP detection is difficult because such devices lack the resources to run sophisticated software-based detection techniques directly, as these are memory- and CPU-intensive.

However, a Field Programmable Gate Array (FPGA) can enhance an embedded device's ability to handle resource-intensive tasks. Hence, this paper presents the first performance evaluation of a Support Vector Machine (SVM) hardware accelerator for automatic ROP classification on Xtensa-based embedded devices using hardware performance counters (HPCs).

In addition to meeting security requirements, modern cyber-physical systems must be highly reliable against hardware failures to ensure correct functionality. To assess the reliability of our proposed SVM architecture, we perform simulation-based fault injection at the register-transfer level (RTL). To make this evaluation more efficient, we use a hybrid virtual prototype that integrates the RTL model of the SVM accelerator with the Tensilica LX7 Instruction Set Simulator. This setup enables early-stage reliability assessment, helping to identify vulnerabilities and reduce the need for extensive fault injection campaigns later in the design process.

Our evaluation results show that an SVM accelerator targeting an FPGA device can detect and prevent ROP attacks on an embedded processor with high accuracy in real time. In addition, we identify the locations in our SVM design that are most vulnerable to permanent faults, enabling future work to explore safety mechanisms that increase fault coverage.
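The abstract does not state the SVM kernel or the accelerator's number format. Assuming a linear kernel and a fixed-point datapath (both hypothetical here), classifying an HPC sample reduces to an integer multiply-accumulate followed by a sign test, as sketched below. The Q-format with 8 fractional bits and all function names are illustrative assumptions.

```python
from typing import List, Sequence

FRAC_BITS = 8  # hypothetical fixed-point format: 8 fractional bits


def to_fixed(values: Sequence[float], frac_bits: int = FRAC_BITS) -> List[int]:
    """Quantise floats to signed fixed-point integers (round to nearest)."""
    return [round(v * (1 << frac_bits)) for v in values]


def svm_score_fixed(x_q: Sequence[int], w_q: Sequence[int], b_q: int,
                    frac_bits: int = FRAC_BITS) -> int:
    """Linear SVM score w.x + b using only integer MACs, as a hardware datapath would."""
    acc = 0
    for xi, wi in zip(x_q, w_q):
        acc += xi * wi  # each product carries 2 * frac_bits fractional bits
    return (acc >> frac_bits) + b_q  # rescale to frac_bits before adding the bias


def classify(x: Sequence[float], w: Sequence[float], b: float) -> bool:
    """True if the (hypothetical) normalised HPC feature vector is classified as ROP."""
    score = svm_score_fixed(to_fixed(x), to_fixed(w), to_fixed([b])[0])
    return score > 0
```

A linear kernel is attractive for an FPGA accelerator because inference costs one MAC per feature and no support vectors need to be stored; whether SHAX uses this kernel is not stated in the abstract.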

Citations: 0

Cardiac arrhythmia classification system: An optimized HLS-based hardware implementation on PYNQ platform

IF 2.6, Zone 4 (Computer Science)

Microprocessors and Microsystems | Pub Date: 2026-02-01 | Epub Date: 2025-11-17 | DOI: 10.1016/j.micpro.2025.105225

Soumyashree Mangaraj, Kamalakanta Mahapatra, Samit Ari

Volume 120, Article 105225.

Abstract: Electrocardiogram (ECG) analysis is a popular non-invasive technique for diagnosing cardiac abnormalities. Deep learning (DL) architectures, and their hardware deployment at the edge, are crucial for effective diagnosis in smart healthcare applications. Running such inference on a resource-limited FPGA platform is a significant challenge because of the intensive mathematical computations of DL architectures. Existing FPGA implementations of convolutional neural network (CNN) architectures typically adopt sequential deep convolutional stacking, which requires repeated memory accesses to retrieve data, ultimately degrading throughput and adding latency. A hardware-efficient tri-branch CNN architecture is introduced for arrhythmia classification; it leverages the FPGA's intrinsic parallelism and minimizes data-management overhead. The proposed CNN's hardware architecture is implemented in a high-level synthesis (HLS) framework through three key optimizations: (i) a pool-conv-graded-quantized (PCGQ) module, (ii) an in-pool merged function module, and (iii) a skip-zero connection. These enhancements improve layer-level precision, reduce quantization error, lower latency, and optimize FPGA resource utilization. Implemented on a PYNQ-Z2 FPGA, the design utilizes 27.79% of LUTs, 12.24% of FFs, 50.45% of DSPs and 34.29% of BRAM, and delivers 347 GOPS throughput at 45 ms latency, validated in Vivado 2022.2. The proposed system is assessed on the MIT-BIH Arrhythmia Dataset in accordance with AAMI EC57 standards and attains a classification accuracy of 97.98% across five types of ECG beats, highlighting its suitability for portable healthcare applications.
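The abstract names quantization and a "skip-zero connection" but does not define them. Under the assumption (not confirmed by the paper) that skip-zero means bypassing multiplications whose activation is zero, the sketch below pairs a generic symmetric int8 quantiser with a 1-D convolution that skips zero activations and counts the MACs it actually performs. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantisation: q = round(v / scale), scale = max|v| / 127."""
    peak = max((abs(v) for v in values), default=0.0)
    scale = peak / 127 if peak > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def conv1d_skip_zero(x_q: Sequence[int], k_q: Sequence[int]) -> Tuple[List[int], int]:
    """'Valid' 1-D convolution (cross-correlation) on integer data that skips zero activations.

    Returns the integer accumulators and the number of MACs actually performed.
    """
    out: List[int] = []
    macs = 0
    for i in range(len(x_q) - len(k_q) + 1):
        acc = 0
        for j, w in enumerate(k_q):
            a = x_q[i + j]
            if a == 0:  # product would be zero: skip the multiplier
                continue
            acc += a * w
            macs += 1
        out.append(acc)
    return out, macs
```

For sparse post-ReLU activations the skipped MACs translate into fewer multiplier cycles in hardware, while the accumulated result is identical to the dense convolution.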

Citations: 0

ALFA: Design of an accuracy-configurable and low-latency fault-tolerant adder

IF 2.6, Zone 4 (Computer Science)

Microprocessors and Microsystems | Pub Date: 2026-02-01 | Epub Date: 2025-11-19 | DOI: 10.1016/j.micpro.2025.105226

Ioannis Tsounis, Dimitris Agiakatsikas, Mihalis Psarakis

Volume 120, Article 105226.

Abstract: Low-Latency Approximate Adders (LLAAs) are high-performance adder models that perform either approximate addition with configurable accuracy loss or accurate addition, by integrating circuitry that detects and corrects the expected approximation error. Owing to their block-based structure, these adders offer lower latency at the expense of configurable accuracy loss and area overhead. However, hardware accelerators employing such adders are susceptible to hardware (HW) faults, which can cause extra errors (i.e., HW errors) during operation in addition to the expected approximation errors. In this work, we propose a novel Accuracy-configurable Low-latency and Fault-tolerant Adder, named ALFA, that offers 100% fault coverage with respect to the required accuracy level. Our approach exploits the resemblance between HW errors and approximation errors to build a scheme based on selective Triple Modular Redundancy (TMR), which can detect and correct all errors that violate the accuracy threshold. For approximate operation, the proposed ALFA model achieves significant performance gains with minimal area overhead compared with state-of-the-art Reduced Precision Redundancy (RPR) Ripple Carry Adders (RCAs) offering the same level of fault tolerance. Furthermore, the accurate ALFA model outperforms the RCA with classical TMR in terms of performance.
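The abstract relies on two building blocks: a block-based approximate adder, whose error comes from cutting long carry chains, and a TMR majority voter. The sketch below models both in software as a generic illustration. The carry-speculation scheme (each block's carry-in is the carry generated by the preceding block alone) is one common LLAA style and is an assumption here, not ALFA's exact circuit; the function names are hypothetical.

```python
def approx_add(a: int, b: int, width: int = 16, block: int = 4) -> int:
    """Block-based approximate adder with one-block carry speculation.

    Each block's carry-in is the carry-out of the preceding block computed as if
    that block had carry-in 0, so carry chains longer than one block are cut.
    """
    if width % block:
        raise ValueError("width must be a multiple of block")
    mask = (1 << block) - 1
    result = 0
    for lo in range(0, width, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        cin = 0
        if lo:
            prev_a = (a >> (lo - block)) & mask
            prev_b = (b >> (lo - block)) & mask
            cin = (prev_a + prev_b) >> block  # speculated carry: generate only, no propagate
        result |= ((a_blk + b_blk + cin) & mask) << lo
    return result


def exact_add(a: int, b: int, width: int = 16) -> int:
    """Reference (ripple-carry) result, modulo 2**width."""
    return (a + b) & ((1 << width) - 1)


def tmr_vote(r0: int, r1: int, r2: int) -> int:
    """Bitwise 2-out-of-3 majority voter used in Triple Modular Redundancy."""
    return (r0 & r1) | (r1 & r2) | (r0 & r2)
```

For example, `0x00FF + 0x0001` needs a carry to ripple across two block boundaries, so `approx_add` returns `0x0000` instead of `0x0100`; a HW fault that flips a high-order bit produces an error of similar magnitude, which is the resemblance the abstract says ALFA exploits.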

Citations: 0

A CGRA frontend for bandwidth utilization in HiPReP

IF 2.6, Zone 4 (Computer Science)

Microprocessors and Microsystems | Pub Date: 2025-12-01 | Epub Date: 2025-11-08 | DOI: 10.1016/j.micpro.2025.105220

Philipp Käsgen , Markus Weinhardt , Christian Hochberger

Volume 119, Article 105220.

Abstract: When multiple data consumers and producers operate in a highly parallel accelerator architecture, the challenge arises of how to coordinate their requests to memory. An example of such an accelerator is a coarse-grained reconfigurable array (CGRA). CGRAs consist of multiple processing elements (PEs) that can consume and produce data. On the one hand, the resulting load and store requests to memory need to be orchestrated so that the CGRA does not deadlock when connected to a cache hierarchy that responds to memory requests out of request order. On the other hand, multiple consumers and producers open up the possibility of making better use of the available memory bandwidth, keeping the cache constantly busy. We call the unit that addresses these challenges and opportunities the frontend (FE).

We propose a synthesizable FE for the HiPReP CGRA that enables integration with a RISC-V-based host system. Based on an example application, we showcase a methodology for matching the number of consumers and producers (i.e., PEs) to the memory hierarchy so that the CGRA can efficiently harness the available L1 data cache bandwidth, reaching 99.6% of the theoretical peak bandwidth in a synthetic benchmark and enabling a speedup of up to 21.9x over an out-of-order processor for dense matrix-matrix multiplication. Moreover, we explore the FE design, the impact of different numbers of PEs, memory access patterns and synthesis results, and compare the accelerator runtime with the runtime on the host itself as a baseline.
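The abstract says the FE must avoid deadlock when the cache answers out of request order, but it does not describe the mechanism. A common technique, assumed here and not confirmed as HiPReP's design, is a reorder buffer: each load gets a sequence tag, responses are parked until all older ones have arrived, and the number of outstanding requests is capped so every response always has a free slot. The class and method names below are hypothetical.

```python
from typing import Dict, List, Optional


class ReorderBuffer:
    """Releases load responses strictly in request order (hypothetical FE sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.next_tag = 0  # tag assigned to the next issued request
        self.head = 0      # oldest tag not yet delivered to the PEs
        self.slots: Dict[int, Optional[int]] = {}

    def can_issue(self) -> bool:
        # Capping outstanding requests guarantees a free slot for every response,
        # so the buffer never has to refuse data arriving from the cache.
        return self.next_tag - self.head < self.capacity

    def issue(self) -> int:
        if not self.can_issue():
            raise RuntimeError("reorder buffer full; stall the requesting PE")
        tag = self.next_tag
        self.slots[tag] = None  # reserved, response pending
        self.next_tag += 1
        return tag

    def respond(self, tag: int, data: int) -> None:
        if tag not in self.slots:
            raise KeyError(f"unknown or already delivered tag {tag}")
        self.slots[tag] = data

    def drain(self) -> List[int]:
        """Pop every response that can now be delivered in request order."""
        out: List[int] = []
        while self.slots.get(self.head) is not None:
            out.append(self.slots.pop(self.head))
            self.head += 1
        return out
```

The capacity bound is also the knob for bandwidth: it limits how many requests can be in flight, so it would have to be sized against the cache latency to keep the cache constantly busy, in the spirit of the matching methodology the abstract describes.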

Citations: 0

Machine learning for predicting digital block layout feasibility in Analog-On-Top designs

IF 2.6, Zone 4 (Computer Science)

Microprocessors and Microsystems | Pub Date: 2025-12-01 | Epub Date: 2025-11-04 | DOI: 10.1016/j.micpro.2025.105221

Francesco Daghero , Gabriele Faraone , Eugenio Serianni , Nicola Di Carolo , Giovanna Antonella Franchino , Michelangelo Grosso , Daniele Jahier Pagliari

Volume 119, Article 105221.

Abstract: The Analog-On-Top (AoT) Mixed-Signal (AMS) design flow is a time-consuming process that relies heavily on expert knowledge and manual iteration. A critical step involves reserving top-level layout regions for digital blocks, which typically requires several back-and-forth exchanges between analog and digital teams due to the complex interplay of design constraints that affect digital area requirements. Existing automated approaches often fail to generalize, as they are benchmarked on overly simplistic designs that lack real-world complexity. In this work, we frame the area adequacy check as a binary classification task and propose a Machine Learning (ML) solution to predict whether the area reserved for a digital block is sufficient. We conduct an extensive evaluation of multiple ML models on a dataset of production-level designs, achieving an F1 score of up to 94.38% with a Random Forest. Finally, we apply ensemble techniques to improve performance further, reaching 95.35% F1 with a majority-vote ensemble.
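The two quantities the abstract reports, a majority-vote ensemble and the F1 score, can be sketched in a few lines. Which class counts as positive (area sufficient or insufficient) is not stated in the abstract, so label 1 is simply assumed to be the positive class here; the function names are hypothetical.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary predictions (one row per model) by strict majority.

    With an even number of models a tie resolves to 0.
    """
    n_models = len(predictions)
    return [int(2 * sum(column) > n_models) for column in zip(*predictions)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Usage: three hypothetical models voting on four designs.
votes = majority_vote([[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]])
```

Here the ensemble yields `[1, 0, 1, 1]`; against ground truth `[1, 0, 0, 1]` that is two true positives and one false positive, so precision is 2/3, recall is 1, and F1 is 0.8.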

Citations: 0

Key components for unified 3D wireless communication networks

IF 2.6, Zone 4 (Computer Science)

Microprocessors and Microsystems | Pub Date: 2025-12-01 | Epub Date: 2025-09-15 | DOI: 10.1016/j.micpro.2025.105204

Marko Andjelkovic , Nebojsa Maletic , Nicola Miglioranza , Milos Krstic , Enrico Koeck , Jan Buchholz , Maike Taddiken , Markus Fehrenz , Shaden Baradie , Dirk Wübben , Markus Breitbach

Volume 119, Article 105204.

Abstract: The integration of conventional terrestrial wireless communication networks and non-terrestrial networks (NTNs) is the main prerequisite for achieving global connectivity in next-generation (6G) wireless communications. Such integrated communication networks are usually referred to as unified 3D networks. These networks need to meet the 6G requirements for higher data rates as well as enhanced reliability, security and network reconfigurability. To achieve these goals, new technologies and components have to be developed. This work introduces the German project 6G-TakeOff, which aims to develop innovative solutions for unified 3D networks. The project consortium brings together leading academic and industrial partners, covering the entire value chain from electronics design to applications. In this work, the focus is on the development of key hardware components to support wireless communication in unified 3D networks. The design concept for each component and the planned demonstrators are presented.

Citations: 0

ecoNIC: SmartNIC-assisted power management for networking workloads in Linux servers

IF 2.6, Zone 4 (Computer Science)

Microprocessors and Microsystems | Pub Date: 2025-12-01 | Epub Date: 2025-10-14 | DOI: 10.1016/j.micpro.2025.105209

Marco Liess, Franz Biersack, Lars Nolte, Thomas Wild, Andreas Herkersdorf

Volume 119, Article 105209.

Abstract: Improving the sustainability and energy efficiency of compute resources in next-generation networks is crucial to cope with ever-growing computing demand while keeping energy consumption in the processing nodes of the network infrastructure manageable. At the same time, critical connected applications, such as autonomous driving, require a high level of service quality in terms of available throughput and achievable latency. This demands considerable responsiveness from the compute resources and makes power management challenging. Existing solutions are not sufficiently adapted to the requirements and characteristics of such applications: they are either responsive but not very efficient, or efficient but unable to provide the service quality required by critical tasks.

We propose ecoNIC, a concept for energy-efficient network processing that combines an RSS-based (Receive Side Scaling) hardware load balancer for SmartNICs with an adaptive Dynamic Voltage and Frequency Scaling (DVFS) governor. ecoNIC efficiently pins flow priorities to CPU core clusters, reducing the workload of selected cores in the process, and dynamically adjusts their clock speed to exploit the freed-up capacity and save energy. We implement ecoNIC as an FPGA prototype and integrate the DVFS governor into the Linux kernel. The experimental evaluation shows that significant energy savings can be achieved, while the employed priority pinning ensures low tail latencies for critical traffic. Energy savings of 62% are possible without increasing high-priority tail latencies. Further relaxing the latency constraints allows energy savings of up to 88%.
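The abstract combines three mechanisms: an RSS hash that spreads flows over cores, priority pinning to core clusters, and a DVFS governor. RSS is commonly built on the Toeplitz hash, implemented below; the priority-to-cluster mapping and the utilization-based frequency rule are hypothetical stand-ins, not ecoNIC's actual policies, which the abstract does not detail.

```python
from typing import Dict, Sequence


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash, the hash commonly used for RSS.

    For every set bit i of the input (MSB first), XOR in the 32-bit window of
    the key that starts at bit i.
    """
    key_bits, data_bits = len(key) * 8, len(data) * 8
    if key_bits < data_bits + 31:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(data_bits):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def select_core(flow_hash: int, priority: int, clusters: Dict[int, Sequence[int]]) -> int:
    """Hypothetical priority pinning: each priority class owns a core cluster,
    and flows are spread across that cluster's cores by hash."""
    cores = clusters[priority]
    return cores[flow_hash % len(cores)]


def next_frequency(current_mhz: int, utilization: float,
                   levels_mhz: Sequence[int], target: float = 0.7) -> int:
    """Hypothetical DVFS step: the lowest level expected to keep utilization at or
    below `target`, falling back to the highest level."""
    required = current_mhz * utilization / target
    for level in sorted(levels_mhz):
        if level >= required:
            return level
    return max(levels_mhz)
```

Because all packets of a flow hash to the same value, a flow always lands on the same core, which preserves ordering; the governor can then lower the clock of lightly loaded low-priority cores without touching the cluster serving critical traffic.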

Citations: 0

FORTALESA: Fault-tolerant reconfigurable systolic array for DNN inference

IF 2.6, Zone 4 (Computer Science)

Microprocessors and Microsystems | Pub Date: 2025-12-01 | Epub Date: 2025-10-29 | DOI: 10.1016/j.micpro.2025.105222

Natalia Cherezova , Artur Jutman , Maksim Jenihhin

Volume 119, Article 105222.

Abstract: The emergence of Deep Neural Networks (DNNs) in mission- and safety-critical applications brings their reliability to the fore. The high performance demands of DNNs require specialized hardware accelerators. The systolic array architecture is widely used in DNN accelerators due to its parallelism and regular structure. This work presents a run-time reconfigurable systolic array architecture with three execution modes and four implementation options. All four implementations are evaluated in terms of resource utilization, throughput, and fault tolerance improvement. The proposed architecture is used to enhance the reliability of DNN inference on a systolic array through heterogeneous mapping of different network layers to different execution modes. The approach is supported by a novel reliability assessment method based on fault propagation analysis, which is used to explore the appropriate mapping of execution modes to layers for DNN inference. The proposed architecture efficiently protects the registers and MAC units of systolic array PEs from transient and permanent faults. The reconfigurability feature enables a speedup of up to 3×, depending on layer vulnerability. Furthermore, it requires 6× fewer resources than static redundancy and 2.5× fewer resources than the previously proposed solution for transient faults.
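The abstract assumes familiarity with how a systolic array computes a layer. The cycle-level sketch below models a generic output-stationary array computing a matrix product: each PE keeps one output in place while skewed operands flow right and down. The dataflow choice is an assumption (the abstract does not state FORTALESA's), and the sketch models neither the execution modes nor fault protection.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def systolic_matmul(A: Sequence[Sequence[int]], B: Sequence[Sequence[int]]) -> Tuple[Matrix, int]:
    """Cycle-level model of an output-stationary systolic array computing A @ B.

    PE(i, j) accumulates C[i][j]. Row i of A enters from the left delayed by i
    cycles, column j of B enters from the top delayed by j cycles, and operands
    move one PE per cycle (A values rightward, B values downward).
    """
    n, depth, m = len(A), len(B), len(B[0])
    if any(len(row) != depth for row in A):
        raise ValueError("inner dimensions do not match")
    acc = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]
    b_reg = [[0] * m for _ in range(n)]
    cycles = depth + n + m - 2  # the last PE sees its final operands at cycle depth + n + m - 3
    for t in range(cycles):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    k = t - i  # skewed feed of row i
                    new_a[i][j] = A[i][k] if 0 <= k < depth else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # from the left neighbour
                if i == 0:
                    k = t - j  # skewed feed of column j
                    new_b[i][j] = B[k][j] if 0 <= k < depth else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # from the upper neighbour
        a_reg, b_reg = new_a, new_b
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]  # one MAC per PE per cycle
    return acc, cycles
```

The skew guarantees that PE(i, j) receives `A[i][k]` and `B[k][j]` in the same cycle (k = t - i - j), which is why every PE's registers and MAC unit are exercised on every cycle and are natural targets for the protection the abstract describes.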

Citations: 0