Microprocessors and Microsystems: Latest Articles

Formal verification for I²C communication Protocol in aerospace and aviation industries
IF 2.6 | Zone 4, Computer Science
Microprocessors and Microsystems Pub Date : 2026-03-01 Epub Date: 2026-02-20 DOI: 10.1016/j.micpro.2026.105252
Merve Berik, Yahya Baykal
Volume 121, Article 105252
Abstract: The aerospace industry comprises many safety-critical applications that involve a vast number of interacting subsystems. Reliable data communication between devices and components is therefore essential. In this context, the Inter-Integrated Circuit (I²C) communication protocol is widely preferred due to its simplicity, flexibility, low power consumption, and reliability. However, issues such as data corruption, data loss, and increased latency may still occur and can lead to serious consequences in aviation, including safety risks, electronic malfunctions, air traffic management problems, and incorrect navigation information. To avoid such failures, the I²C Register-Transfer Level (RTL) design must be both correctly implemented and rigorously verified. Among the several approaches to digital design verification, Formal Verification (FV) is one of the most precise and reliable methods for safety-critical systems, as it provides mathematical proofs of conformance to specified properties. In this work, an open-source, Yosys-based formal verification flow is applied to an open-source I²C master design using the SymbiYosys framework. The verification environment is developed in SystemVerilog with SystemVerilog Assertions, enabling the detection of design errors directly against the protocol requirements. By combining bounded model checking, cover analysis, and theorem-proving, the proposed flow systematically verifies all five finite-state-machine (FSM) states and nine transitions of the I²C master. The results demonstrate that formal verification can systematically ensure robust and fault-tolerant I²C operation for avionics applications.
Citations: 0
Applying hypervisor-based fault tolerance techniques to safety-critical embedded systems
IF 2.6 | Zone 4, Computer Science
Microprocessors and Microsystems Pub Date : 2026-03-01 Epub Date: 2026-02-14 DOI: 10.1016/j.micpro.2026.105255
Santiago Lozano, Javier Fernandez, Jesus Carretero
Volume 121, Article 105255
Abstract: The main objective of this work is the design and implementation of a space use case for applying the Hypervisor-Based Fault Tolerance (HBFT) mechanisms to redundant software applications in independent virtual machines, isolated from each other, and to research the effect of the HBFT mechanism on system safety and reliability. To test the developed fault tolerance mechanism, we decided to use a real use case of space systems: the ESA Near InfraRed (NIR) HAWAII 2-RG Data Processing Algorithms benchmarking software. After testing with an exhaustive fault injection campaign, the evaluation results show that our HBFT for critical real-time embedded systems is able to detect and cover all failures detected for critical real-time tasks, recovering failed virtual machines or containers from degradation to become fully operational again.
Citations: 0
An adaptive ant colony optimization-based obstacle-avoidance routing algorithm for Network-on-Chip
IF 2.6 | Zone 4, Computer Science
Microprocessors and Microsystems Pub Date : 2026-03-01 Epub Date: 2026-02-21 DOI: 10.1016/j.micpro.2026.105256
Cuiping Shao, Chao Heng, Zujia Miao, Yihan Chen, Huiyun Li, Zhimin Tang
Volume 121, Article 105256
Abstract: Network-on-Chip (NoC) fault-tolerant routing presents substantial challenges in achieving an optimal balance among reliability, adaptability, and resource efficiency. Conventional approaches, such as dimension-ordered XY routing, lack dynamic fault-avoidance mechanisms, frequently resulting in congestion and packet loss upon encountering faulty nodes or links. Although bio-inspired algorithms, including Ant Colony Optimization (ACO), demonstrate potential for adaptive routing, current implementations inadequately integrate real-time fault awareness with congestion control while maintaining acceptable hardware overhead. To address these limitations, this paper introduces the Ant Colony Optimization-Fault-Aware (ACO-FA) routing mechanism, which incorporates dynamic path flexibility adaptation alongside buffer-state-aware congestion mitigation. The proposed approach employs a quantitative path flexibility model that dynamically modifies shortest paths through Manhattan distance corrections and fault-location awareness. Additionally, the Path Buffer Occupancy (PBO) metric quantifies multi-hop congestion risk, while a fault penalty factor (β) optimizes probabilistic path selection. Experimental evaluations indicate that ACO-FA surpasses conventional XY routing across multiple performance dimensions. Under various fault scenarios including single-node, dual-node, multi-node, and link failures, the proposed mechanism achieves improvements of up to 3.0% in Received/Ideal Flits Ratio, up to 30% in throughput at saturation, and up to 33% reduction in average latency.
Citations: 0
A digital beamforming receiver architecture implemented on a FPGA for space applications
IF 2.6 | Zone 4, Computer Science
Microprocessors and Microsystems Pub Date : 2026-03-01 Epub Date: 2025-12-30 DOI: 10.1016/j.micpro.2025.105243
Eduardo Ortega, Agustín Martínez, Antonio Oliva, Fernando Sanz, Óscar Rodríguez, Manuel Prieto, Pablo Parra, Antonio da Silva, Sebastián Sánchez
Volume 121, Article 105243
Abstract: The burgeoning interest within the space community in digital beamforming is largely attributable to the superior flexibility that satellites with active antenna systems offer for a wide range of applications, notably in communication services. This paper delves into the analysis and practical implementation of a Digital Beamforming and Digital Down Conversion (DDC) chain, leveraging a high-speed Analog-to-Digital Converter (ADC) certified for space applications alongside a high-performance Field-Programmable Gate Array (FPGA). The proposed design strategy focuses on optimizing resource efficiency and minimizing power consumption by strategically sequencing the beamformer processor ahead of the complex down-conversion operation. This innovative approach entails the application of demodulation and low-pass filtering exclusively to the aggregated beam channel, culminating in a marked reduction in the requisite digital signal processing resources relative to traditional, more resource-intensive digital beamforming and DDC architectures. In the experimental validation, an evaluation board integrating a high-speed ADC and a FPGA was utilized. This setup facilitated the empirical validation of the design’s efficacy by applying various RF input signals to the digital beamforming receiver system. The ADC employed is capable of high-resolution signal processing, while the FPGA provides the necessary computational flexibility and speed for real-time digital signal processing tasks. The findings underscore the potential of this design to significantly enhance the efficiency and performance of digital beamforming systems in space applications.
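The reordering described in the abstract relies on beamforming, mixing, and filtering all being linear operations, so they can be applied in either order with the same result. The following is a minimal sketch of that idea and not the authors' architecture: the channel count, normalized carrier frequency, phase weights, and the 4-tap moving-average filter are all hypothetical values chosen for illustration.

```python
import cmath
import math

def ddc(samples, f_norm, taps):
    """Mix samples down by normalized frequency f_norm, then FIR low-pass filter."""
    mixed = [x * cmath.exp(-2j * math.pi * f_norm * n) for n, x in enumerate(samples)]
    out = []
    for n in range(len(mixed)):
        acc = 0j
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * mixed[n - k]
        out.append(acc)
    return out

def beamform(channels, weights):
    """Weighted sum across antenna channels, sample by sample."""
    return [sum(w * ch[n] for w, ch in zip(weights, channels))
            for n in range(len(channels[0]))]

# Hypothetical setup: 4 channels, each a phase-shifted copy of one carrier.
N_CH, N_SAMPLES, F_NORM = 4, 64, 0.2
channels = [[math.cos(2 * math.pi * F_NORM * n + 0.5 * c) for n in range(N_SAMPLES)]
            for c in range(N_CH)]
weights = [cmath.exp(-0.5j * c) for c in range(N_CH)]  # illustrative phase weights
taps = [0.25] * 4                                      # illustrative moving average

# Beamform first: one DDC chain runs on the single aggregated beam.
beam_first = ddc(beamform(channels, weights), F_NORM, taps)
# Conventional order: one DDC chain per channel, then beamform.
ddc_first = beamform([ddc(ch, F_NORM, taps) for ch in channels], weights)

max_diff = max(abs(a - b) for a, b in zip(beam_first, ddc_first))
ddc_chains_beam_first, ddc_chains_ddc_first = 1, N_CH
```

Both orderings produce the same output up to floating-point rounding, while the beamform-first ordering needs a single mixer and filter chain instead of one per channel, which is the resource saving the abstract points to.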
Citations: 0
Integrating XtratuM and hardware accelerators in a model-based engineering workflow: The METASAT approach
IF 2.6 | Zone 4, Computer Science
Microprocessors and Microsystems Pub Date : 2026-03-01 Epub Date: 2026-02-16 DOI: 10.1016/j.micpro.2026.105253
Alejandro J. Calderón, Aitor Amonarriz, Mar Hernández, Leonidas Kosmidis, Jannis Wolf, Marc Solé Bonet, Matina M. Trompouki, Mikel Segura, Peio Onaindia
Volume 121, Article 105253
Abstract: The increasing complexity of satellite systems, driven by the adoption of Industry 4.0 technologies and strict ECSS standards, demands innovative design methodologies. The METASAT project introduces a model-based engineering workflow that integrates open-architecture hardware with advanced software virtualisation layers, such as the XtratuM hypervisor, to address these challenges. By leveraging a specialised toolchain that combines TASTE with MathWorks tools, METASAT enables efficient code generation for hardware accelerators, including the SPARROW AI accelerator and the Vortex GPU. This paper provides an overview of the project, detailing its design approach, toolchain integration, and contributions towards enhancing satellite on-board software engineering. Through these innovations, METASAT demonstrates how advanced modelling and automated code generation can reduce development costs and timelines, improve system performance, and ensure the competitiveness and dependability of future satellite missions.
Citations: 0
Edge computing System-on-Chip architecture for a Non-Intrusive Load Monitoring sensor in ambient intelligence applications
IF 2.6 | Zone 4, Computer Science
Microprocessors and Microsystems Pub Date : 2026-03-01 Epub Date: 2026-01-21 DOI: 10.1016/j.micpro.2026.105250
Rubén Nieto, Laura de Diego-Otón, Miguel Tapiador, Víctor M. Navarro, Santiago Murano, Álvaro Hernández, Jesús Ureña
Volume 121, Article 105250
Abstract: Non-Intrusive Load Monitoring (NILM) systems allow the disaggregation of the individual consumption of different appliances from aggregate electrical measurements, for applications such as improving energy efficiency at home. In other contexts, NILM techniques are also useful to promote independent living for elderly, as they enable the inference and monitoring of their behavior through the analysis of their energy consumption and the identification of the appliances’ usage patterns. To achieve this, aggregated voltage and current signals are collected at the entrance of the house using a NILM sensor system. This analysis often involves sending the collected data to the cloud for further processing, which can result in significant bandwidth usage, especially when a high sampling rate approach is employed. In this work, a System-on-Chip (SoC) architecture based on a FPGA (Field-Programmable Gate Array) device is proposed for NILM processing, fully performed on edge computing. This architecture is focused on Ambient Intelligence for Independent Living (AIIL) of elderly. Voltage and current data are acquired at 4 kSPS (kilo Samples Per Second), where on/off switchings (events) of appliances are detected, thus delimiting a window of 4096 samples around both signals. These windows are processed by a Convolutional Neural Network (CNN) that implements the load identification. Unlike prior works that primarily focus on algorithmic enhancements, this study introduces a complete hardware/software design of a FPGA-based SoC architecture and its real-time validation. The proposed architecture achieves an inference latency of 56 ms and a classification accuracy of 84.7% for fourteen classes (ON/OFF states of seven appliances), while reducing bandwidth usage by transmitting only the final identification instead of raw signals. These results demonstrate the feasibility of real-time implementations of NILM applications at the edge with competitive performance.
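To make the windowing figures concrete, the sketch below works out the duration of a 4096-sample window at 4 kSPS and cuts such a window around a detected event. The sampling rate and window length come from the abstract; centring the window on the event and clamping it at the signal boundaries are assumptions made here, since the abstract does not say how the window is positioned.

```python
SAMPLE_RATE_HZ = 4000   # 4 kSPS, from the abstract
WINDOW_SAMPLES = 4096   # window length, from the abstract

def window_duration_s(n_samples=WINDOW_SAMPLES, rate_hz=SAMPLE_RATE_HZ):
    """Time span covered by one analysis window, in seconds."""
    return n_samples / rate_hz

def extract_window(signal, event_index, n_samples=WINDOW_SAMPLES):
    """Return n_samples around event_index.

    Assumption: the window is centred on the event and shifted inward
    when the event lies too close to either end of the signal.
    """
    if len(signal) < n_samples:
        raise ValueError("signal is shorter than one window")
    start = event_index - n_samples // 2
    start = max(0, min(start, len(signal) - n_samples))
    return signal[start:start + n_samples]

# Usage: the same event index is applied to both voltage and current.
voltage = list(range(10_000))           # placeholder samples
current = [2 * v for v in voltage]      # placeholder samples
v_win = extract_window(voltage, 5_000)
i_win = extract_window(current, 5_000)
```

Each window therefore spans roughly one second of signal (4096 / 4000 = 1.024 s), and the two windows stay aligned because they share one event index.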
Citations: 0
High-precision positioning and timing method of GNSS receiver for mobile communication networks
IF 2.6 | Zone 4, Computer Science
Microprocessors and Microsystems Pub Date : 2026-03-01 Epub Date: 2025-12-31 DOI: 10.1016/j.micpro.2025.105242
Haodong Zhao, Junna Shang
Volume 121, Article 105242
Abstract: Currently, high-precision GNSS receivers are expensive and the cost of using them in mobile communication networks is extremely high. To reduce the construction cost of positioning and timing capabilities in mobile communication networks, the existing ordinary GNSS receivers in the network are used to form a self-differential enhanced iterative network to achieve high-precision positioning in local areas. Based on high-precision positioning, various delay errors in the current 1PPS second pulse are corrected by differential information data to solve the precise time of the local clock, thereby improving timing accuracy. In engineering applications, the self-differential enhanced iterative network algorithm is used to make embedded improvements to the antenna parameter sensor commonly used in mobile communication networks. The improved antenna parameter sensor has obtained high-precision positioning and timing functions based on the original attitude and direction measurement functions. Its positioning accuracy can reach millimeter level, and the timing accuracy can reach 20 nanoseconds.
Citations: 0
Scalable hardware designs for median filters based on separable sorting networks
IF 2.6 | Zone 4, Computer Science
Microprocessors and Microsystems Pub Date : 2026-03-01 Epub Date: 2025-12-25 DOI: 10.1016/j.micpro.2025.105241
Cameron Vogeli, Daniel Llamocca
Volume 121, Article 105241
Abstract: We present scalable and generalized hardware designs for k × k median filters based on separability of sorting networks, where we can process 4 pixels at a time. The fully customized (performance, bit-width) hardware architectures allow for design space exploration to establish trade-offs among processing time and resource usage. Results are presented in terms of resources, processing cycles, and throughput. We present true scalable architectures: our approach features a linear increase (becoming even less pronounced) in hardware resources and processing time as k grows. As far as we are aware, there are no competing works (that use separability) for k > 5. The proposed architectures, validated on modern FPGAs for k = 3, 5, 7, 9, 11, are expected to be used as building blocks on a variety of image processing applications.
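As an illustration of what separability means for a median filter, the sketch below implements the classic 3 × 3 case: each column is sorted with a three-element sorting network, and the median of all nine values is then recovered from the column minima, medians, and maxima. This is a well-known software formulation offered as an assumption-laden example, not the authors' hardware design, and it does not cover their larger k values or 4-pixel parallelism.

```python
def sort3(a, b, c):
    """Three compare-exchange steps, i.e. a minimal 3-input sorting network."""
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    if a > b:
        a, b = b, a
    return a, b, c

def median3x3(window):
    """Median of a 3x3 window given as three rows of three values."""
    # Stage 1: sort each column independently (low, mid, high).
    cols = [sort3(window[0][j], window[1][j], window[2][j]) for j in range(3)]
    lows, mids, highs = zip(*cols)
    # Stage 2: combine across columns.
    max_of_lows = sort3(*lows)[2]
    med_of_mids = sort3(*mids)[1]
    min_of_highs = sort3(*highs)[0]
    # Stage 3: the overall median is the median of these three candidates.
    return sort3(max_of_lows, med_of_mids, min_of_highs)[1]

example = [[7, 2, 9],
           [4, 5, 1],
           [8, 3, 6]]
result = median3x3(example)
```

A sorted column can be reused by the horizontally adjacent windows that share it, so a sliding filter only has to sort one new column per output pixel; that reuse is the usual motivation for separable designs.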
Citations: 0
ViT-LoRA: Optimized vision transformer for efficient edge computing in medical imaging
IF 2.6 | Zone 4, Computer Science
Microprocessors and Microsystems Pub Date : 2026-03-01 Epub Date: 2026-01-24 DOI: 10.1016/j.micpro.2026.105251
Premalatha R, Jayanthi K B, Rajasekaran C, Sureshkumar R
Volume 121, Article 105251
Abstract: Vision Transformer (ViT) models have demonstrated excellent performance in medical image processing. Their deployment in resource-constrained situations is limited by their high computational complexity and memory requirements. Although parameter-efficient tuning of ViT models is made possible by Low-Rank Adaptation (LoRA), its use in real-time clinical datasets and edge-device deployment is yet mainly unexplored. Using a real-time lung infection dataset, this research assesses ViT-LoRA's effectiveness in real-world medical imaging scenarios and investigates its generalisation potential on a public COVID-19 CT dataset. Four ViT fine-tuning procedures are thoroughly compared: LoRA-based tuning (ViT-LoRA), adapter-based tuning (ViT-APT), partial fine-tuning (ViT-PFT), and full fine-tuning (ViT-FFT). ViT-LoRA attains a testing accuracy of 98.50% with only 2.104 million trainable parameters, resulting in a significantly reduced memory of 24.08 MB. The optimized ViT-LoRA Model has been deployed to a NVIDIA Jetson Nano and evaluated against the 30 test images. This evaluation of the ViT-LoRA Model resulted in an average of 3.44 seconds per test image for real-time edge-based medical imaging applications.
Citations: 0
A runtime-adaptive transformer neural network accelerator on FPGAs
IF 2.6 | Zone 4, Computer Science
Microprocessors and Microsystems Pub Date : 2026-02-01 Epub Date: 2025-11-17 DOI: 10.1016/j.micpro.2025.105223
Ehsan Kabir, Jason D. Bakos, David Andrews, Miaoqing Huang
Volume 120, Article 105223
Abstract: Transformer neural networks (TNN) excel in natural language processing (NLP), machine translation, and computer vision (CV) without relying on recurrent or convolutional layers. However, they have high computational and memory demands, particularly on resource constrained devices like FPGAs. Moreover, transformer models vary in processing time across applications, requiring custom models with specific parameters. Designing custom accelerators for each model is complex and time-intensive. Some custom accelerators exist with no runtime adaptability, and they often rely on sparse matrices to reduce latency. However, hardware designs become more challenging due to the need for application-specific sparsity patterns. This paper introduces ADAPTOR, a runtime-adaptive accelerator for dense matrix computations in transformer encoders and decoders on FPGAs. ADAPTOR enhances the utilization of processing elements and on-chip memory, enhancing parallelism and reducing latency. It incorporates efficient matrix tiling to distribute resources across FPGA platforms and is fully quantized for computational efficiency and portability. Evaluations on Xilinx Alveo U55C data center cards and embedded platforms like VC707 and ZCU102 show that our design is 1.2× and 2.87× more power efficient than the NVIDIA K80 GPU and the i7-8700K CPU respectively. Additionally, it achieves a speedup of 1.7 to 2.25× compared to some state-of-the-art FPGA-based accelerators.
Citations: 0