Upscale Layer Acceleration on Existing AI Hardware

IF 3.1; Zone 2 (Engineering & Technology); JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-23 DOI:10.1109/TVLSI.2025.3558946

Vuk Vranjkovic;Predrag Teodorovic;Rastislav Struharik

Volume 33, Issue 6, pp. 1624-1637. https://ieeexplore.ieee.org/document/10974749/

Citations: 0

Abstract

Upscaling layers are important components of modern deep learning networks but often pose computational challenges for hardware (HW) accelerators. This article addresses this issue by introducing a novel layer-replacement technique to efficiently process upscaling layers using existing hardware-supported operations like depthwise convolutions and maximum pooling. To minimize the number of replacement layers, we propose an efficient layer number reduction algorithm. Experimental results on four deep neural networks demonstrate a significant speedup ranging from $1.58\times$ to $32.88\times$ compared to the original HW/software (SW) execution approach, and from $3.65\times$ to $19.21\times$ compared to the software-only solution, with minimal hardware overhead (0.068% more field-programmable gate array (FPGA) look-up tables (LUTs)). Notably, our technique introduces no numerical errors and maintains comparable input data processing quality to the original network.
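For readers unfamiliar with the operation, the sketch below shows what an upscaling layer computes and how one might check that an alternative formulation is exact. It is an illustration under stated assumptions: nearest-neighbor interpolation is assumed as the upscaling mode, the function names are hypothetical, and neither function reproduces the article's replacement layers (built from depthwise convolutions and maximum pooling), whose construction the abstract does not describe.

```python
# Illustration only: names and the nearest-neighbor mode are assumptions,
# not taken from the article.

def upscale_nearest(fmap, factor):
    """Reference nearest-neighbor upscaling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        # Repeat each value `factor` times along the width.
        wide = [v for v in row for _ in range(factor)]
        # Repeat the widened row `factor` times along the height.
        out.extend(list(wide) for _ in range(factor))
    return out


def upscale_by_index(fmap, factor):
    """Alternative formulation: out[y][x] = fmap[y // factor][x // factor]."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[y // factor][x // factor] for x in range(w * factor)]
            for y in range(h * factor)]


fmap = [[1.5, -2.0],
        [0.25, 3.0]]
a = upscale_nearest(fmap, 2)
b = upscale_by_index(fmap, 2)

# Both formulations only copy values, so the results must match exactly,
# with no tolerance needed in the comparison.
assert a == b
```

Because both versions only copy input values and never perform arithmetic on them, the comparison can use exact equality rather than a floating-point tolerance. This is the kind of check that a "no numerical errors" claim implies for any replacement of an upscaling layer.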


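Source journal

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 6.40

Self-citation rate: 7.10%

Articles published: 187

Review time: 3.6 months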

About the journal: The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and systems-level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.