USEFUSE: Uniform stride for enhanced performance in fused layer architecture of deep neural networks

IF 4.1 2区 计算机科学 Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Muhammad Sohail Ibrahim , Muhammad Usman , Jeong-A Lee
{"title":"USEFUSE: Uniform stride for enhanced performance in fused layer architecture of deep neural networks","authors":"Muhammad Sohail Ibrahim ,&nbsp;Muhammad Usman ,&nbsp;Jeong-A Lee","doi":"10.1016/j.sysarc.2025.103459","DOIUrl":null,"url":null,"abstract":"<div><div>Convolutional Neural Networks (CNNs) are crucial in various applications, but deploying them on resource-constrained edge devices poses challenges. This study presents the Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic to minimize response time and enhance overall performance. The study proposes a methodology for fusing multiple convolution layers to reduce off-chip memory communication and increase the overall performance. An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption without compromising accuracy. Additionally, efficient tile movement guarantees uniform access to the fusion pyramid. An analysis demonstrates the uniform stride strategy improves operational intensity. Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and another focuses on resource-constrained devices with comparable latency. This approach notably reduced redundant computations, improving the efficiency of CNN deployment on edge devices.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"166 ","pages":"Article 103459"},"PeriodicalIF":4.1000,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems Architecture","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1383762125001316","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

Convolutional Neural Networks (CNNs) are crucial in various applications, but deploying them on resource-constrained edge devices poses challenges. This study presents the Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic to minimize response time and enhance overall performance. The study proposes a methodology for fusing multiple convolution layers to reduce off-chip memory communication and increase the overall performance. An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption without compromising accuracy. Additionally, efficient tile movement guarantees uniform access to the fusion pyramid. An analysis demonstrates the uniform stride strategy improves operational intensity. Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and another focuses on resource-constrained devices with comparable latency. This approach notably reduced redundant computations, improving the efficiency of CNN deployment on edge devices.
USEFUSE:提高深度神经网络融合层架构性能的统一跨步
卷积神经网络(cnn)在各种应用中都是至关重要的,但在资源受限的边缘设备上部署它们存在挑战。本研究提出了卷积的乘积和(SOP)单元,它利用低延迟的从左到右的位串行算法来最小化响应时间并提高整体性能。该研究提出了一种融合多个卷积层的方法,以减少片外存储器通信并提高整体性能。一种有效的机制检测并跳过ReLU层后的低效卷积,在不影响精度的情况下最大限度地降低功耗。此外,高效的瓷砖移动保证了统一的访问融合金字塔。分析表明,统一步幅策略提高了操作强度。有两种设计可以满足不同的需求:一种侧重于任务关键型应用程序的最小响应时间,另一种侧重于具有相当延迟的资源受限设备。这种方法显著减少了冗余计算,提高了CNN在边缘设备上的部署效率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Systems Architecture
Journal of Systems Architecture 工程技术-计算机:硬件
CiteScore
8.70
自引率
15.60%
发文量
226
审稿时长
46 days
期刊介绍: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems including methodologies, techniques and tools for their design as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central in this journal. While hardware is not a part of this journal hardware/software co-design methods that consider interplay between software and hardware components with and emphasis on software are also relevant here.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信