Low-latency remote-offloading system for accelerator

IF 1.8, Zone 4 (Computer Science), JCR Q3 (Telecommunications)
Shogo Saito, Kei Fujimoto, Akinori Shiraga
Annals of Telecommunications, published 2023-11-03
DOI: 10.1007/s12243-023-00994-3 (https://doi.org/10.1007/s12243-023-00994-3)
Citation count: 0

Abstract

Specific workloads are increasingly offloaded to accelerators such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) for real-time processing and computing efficiency. Because accelerators are expensive and power-hungry, it is desirable to raise their utilization by sharing them among multiple servers over a network. However, offloading tasks over a network adds latency because of the network processing overhead involved. This paper proposes a low-latency system for accelerator offloading over a network. To reduce the overhead of remote offloading, the system combines (1) fast recombination of chunked data with a simple protocol, which reduces the number of memory copies; (2) a polling-based check for received packets, which reduces the interrupt overhead of interacting with the network interface card; and (3) a run-to-completion model for network processing and accelerator offloading, which reduces context-switching overhead. We show that the system improves performance by 66.40% compared with a simple implementation that uses the kernel protocol stack, and we confirm the improvement with a virtual radio access network use case as a low-latency application. Furthermore, we show that this performance can also be achieved in practical use in data center networks.
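The three mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 12-byte chunk header (message ID, byte offset, total length), the names `make_chunks`, `Reassembler`, and `poll_loop`, the `deque` standing in for a NIC receive ring, and the `offload` callback standing in for an accelerator call are all assumptions made for illustration. The sketch also assumes every chunk arrives exactly once.

```python
import struct
from collections import deque

# Hypothetical chunk header: message id, byte offset, total message length.
HEADER = struct.Struct("!III")


def make_chunks(msg_id, payload, chunk_size):
    """Sender side: split a payload into header-prefixed chunks."""
    return [
        HEADER.pack(msg_id, off, len(payload)) + payload[off:off + chunk_size]
        for off in range(0, len(payload), chunk_size)
    ]


class Reassembler:
    """Copies each chunk once, straight into a preallocated message buffer."""

    def __init__(self):
        self._pending = {}  # msg_id -> (buffer, bytes received so far)

    def feed(self, packet):
        msg_id, offset, total = HEADER.unpack_from(packet)
        body = memoryview(packet)[HEADER.size:]  # view only, no copy yet
        buf, received = self._pending.get(msg_id, (None, 0))
        if buf is None:
            buf = bytearray(total)  # allocated once per message
        buf[offset:offset + len(body)] = body  # the single copy per chunk
        received += len(body)
        if received == total:
            self._pending.pop(msg_id, None)
            return buf  # message complete
        self._pending[msg_id] = (buf, received)
        return None


def poll_loop(rx_queue, offload, max_idle_polls=1000):
    """Busy-poll rx_queue and run each complete message to completion."""
    reassembler = Reassembler()
    results = []
    idle = 0
    while idle < max_idle_polls:
        if not rx_queue:  # polling check: no interrupt, no blocking wait
            idle += 1
            continue
        idle = 0
        message = reassembler.feed(rx_queue.popleft())
        if message is not None:
            # Run to completion: offload in the same loop iteration,
            # with no handoff to another thread or queue.
            results.append(offload(message))
    return results
```

Because the header carries the offset and total length, the receiver can place chunks that arrive out of order directly into their final position, so no intermediate staging buffer is needed. A real system would poll a NIC receive ring from user space instead of a Python `deque`, and would need to handle lost or duplicated chunks.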


Source journal
Annals of Telecommunications (Engineering and Technology: Telecommunications)
CiteScore: 5.20
Self-citation rate: 5.30%
Articles published: 37
Review time: 4.5 months
About the journal: Annals of Telecommunications is an international journal publishing original peer-reviewed papers in the field of telecommunications. It covers all the essential branches of modern telecommunications, ranging from digital communications to communication networks and the internet, to software, protocols and services, uses and economics. This large spectrum of topics reflects the rapid convergence, through telecommunications, of the underlying technologies in computers, communications, and content management towards the emergence of the information and knowledge society. As a consequence, the journal provides a medium for exchanging research results and technological achievements accomplished by the European and international scientific community from academia and industry.