Low-latency remote-offloading system for accelerator

IF 1.8 · CAS Region 4 (Computer Science) · JCR Q3 (Telecommunications)
Shogo Saito, Kei Fujimoto, Akinori Shiraga
{"title":"Low-latency remote-offloading system for accelerator","authors":"Shogo Saito,&nbsp;Kei Fujimoto,&nbsp;Akinori Shiraga","doi":"10.1007/s12243-023-00994-3","DOIUrl":null,"url":null,"abstract":"<div><p>Specific workloads are increasingly offloaded to accelerators such as a graphic processing unit (GPU) and field-programmable gate array (FPGA) for real-time processing and computing efficiency. Because accelerators are expensive and consume much power, it is desirable to increase the efficiency of accelerator utilization by sharing accelerators among multiple servers over a network. However, task offloading over a network has the problem of latency due to network processing overhead in remote offloading. This paper proposes a low-latency system for accelerator offloading over a network. To reduce the overhead of remote offloading, we propose a system composed of (1) fast recombination processing of chunked data with a simple protocol to reduce the number of memory copies, (2) polling-based packet receiving check to reduce overhead due to interrupts in interaction with a network interface card, and (3) a run-to-completion model in network processing and accelerator offloading to reduce overhead with context switching. We show that the system can improve performance by 66.40% compared with a simple implementation using kernel protocol stack and confirmed the performance improvement with a virtual radio access network use case as a low-latency application. Furthermore, we show that this performance can also be achieved in practical usage in data center networks.</p></div>","PeriodicalId":50761,"journal":{"name":"Annals of Telecommunications","volume":"79 3-4","pages":"179 - 196"},"PeriodicalIF":1.8000,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s12243-023-00994-3.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Telecommunications","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s12243-023-00994-3","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"TELECOMMUNICATIONS","Score":null,"Total":0}

Abstract

Specific workloads are increasingly offloaded to accelerators such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) for real-time processing and computing efficiency. Because accelerators are expensive and consume much power, it is desirable to increase the efficiency of accelerator utilization by sharing accelerators among multiple servers over a network. However, task offloading over a network suffers added latency due to network processing overhead in remote offloading. This paper proposes a low-latency system for accelerator offloading over a network. To reduce the overhead of remote offloading, we propose a system composed of (1) fast recombination of chunked data with a simple protocol to reduce the number of memory copies, (2) polling-based packet-reception checks to reduce interrupt overhead when interacting with a network interface card (NIC), and (3) a run-to-completion model for network processing and accelerator offloading to reduce context-switching overhead. We show that the system improves performance by 66.40% compared with a simple implementation using the kernel protocol stack, and we confirm the improvement with a virtual radio access network use case as a low-latency application. Furthermore, we show that this performance can also be achieved in practical usage in data center networks.
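To make the three techniques in the abstract concrete, the sketch below shows a single receive-and-offload loop in C. It is an illustration under stated assumptions, not the paper's implementation: the chunk header layout, the port number, and offload_to_accelerator() are invented for the example, duplicate or lost chunks and byte-order conversion are not handled, and a non-blocking UDP socket polled with MSG_DONTWAIT stands in for the kernel-bypass, poll-mode NIC interaction the paper relies on. It shows (1) chunk payloads copied once, straight into their final position in a pre-allocated task buffer, (2) a busy-poll receive check instead of interrupt-driven wakeups, and (3) run-to-completion: the thread that polls the packets immediately invokes the accelerator once the task is assembled.

```c
/* Hypothetical sketch of the receive path described in the abstract. */
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

#define MAX_TASK_BYTES (4u * 1024 * 1024)   /* upper bound on one offload task */

/* Hypothetical per-chunk header prepended by the sender. */
struct chunk_hdr {
    uint32_t offset;      /* byte offset of this chunk in the task buffer */
    uint32_t total_len;   /* total task length, repeated in every chunk   */
    uint32_t chunk_len;   /* payload bytes that follow this header        */
};

/* Stub standing in for the real accelerator invocation (GPU/FPGA). */
static void offload_to_accelerator(const uint8_t *task, uint32_t len)
{
    printf("offloading %u-byte task to accelerator\n", len);
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);             /* assumed example port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); return 1;
    }

    static uint8_t task_buf[MAX_TASK_BYTES]; /* pre-allocated reassembly buffer */
    uint8_t pkt[2048];
    uint32_t received = 0, total = 0;

    for (;;) {
        /* Polling-based receive: MSG_DONTWAIT returns immediately instead of
         * sleeping until an interrupt wakes the thread. */
        ssize_t n = recv(fd, pkt, sizeof(pkt), MSG_DONTWAIT);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                continue;                     /* nothing yet, keep polling */
            perror("recv"); break;
        }
        if ((size_t)n < sizeof(struct chunk_hdr))
            continue;                         /* runt packet, ignore */

        struct chunk_hdr hdr;
        memcpy(&hdr, pkt, sizeof(hdr));
        if (hdr.chunk_len > (size_t)n - sizeof(hdr) ||
            hdr.chunk_len > MAX_TASK_BYTES   ||
            hdr.offset > MAX_TASK_BYTES - hdr.chunk_len)
            continue;                         /* malformed chunk, ignore */

        /* Single copy: payload goes straight to its final position. */
        memcpy(task_buf + hdr.offset, pkt + sizeof(hdr), hdr.chunk_len);
        received += hdr.chunk_len;            /* duplicates not handled here */
        total = hdr.total_len;

        if (total > 0 && received >= total) {
            /* Run-to-completion: the same thread that polled the packets
             * immediately hands the assembled task to the accelerator. */
            offload_to_accelerator(task_buf, total);
            received = total = 0;
        }
    }
    close(fd);
    return 0;
}
```

In the paper's setting the busy-poll loop would sit on a dedicated core and read descriptors directly from the NIC, so the socket calls above should be read only as placeholders for that interaction.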


Source journal: Annals of Telecommunications (Engineering Technology - Telecommunications)
CiteScore: 5.20
Self-citation rate: 5.30%
Articles per year: 37
Review time: 4.5 months
Journal description: Annals of Telecommunications is an international journal publishing original peer-reviewed papers in the field of telecommunications. It covers all the essential branches of modern telecommunications, ranging from digital communications to communication networks and the internet, to software, protocols and services, uses and economics. This broad spectrum of topics reflects the rapid convergence, through telecommunications, of the underlying technologies in computing, communications, and content management toward the emergence of the information and knowledge society. As a consequence, the journal provides a medium for exchanging research results and technological achievements accomplished by the European and international scientific community from academia and industry.