Low-latency remote-offloading system for accelerator

IF 1.8 | Zone 4, Computer Science | JCR Q3, Telecommunications

Annals of Telecommunications Pub Date : 2023-11-03 DOI:10.1007/s12243-023-00994-3

Shogo Saito, Kei Fujimoto, Akinori Shiraga

Annals of Telecommunications, vol. 79, no. 3-4, pp. 179-196. Journal article, not open access.
PDF: https://link.springer.com/content/pdf/10.1007/s12243-023-00994-3.pdf
Article page: https://link.springer.com/article/10.1007/s12243-023-00994-3

Citations: 0

Abstract

Specific workloads are increasingly offloaded to accelerators such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) for real-time processing and computing efficiency. Because accelerators are expensive and power-hungry, it is desirable to raise their utilization by sharing them among multiple servers over a network. However, offloading tasks over a network adds latency because of network processing overhead. This paper proposes a low-latency system for accelerator offloading over a network. To reduce the overhead of remote offloading, the system combines (1) fast reassembly of chunked data using a simple protocol, which reduces the number of memory copies; (2) polling-based checking for received packets, which reduces the interrupt overhead of interacting with the network interface card; and (3) a run-to-completion model spanning network processing and accelerator offloading, which reduces context-switching overhead. We show that the system improves performance by 66.40% compared with a simple implementation that uses the kernel protocol stack, and we confirm the improvement with a virtual radio access network use case as a low-latency application. Furthermore, we show that this performance can also be achieved in practical use in data center networks.
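The three mechanisms named in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the chunk header layout (`job_id`, `offset`, `total`), the `Reassembler` class, `poll_loop`, and `fake_accelerator` are all hypothetical names, an in-memory `deque` stands in for a network interface card receive queue, and chunks are assumed to arrive without loss or duplication.

```python
import struct
from collections import deque

# Hypothetical chunk header: job id, byte offset, total payload length.
HEADER = struct.Struct("!IQQ")


def make_chunks(job_id, payload, chunk_size):
    """Sender side: split a payload into header-prefixed chunks."""
    return [
        HEADER.pack(job_id, off, len(payload)) + payload[off:off + chunk_size]
        for off in range(0, len(payload), chunk_size)
    ]


class Reassembler:
    """(1) Write each chunk straight into its final place in one buffer."""

    def __init__(self):
        self.jobs = {}  # job_id -> [buffer, bytes_received]

    def feed(self, packet):
        job_id, offset, total = HEADER.unpack_from(packet)
        body = memoryview(packet)[HEADER.size:]      # view, no copy of the body
        state = self.jobs.get(job_id)
        if state is None:
            state = self.jobs[job_id] = [bytearray(total), 0]
        state[0][offset:offset + len(body)] = body   # the single copy per chunk
        state[1] += len(body)
        if state[1] == total:                        # every byte has arrived
            del self.jobs[job_id]
            return job_id, state[0]
        return None


def fake_accelerator(data):
    """Stand-in for the accelerator call; it simply sums the bytes."""
    return sum(data)


def poll_loop(rx_queue, offload, max_idle_polls=1000):
    """(2) Busy-poll the queue; (3) run each job to completion in this thread."""
    reassembler = Reassembler()
    results = {}
    idle = 0
    while idle < max_idle_polls:
        if not rx_queue:          # nothing received yet: poll again, no interrupt
            idle += 1
            continue
        idle = 0
        done = reassembler.feed(rx_queue.popleft())
        if done is not None:
            job_id, buf = done
            results[job_id] = offload(buf)   # no hand-off to another thread
    return results


# Usage: chunks may arrive in any order because each carries its own offset.
payload = bytes(range(256)) * 4
chunks = make_chunks(7, payload, 300)
rx = deque(reversed(chunks))
results = poll_loop(rx, fake_accelerator)
```

Carrying the offset in every chunk lets the receiver place data directly in the buffer that is later handed to the accelerator, which is one plausible way to avoid an intermediate reassembly copy. A real system would poll hardware queues from a dedicated core rather than loop over a Python `deque`.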


Source journal

Annals of Telecommunications (Engineering and Technology: Telecommunications)

CiteScore

5.20

Self-citation rate

5.30%

Articles published

Review time

4.5 months

Journal description: Annals of Telecommunications is an international journal publishing original peer-reviewed papers in the field of telecommunications. It covers all the essential branches of modern telecommunications, ranging from digital communications to communication networks and the internet, to software, protocols and services, uses and economics. This broad range of topics reflects the rapid convergence, through telecommunications, of the underlying technologies in computers, communications, and content management towards the emergence of the information and knowledge society. As a consequence, the journal provides a medium for exchanging research results and technological achievements of the European and international scientific community in academia and industry.