Parallelized Software Offloading of Low-Level Communication with User-Level Threads

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2018-01-28 DOI:10.1145/3149457.3149475

Wataru Endo, K. Taura

{"title":"Parallelized Software Offloading of Low-Level Communication with User-Level Threads","authors":"Wataru Endo, K. Taura","doi":"10.1145/3149457.3149475","DOIUrl":null,"url":null,"abstract":"Although recent HPC interconnects are assumed to achieve low latency and high bandwidth communication, in practical terms, their performance is often bounded by the network software stacks rather than the underlying hardware because message processing requires a certain amount of computation in CPUs. To exploit the hardware capacity, some existing communication libraries provide an interface for parallelizing accesses to network endpoints with manual hints. However, with growing core counts per node in modern clusters, it is increasingly difficult for users to efficiently handle communication resources in multi-threading environments. We implemented a low-level communication library that can automatically schedule communication requests by offloading them to multiple dedicated threads via lockless circular buffers. To enhance the efficiency of offloading, we developed a novel technique to dynamically change the number of offloading threads using a user-level thread library. We demonstrate that our offloading architecture exhibits better performance characteristics in microbenchmark results than the existing approaches.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"2015 29","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3149457.3149475","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

Abstract

Although recent HPC interconnects are assumed to achieve low latency and high bandwidth communication, in practical terms, their performance is often bounded by the network software stacks rather than the underlying hardware because message processing requires a certain amount of computation in CPUs. To exploit the hardware capacity, some existing communication libraries provide an interface for parallelizing accesses to network endpoints with manual hints. However, with growing core counts per node in modern clusters, it is increasingly difficult for users to efficiently handle communication resources in multi-threading environments. We implemented a low-level communication library that can automatically schedule communication requests by offloading them to multiple dedicated threads via lockless circular buffers. To enhance the efficiency of offloading, we developed a novel technique to dynamically change the number of offloading threads using a user-level thread library. We demonstrate that our offloading architecture exhibits better performance characteristics in microbenchmark results than the existing approaches.

查看原文本刊更多论文

与用户级线程的低级通信的并行软件卸载

尽管最近的HPC互连被假定为实现低延迟和高带宽通信，但实际上，它们的性能通常受到网络软件堆栈而不是底层硬件的限制，因为消息处理需要cpu中的一定数量的计算。为了利用硬件容量，一些现有的通信库提供了一个接口，通过手动提示来并行访问网络端点。然而，随着现代集群中每个节点的核心数不断增加，用户在多线程环境中高效地处理通信资源变得越来越困难。我们实现了一个低级通信库，它可以通过无锁循环缓冲区将通信请求卸载到多个专用线程来自动调度通信请求。为了提高卸载的效率，我们开发了一种使用用户级线程库动态更改卸载线程数量的新技术。我们证明了我们的卸载架构在微基准测试结果中表现出比现有方法更好的性能特征。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region

自引率

0.00%

发文量