sAXI: A High-Efficient Hardware Inter-Node Link in ARM Server for Remote Memory Access

2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Pub Date: 2016-05-16. DOI: 10.1109/CCGrid.2016.66

Ke Zhang, Yisong Chang, Lixin Zhang, Mingyu Chen, Lei Yu, Zhiwei Xu



Abstract

The ever-growing need for fast big-data operations has made in-memory processing increasingly important in modern datacenters. To mitigate the memory capacity limit of a single server node, techniques for intra-rack, cross-node memory access have recently drawn attention. However, existing proposals access remote memory inefficiently because of inter-protocol conversions and non-transparent, coarse-grained accesses. In this study, we propose a high-performance, efficient serialized AXI (sAXI) link and its associated cross-node memory access mechanism for emerging ARM-based servers. The key idea behind sAXI is to extend the on-chip AMBA AXI-4.0 interconnect of the SoC in a local server node off-chip and carry it into remote server nodes over high-speed serial lanes. As a result, existing software can natively access remote memory in adjacent nodes in the same manner as local resources. Experimental results show that, in a user-level micro-benchmark on our in-house FPGA prototype, remote memory access over the sAXI data path reaches a minimum latency of 1.16 μs and a maximum bandwidth of 1.52 GB/s. In addition, with this efficient hardware inter-node link, the performance of Redis, an in-memory key-value framework, improves by up to 1.72x, and the large latency overhead of database queries can be effectively hidden.
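To give a feel for what the two reported figures imply, the following is a rough back-of-the-envelope sketch, not a model taken from the paper. It assumes a simple linear cost (fixed latency plus size divided by bandwidth), treats the reported minimum latency and maximum bandwidth as best-case constants, and assumes 1 GB = 10^9 bytes. The function names are hypothetical and chosen only for illustration.

```python
# Figures quoted in the abstract (in-house FPGA prototype, user-level micro-benchmark).
MIN_LATENCY_S = 1.16e-6      # minimum remote-access latency: 1.16 us
MAX_BANDWIDTH_BPS = 1.52e9   # maximum bandwidth: 1.52 GB/s (assumption: 1 GB = 10**9 bytes)


def estimated_transfer_time(num_bytes: int) -> float:
    """Best-case time in seconds to move num_bytes over the link.

    Assumption: linear model, fixed latency plus num_bytes / bandwidth.
    This is an illustrative simplification, not the paper's own model.
    """
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    return MIN_LATENCY_S + num_bytes / MAX_BANDWIDTH_BPS


def effective_bandwidth(num_bytes: int) -> float:
    """Bytes per second achieved by a single transfer of num_bytes."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    return num_bytes / estimated_transfer_time(num_bytes)


if __name__ == "__main__":
    # Cache-line-sized, page-sized, and 1 MiB transfers.
    for size in (64, 4096, 1 << 20):
        t_us = estimated_transfer_time(size) * 1e6
        bw_gbps = effective_bandwidth(size) / 1e9
        print(f"{size:>8} B: {t_us:8.2f} us, {bw_gbps:5.2f} GB/s effective")
```

Under these assumptions, a 64-byte access is dominated by the fixed latency (roughly 1.2 μs in total), while only large transfers approach the 1.52 GB/s peak. This is one way to read why fine-grained, low-latency access matters for the workloads the abstract targets.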

