Telepathic Datacenters: Fast RPCs using Shared CXL Memory

arXiv - CS - Operating Systems Pub Date : 2024-08-21 DOI:arxiv-2408.11325

Suyash Mahar, Ehsan Hajyjasini, Seungjin Lee, Zifeng Zhang, Mingyao Shen, Steven Swanson

Abstract

Datacenter applications often rely on remote procedure calls (RPCs) for fast, efficient, and secure communication. However, RPCs are slow, inefficient, and hard to use as they require expensive serialization and compression to communicate over a packetized serial network link. Compute Express Link 3.0 (CXL) offers an alternative solution, allowing applications to share data using a cache-coherent, shared-memory interface across clusters of machines. RPCool is a new framework that exploits CXL's shared memory capabilities. RPCool avoids serialization by passing pointers to data structures in shared memory. While avoiding serialization is useful, directly sharing pointer-rich data eliminates the isolation that copying data over traditional networks provides, leaving the receiver vulnerable to invalid pointers and concurrent updates to shared data by the sender. RPCool restores this safety with careful and efficient management of memory permissions. Another significant challenge with CXL shared memory capabilities is that they are unlikely to scale to an entire datacenter. RPCool addresses this by falling back to RDMA-based communication. Overall, RPCool reduces the round-trip latency by 1.93× and 7.2× compared to state-of-the-art RDMA and CXL-based RPC mechanisms, respectively. Moreover, RPCool performs either comparably or better than other RPC mechanisms across a range of workloads.
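The idea of passing pointers instead of serialized copies can be illustrated with a small sketch. This is not RPCool's API. It assumes Python's `multiprocessing.shared_memory` as a stand-in for CXL shared memory, stores "pointers" as byte offsets into the segment, and uses hypothetical names (`NODE`, `build_list`, `checked_sum`). The receiver-side bounds check hints at why unvalidated shared pointers are risky. It does not reproduce the permission-based protection the abstract describes.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical node layout: (value, offset of next node), both signed 64-bit.
NODE = struct.Struct("<qq")
NULL = -1  # sentinel offset that marks the end of the list


def build_list(buf, values, start=0):
    """Sender side: lay out a linked list inside the shared buffer.

    Returns the head offset, which is the only thing the "RPC" has to carry.
    """
    if not values:
        return NULL
    off = start
    for i, value in enumerate(values):
        nxt = off + NODE.size if i + 1 < len(values) else NULL
        NODE.pack_into(buf, off, value, nxt)
        off += NODE.size
    return start


def checked_sum(buf, head):
    """Receiver side: walk the list in place, with no copy and no deserialization.

    Every offset is validated before it is dereferenced, and the walk is
    capped so that a cyclic list cannot loop forever.
    """
    limit = len(buf)
    max_steps = limit // NODE.size
    total, off, steps = 0, head, 0
    while off != NULL:
        if off < 0 or off + NODE.size > limit or steps >= max_steps:
            raise ValueError(f"invalid pointer: {off}")
        value, off = NODE.unpack_from(buf, off)
        total += value
        steps += 1
    return total


def demo():
    # One segment stands in for a region of memory shared by both endpoints.
    shm = shared_memory.SharedMemory(create=True, size=4096)
    try:
        head = build_list(shm.buf, [10, 20, 12])
        # The call "argument" is just the head offset, not the list contents.
        return checked_sum(shm.buf, head)
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    print(demo())
```

Offsets are used rather than raw addresses because two processes may map the same segment at different virtual addresses. The sketch also leaves the second hazard from the abstract unaddressed: the sender could still modify the list while the receiver is reading it.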
