{"title":"远程数据中心:使用共享 CXL 内存实现快速 RPC","authors":"Suyash Mahar, Ehsan Hajyjasini, Seungjin Lee, Zifeng Zhang, Mingyao Shen, Steven Swanson","doi":"arxiv-2408.11325","DOIUrl":null,"url":null,"abstract":"Datacenter applications often rely on remote procedure calls (RPCs) for fast,\nefficient, and secure communication. However, RPCs are slow, inefficient, and\nhard to use as they require expensive serialization and compression to\ncommunicate over a packetized serial network link. Compute Express Link 3.0\n(CXL) offers an alternative solution, allowing applications to share data using\na cache-coherent, shared-memory interface across clusters of machines. RPCool is a new framework that exploits CXL's shared memory capabilities.\nRPCool avoids serialization by passing pointers to data structures in shared\nmemory. While avoiding serialization is useful, directly sharing pointer-rich\ndata eliminates the isolation that copying data over traditional networks\nprovides, leaving the receiver vulnerable to invalid pointers and concurrent\nupdates to shared data by the sender. RPCool restores this safety with careful\nand efficient management of memory permissions. Another significant challenge\nwith CXL shared memory capabilities is that they are unlikely to scale to an\nentire datacenter. RPCool addresses this by falling back to RDMA-based\ncommunication. Overall, RPCool reduces the round-trip latency by 1.93$\\times$ and\n7.2$\\times$ compared to state-of-the-art RDMA and CXL-based RPC mechanisms,\nrespectively. Moreover, RPCool performs either comparably or better than other\nRPC mechanisms across a range of workloads.","PeriodicalId":501333,"journal":{"name":"arXiv - CS - Operating Systems","volume":"5 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Telepathic Datacenters: Fast RPCs using Shared CXL Memory\",\"authors\":\"Suyash Mahar, Ehsan Hajyjasini, Seungjin Lee, Zifeng Zhang, Mingyao Shen, Steven Swanson\",\"doi\":\"arxiv-2408.11325\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Datacenter applications often rely on remote procedure calls (RPCs) for fast,\\nefficient, and secure communication. However, RPCs are slow, inefficient, and\\nhard to use as they require expensive serialization and compression to\\ncommunicate over a packetized serial network link. Compute Express Link 3.0\\n(CXL) offers an alternative solution, allowing applications to share data using\\na cache-coherent, shared-memory interface across clusters of machines. RPCool is a new framework that exploits CXL's shared memory capabilities.\\nRPCool avoids serialization by passing pointers to data structures in shared\\nmemory. While avoiding serialization is useful, directly sharing pointer-rich\\ndata eliminates the isolation that copying data over traditional networks\\nprovides, leaving the receiver vulnerable to invalid pointers and concurrent\\nupdates to shared data by the sender. RPCool restores this safety with careful\\nand efficient management of memory permissions. Another significant challenge\\nwith CXL shared memory capabilities is that they are unlikely to scale to an\\nentire datacenter. RPCool addresses this by falling back to RDMA-based\\ncommunication. Overall, RPCool reduces the round-trip latency by 1.93$\\\\times$ and\\n7.2$\\\\times$ compared to state-of-the-art RDMA and CXL-based RPC mechanisms,\\nrespectively. 
Moreover, RPCool performs either comparably or better than other\\nRPC mechanisms across a range of workloads.\",\"PeriodicalId\":501333,\"journal\":{\"name\":\"arXiv - CS - Operating Systems\",\"volume\":\"5 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Operating Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.11325\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Operating Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.11325","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Telepathic Datacenters: Fast RPCs using Shared CXL Memory
Datacenter applications rely on remote procedure calls (RPCs) for communication that is meant to be fast, efficient, and secure. In practice, however, RPCs are slow, inefficient, and hard to use, because they require expensive serialization and compression to communicate over a packetized serial network link. Compute Express Link 3.0 (CXL) offers an alternative: it lets applications share data through a cache-coherent, shared-memory interface across clusters of machines. RPCool is a new framework that exploits CXL's shared-memory capabilities.
RPCool avoids serialization by passing pointers to data structures held in shared memory. Avoiding serialization is useful, but directly sharing pointer-rich data eliminates the isolation that copying data over a traditional network provides, leaving the receiver vulnerable to invalid pointers and to concurrent updates of shared data by the sender. RPCool restores this safety through careful, efficient management of memory permissions.
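To make the pointer-passing idea concrete, the following is a minimal, hypothetical sketch in C++. It emulates a CXL shared-memory pool with POSIX shared memory on a single host, builds a request in place, hands the receiver a pool-relative offset rather than a serialized copy, and uses mprotect to model the permission management described above. All names here (Request, /rpc_pool_demo, and so on) are illustrative assumptions, not RPCool's actual API.

```cpp
// Hypothetical sketch: a zero-serialization "RPC" over a shared-memory pool.
// A CXL 3.0 pool is emulated with POSIX shared memory on one host; the sender
// builds the request in place and hands the receiver an offset, not a copy.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

struct Request {           // lives entirely inside the shared pool
    uint32_t op;           // operation code
    uint32_t len;          // payload length in bytes
    uint64_t payload_off;  // pool-relative offset, valid in every mapping
};

int main() {
    const size_t kPoolSize = 1 << 20;
    int fd = shm_open("/rpc_pool_demo", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, kPoolSize);
    char* pool = static_cast<char*>(
        mmap(nullptr, kPoolSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    // --- Sender: build the request in shared memory, no serialization. ---
    auto* req = reinterpret_cast<Request*>(pool);
    const char msg[] = "hello over shared memory";
    uint64_t payload_off = sizeof(Request);
    memcpy(pool + payload_off, msg, sizeof msg);
    req->op = 1;
    req->len = sizeof msg;
    req->payload_off = payload_off;  // offsets, not raw pointers, stay valid
                                     // regardless of where the pool is mapped

    // Model RPCool's permission management: while the receiver works on the
    // request, writes to the region are disabled so the data cannot change
    // underneath it. (One mapping here; RPCool manages per-process views.)
    mprotect(pool, kPoolSize, PROT_READ);

    // --- Receiver: follow the offset and use the data in place. ---
    auto* seen = reinterpret_cast<Request*>(pool);
    printf("op=%u payload=\"%s\"\n", seen->op, pool + seen->payload_off);

    munmap(pool, kPoolSize);
    shm_unlink("/rpc_pool_demo");
    return 0;
}
```

Using offsets rather than raw virtual addresses is one plausible way to keep shared structures meaningful across processes that map the pool at different addresses; it also makes validating a pointer as cheap as a bounds check.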
Another significant challenge is that CXL shared memory is unlikely to scale to an entire datacenter. RPCool addresses this by falling back to RDMA-based communication.
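The transport choice can be pictured with the following hypothetical sketch, again in C++: the fast path passes a reference within a shared CXL pool, while the fallback copies data over the network. The interface names are assumptions for illustration, not RPCool's actual design.

```cpp
// Hypothetical sketch of the fallback policy: take the shared-memory path
// when both endpoints can map the same CXL pool, otherwise fall back to a
// network transport (RDMA in RPCool; sockets would slot in the same way).
#include <cstdio>
#include <memory>
#include <string_view>

struct Transport {
    virtual ~Transport() = default;
    virtual void send(std::string_view msg) = 0;
};

struct SharedMemoryTransport : Transport {  // same-pool fast path
    void send(std::string_view msg) override {
        printf("[cxl-shm] pass reference to %zu-byte message in pool\n",
               msg.size());
    }
};

struct RdmaTransport : Transport {  // cross-pool fallback
    void send(std::string_view msg) override {
        printf("[rdma] copy and send %zu bytes over the network\n",
               msg.size());
    }
};

// Pick the transport once, when the connection is set up.
std::unique_ptr<Transport> connect(bool peer_shares_cxl_pool) {
    if (peer_shares_cxl_pool)
        return std::make_unique<SharedMemoryTransport>();
    return std::make_unique<RdmaTransport>();  // pool does not reach the peer
}

int main() {
    connect(true)->send("same-rack request");           // CXL path
    connect(false)->send("cross-datacenter request");   // RDMA fallback
    return 0;
}
```

Deciding the transport once at connection setup, rather than per call, keeps the per-RPC dispatch cost negligible on the fast path.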
Overall, RPCool reduces round-trip latency by 1.93$\times$ and 7.2$\times$ compared to state-of-the-art RDMA-based and CXL-based RPC mechanisms, respectively. Moreover, RPCool performs comparably to or better than other RPC mechanisms across a range of workloads.