在完全分解的系统中对有效数据移动的体系结构支持

Proceedings of the ACM on Measurement and Analysis of Computing Systems Pub Date : 2023-02-27 DOI:10.1145/3579445

Christina Giannoula, Kailong Huang, Jonathan Tang, N. Koziris, G. Goumas, Zeshan A. Chishti, Nandita Vijaykumar

{"title":"在完全分解的系统中对有效数据移动的体系结构支持","authors":"Christina Giannoula, Kailong Huang, Jonathan Tang, N. Koziris, G. Goumas, Zeshan A. Chishti, Nandita Vijaykumar","doi":"10.1145/3579445","DOIUrl":null,"url":null,"abstract":"Resource disaggregation offers a cost effective solution to resource scaling, utilization, and failure-handling in data centers by physically separating hardware devices in a server. Servers are architected as pools of processor, memory, and storage devices, organized as independent failure-isolated components interconnected by a high-bandwidth network. A critical challenge, however, is the high performance penalty of accessing data from a remote memory module over the network. Addressing this challenge is difficult as disaggregated systems have high runtime variability in network latencies/bandwidth, and page migration can significantly delay critical path cache line accesses in other pages. This paper conducts a characterization analysis on different data movement strategies in fully disaggregated systems, evaluates their performance overheads in a variety of workloads, and introduces DaeMon, the first software-transparent mechanism to significantly alleviate data movement overheads in fully disaggregated systems. First, to enable scalability to multiple hardware components in the system, we enhance each compute and memory unit with specialized engines that transparently handle data migrations. Second, to achieve high performance and provide robustness across various network, architecture and application characteristics, we implement a synergistic approach of bandwidth partitioning, link compression, decoupled data movement of multiple granularities, and adaptive granularity selection in data movements. We evaluate DaeMon in a wide variety of workloads at different network and architecture configurations using a state-of-the-art simulator. DaeMon improves system performance and data access costs by 2.39× and 3.06×, respectively, over the widely-adopted approach of moving data at page granularity.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"DaeMon: Architectural Support for Efficient Data Movement in Fully Disaggregated Systems\",\"authors\":\"Christina Giannoula, Kailong Huang, Jonathan Tang, N. Koziris, G. Goumas, Zeshan A. Chishti, Nandita Vijaykumar\",\"doi\":\"10.1145/3579445\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Resource disaggregation offers a cost effective solution to resource scaling, utilization, and failure-handling in data centers by physically separating hardware devices in a server. Servers are architected as pools of processor, memory, and storage devices, organized as independent failure-isolated components interconnected by a high-bandwidth network. A critical challenge, however, is the high performance penalty of accessing data from a remote memory module over the network. Addressing this challenge is difficult as disaggregated systems have high runtime variability in network latencies/bandwidth, and page migration can significantly delay critical path cache line accesses in other pages. This paper conducts a characterization analysis on different data movement strategies in fully disaggregated systems, evaluates their performance overheads in a variety of workloads, and introduces DaeMon, the first software-transparent mechanism to significantly alleviate data movement overheads in fully disaggregated systems. First, to enable scalability to multiple hardware components in the system, we enhance each compute and memory unit with specialized engines that transparently handle data migrations. Second, to achieve high performance and provide robustness across various network, architecture and application characteristics, we implement a synergistic approach of bandwidth partitioning, link compression, decoupled data movement of multiple granularities, and adaptive granularity selection in data movements. We evaluate DaeMon in a wide variety of workloads at different network and architecture configurations using a state-of-the-art simulator. DaeMon improves system performance and data access costs by 2.39× and 3.06×, respectively, over the widely-adopted approach of moving data at page granularity.\",\"PeriodicalId\":426760,\"journal\":{\"name\":\"Proceedings of the ACM on Measurement and Analysis of Computing Systems\",\"volume\":\"29 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM on Measurement and Analysis of Computing Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3579445\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3579445","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

资源分解通过物理分离服务器中的硬件设备，为数据中心中的资源扩展、利用和故障处理提供了一种经济有效的解决方案。服务器架构为处理器、内存和存储设备池，组织为通过高带宽网络相互连接的独立故障隔离组件。然而，一个关键的挑战是通过网络从远程内存模块访问数据的高性能损失。解决这一挑战是困难的，因为分解系统在网络延迟/带宽方面具有很高的运行时可变性，并且页面迁移会显著延迟其他页面中的关键路径缓存线访问。本文对完全分解系统中不同的数据移动策略进行了特征分析，评估了它们在各种工作负载下的性能开销，并介绍了DaeMon，这是第一个显著减轻完全分解系统中数据移动开销的软件透明机制。首先，为了支持系统中多个硬件组件的可伸缩性，我们使用专门的引擎来增强每个计算和内存单元，这些引擎可以透明地处理数据迁移。其次，为了实现高性能并提供跨各种网络、架构和应用特性的鲁棒性，我们实现了带宽划分、链路压缩、多粒度解耦数据移动以及数据移动中的自适应粒度选择的协同方法。我们使用最先进的模拟器在不同网络和体系结构配置的各种工作负载下评估DaeMon。与广泛采用的按页面粒度移动数据的方法相比，DaeMon将系统性能和数据访问成本分别提高了2.39倍和3.06倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

DaeMon: Architectural Support for Efficient Data Movement in Fully Disaggregated Systems

Resource disaggregation offers a cost effective solution to resource scaling, utilization, and failure-handling in data centers by physically separating hardware devices in a server. Servers are architected as pools of processor, memory, and storage devices, organized as independent failure-isolated components interconnected by a high-bandwidth network. A critical challenge, however, is the high performance penalty of accessing data from a remote memory module over the network. Addressing this challenge is difficult as disaggregated systems have high runtime variability in network latencies/bandwidth, and page migration can significantly delay critical path cache line accesses in other pages. This paper conducts a characterization analysis on different data movement strategies in fully disaggregated systems, evaluates their performance overheads in a variety of workloads, and introduces DaeMon, the first software-transparent mechanism to significantly alleviate data movement overheads in fully disaggregated systems. First, to enable scalability to multiple hardware components in the system, we enhance each compute and memory unit with specialized engines that transparently handle data migrations. Second, to achieve high performance and provide robustness across various network, architecture and application characteristics, we implement a synergistic approach of bandwidth partitioning, link compression, decoupled data movement of multiple granularities, and adaptive granularity selection in data movements. We evaluate DaeMon in a wide variety of workloads at different network and architecture configurations using a state-of-the-art simulator. DaeMon improves system performance and data access costs by 2.39× and 3.06×, respectively, over the widely-adopted approach of moving data at page granularity.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the ACM on Measurement and Analysis of Computing Systems

CiteScore

3.20

自引率

0.00%

发文量