Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild

Jiechen Zhao, Ran Shu, Katie Lim, Zewen Fan, Thomas Anderson, Mingyu Gao, Natalie Enright Jerger
arXiv:2407.10098 · arXiv - CS - Performance · Published: 2024-07-14
Citations: 0

Abstract

I/O devices in public clouds have integrated increasing numbers of hardware accelerators, e.g., AWS Nitro, Azure FPGA, and Nvidia BlueField. However, such specialized compute (1) is not explicitly accessible to cloud users with performance guarantees, and (2) cannot be leveraged simultaneously by both providers and users, unlike general-purpose compute (e.g., CPUs). Through ten observations, we show that the fundamental difficulty of democratizing accelerators is insufficient performance-isolation support. The key obstacles to enforcing accelerator isolation are (1) too many unknown traffic patterns in public clouds and (2) too many possible contention sources in the datapath. In this work, instead of scheduling such complex traffic on the fly and augmenting isolation support on each system component, we propose to model traffic as network flows and proactively re-shape the traffic to avoid unpredictable contention. We discuss the implications of our findings for the design of future I/O management stacks and device interfaces.
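The abstract only names the high-level idea of proactively re-shaping flows rather than isolating each contention source. As a purely illustrative sketch (not the authors' mechanism, and with all names hypothetical), proactive re-shaping in its simplest form resembles token-bucket shaping, where each flow is admitted against a rate budget before it can reach a shared datapath:

```python
class TokenBucket:
    """Minimal token-bucket shaper: admit a flow's requests at a sustained
    `rate` (tokens per time unit) with a bounded burst of `burst` tokens.
    Requests that exceed the budget are deferred instead of contending."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate      # refill rate, tokens per time unit
        self.burst = burst    # maximum bucket depth (burst allowance)
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # timestamp of the previous call

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill proportionally to elapsed time, capped at the burst depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Example: a flow issuing one unit-cost request per time unit, shaped to
# rate 0.5 with burst 5. The initial burst drains, then admission settles
# to the sustained rate (roughly one accept every two time units).
shaper = TokenBucket(rate=0.5, burst=5)
admitted = sum(shaper.allow(now=float(t)) for t in range(20))
```

The design point this sketch illustrates is the one the abstract argues for: contention is avoided before traffic enters the datapath, so no per-component isolation logic downstream needs to reason about the aggregate pattern.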