Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild

Jiechen Zhao, Ran Shu, Katie Lim, Zewen Fan, Thomas Anderson, Mingyu Gao, Natalie Enright Jerger
arXiv:2407.10098 · arXiv - CS - Performance · Published: 2024-07-14
Citations: 0

Abstract

I/O devices in public clouds have integrated increasing numbers of hardware accelerators, e.g., AWS Nitro, Azure FPGA, and Nvidia BlueField. However, such specialized compute (1) is not explicitly accessible to cloud users with performance guarantees, and (2) cannot be leveraged simultaneously by both providers and users, unlike general-purpose compute (e.g., CPUs). Through ten observations, we show that the fundamental difficulty of democratizing accelerators is insufficient performance-isolation support. The key obstacles to enforcing accelerator isolation are (1) too many unknown traffic patterns in public clouds and (2) too many possible contention sources in the datapath. In this work, instead of scheduling such complex traffic on the fly and augmenting isolation support on each system component, we propose to model traffic as network flows and proactively re-shape the traffic to avoid unpredictable contention. We discuss the implications of our findings for the design of future I/O management stacks and device interfaces.
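The abstract only names the high-level idea of proactively re-shaping flows rather than isolating each contention source. As a purely illustrative sketch (not the authors' mechanism, and with all names hypothetical), proactive re-shaping in its simplest form resembles token-bucket shaping, where each flow is admitted against a rate budget before it can reach a shared datapath:

```python
class TokenBucket:
    """Minimal token-bucket shaper: admit a flow's requests at a sustained
    `rate` (tokens per time unit) with a bounded burst of `burst` tokens.
    Requests that exceed the budget are deferred instead of contending."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate      # refill rate, tokens per time unit
        self.burst = burst    # maximum bucket depth (burst allowance)
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # timestamp of the previous call

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill proportionally to elapsed time, capped at the burst depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Example: a flow issuing one unit-cost request per time unit, shaped to
# rate 0.5 with burst 5. The initial burst drains, then admission settles
# to the sustained rate (roughly one accept every two time units).
shaper = TokenBucket(rate=0.5, burst=5)
admitted = sum(shaper.allow(now=float(t)) for t in range(20))
```

The design point this sketch illustrates is the one the abstract argues for: contention is avoided before traffic enters the datapath, so no per-component isolation logic downstream needs to reason about the aggregate pattern.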