Superways: A Datacenter Topology for Incast-heavy workloads

Proceedings of the Web Conference 2021 | Pub Date: 2021-04-19 | DOI: 10.1145/3442381.3449966

Hamed Rezaei, Balajee Vamanan


Citations: 3

Abstract


Superways: A Datacenter Topology for Incast-heavy workloads

Several important datacenter applications cause incast congestion, which severely degrades the flow completion times of short flows and the throughput of long flows. Further, because most flows are short and the incast duration is shorter than typical round-trip times, reactive mechanisms that rely on congestion control are not effective. While modern datacenter topologies provide high bisection bandwidth to support all-to-all traffic, incast is fundamentally a many-to-one traffic pattern and therefore requires deep buffers or high bandwidth at the network edge. We propose Superways, a heterogeneous datacenter topology that provides higher bandwidth for some servers to absorb incasts, as incasts occur only at a small number of servers that aggregate responses from other senders. Our design is based on the key observation that the small subset of servers that aggregate responses is likely to be network bound, whereas most other servers, which communicate only with random servers, are not. Superways can be implemented over many existing datacenter topologies and can be expanded flexibly without incurring high cost or cabling complexity. We also provide a heuristic for scheduling jobs in our topology to fully utilize the extra capacity. Using a real CloudLab implementation and ns-3 simulations, we show that Superways significantly improves flow completion times and throughput over existing datacenter topologies. We also analyze cost and cabling complexity, and discuss how to expand our topology.
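To make the many-to-one argument concrete, the following is a minimal back-of-the-envelope sketch, not the model or code used in the paper. It assumes a simple fluid model in which all senders start at the same instant and transmit at line rate toward one aggregator, and it ignores congestion control and retransmissions. The function name, the field names, and all parameter values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class IncastEstimate:
    peak_queue_bytes: float  # largest backlog at the aggregator's switch port
    dropped_bytes: float     # part of that backlog that exceeds the buffer
    delivery_time_us: float  # time until the aggregator has received everything


def estimate_incast(num_senders, response_bytes, sender_gbps,
                    receiver_gbps, buffer_bytes):
    """Fluid-model estimate for one synchronized incast (illustrative only)."""
    total_bytes = num_senders * response_bytes
    # Time each sender needs to push its response at its own line rate.
    burst_s = response_bytes * 8 / (sender_gbps * 1e9)
    arrival_gbps = num_senders * sender_gbps
    # The queue grows only while arrivals outpace the aggregator's link.
    growth_gbps = max(0.0, arrival_gbps - receiver_gbps)
    peak_queue = growth_gbps * 1e9 / 8 * burst_s
    dropped = max(0.0, peak_queue - buffer_bytes)
    # Delivery is limited by the slower of the senders and the receiver link.
    delivery_s = max(burst_s, total_bytes * 8 / (receiver_gbps * 1e9))
    return IncastEstimate(peak_queue, dropped, delivery_s * 1e6)


if __name__ == "__main__":
    # Hypothetical numbers: 32 senders, 32 kB responses, 512 kB port buffer.
    for receiver_gbps in (10, 40):
        est = estimate_incast(32, 32_000, 10, receiver_gbps, 512_000)
        print(receiver_gbps, est)
```

Under these assumed numbers, raising only the aggregator's link from 10 Gbps to 40 Gbps cuts the delivery time by a factor of four and reduces the overflow, without changing any sender link. This is the intuition behind giving extra edge bandwidth to the few servers that aggregate responses.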
