Releasing the Power of In-Network Aggregation With Aggregator-Aware Routing Optimization

IF 3.6 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE/ACM Transactions on Networking Pub Date : 2024-07-08 DOI:10.1109/TNET.2024.3423380

Shouxi Luo;Xiaoyu Yu;Ke Li;Huanlai Xing

IEEE/ACM Transactions on Networking, vol. 32, no. 5, pp. 4488-4502. https://ieeexplore.ieee.org/document/10589457/

Citations: 0

Abstract


Releasing the Power of In-Network Aggregation With Aggregator-Aware Routing Optimization

By offloading part of the aggregation computation from the logically central parameter servers to network devices such as programmable switches, In-Network Aggregation (INA) is a general, effective, and widely used approach to reducing network load and thus alleviating the communication bottlenecks of large-scale distributed training. Because INA takes effect if and only if the associated traffic passes through the same in-network aggregator, the key to exploiting INA lies in routing control. However, existing proposals fall short here and are thus far from optimal: they select routes for INA-supported traffic without comprehensively considering the characteristics, limitations, and requirements of the network environment, the aggregator hardware, and the distributed training jobs. To fill this gap, we systematically establish a mathematical model that formulates i) the up-down routing constraints of Clos datacenter networks, ii) the limitations imposed by the pipeline hardware structure of modern programmable switches, and iii) the various aggregator-aware routing optimization goals required by distributed training tasks under different parallelism strategies. Based on the model, we develop ARO, an Aggregator-aware Routing Optimization solution for INA-accelerated distributed training applications. For efficiency, ARO includes a suite of search-space pruning designs that exploit the model's characteristics, reducing the solving time by tens of times with trivial performance loss. Extensive experiments show that ARO finds near-optimal results for large-scale routing optimization in tens of seconds, achieving $1.8\sim 4.0\times$ higher throughput than the state-of-the-art solution.
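The abstract's central observation, that INA helps only when the associated flows traverse the same aggregator, can be illustrated with a toy calculation. The sketch below is not ARO and is not taken from the paper. It assumes a hypothetical two-switch topology in which every worker sends one unit of gradient traffic to a parameter server through a single chosen switch, and all names (`ps_ingress_load`, `w0`, `s0`, and so on) are invented for illustration.

```python
from collections import Counter


def ps_ingress_load(routes, aggregators):
    """Units of traffic reaching the parameter server (toy model).

    routes: dict mapping each worker to the switch its flow traverses
            (hypothetical names, one unit of traffic per worker).
    aggregators: set of switches able to aggregate in-network.
    Flows sharing an aggregator-capable switch merge into one unit;
    flows through any other switch are forwarded unchanged.
    """
    flows_per_switch = Counter(routes.values())
    load = 0
    for switch, n_flows in flows_per_switch.items():
        load += 1 if switch in aggregators else n_flows
    return load


workers = ["w0", "w1", "w2", "w3"]
aggregators = {"s0", "s1"}

# Aggregator-oblivious routing: flows are spread over both switches.
spread = {"w0": "s0", "w1": "s1", "w2": "s0", "w3": "s1"}
# Aggregator-aware routing: all flows are steered through one aggregator.
shared = {w: "s0" for w in workers}

print(ps_ingress_load(spread, set()))        # 4: no aggregation at all
print(ps_ingress_load(spread, aggregators))  # 2: partial aggregation
print(ps_ingress_load(shared, aggregators))  # 1: full aggregation
```

Under these assumptions the same four flows put 4, 2, or 1 units of load on the parameter server depending only on the routes chosen. A real formulation would also have to respect the Clos up-down routing constraints and switch pipeline limits that the abstract mentions, which this sketch ignores.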


Source journal

IEEE/ACM Transactions on Networking (Engineering & Technology: Telecommunications)

CiteScore: 8.20

Self-citation rate: 5.40%

Articles published: 246

Review time: 4-8 weeks

About the journal: The IEEE/ACM Transactions on Networking’s high-level objective is to publish high-quality, original research results derived from theoretical or experimental exploration of the area of communication/computer networking, covering all sorts of information transport networks over all sorts of physical layer technologies, both wireline (all kinds of guided media: e.g., copper, optical) and wireless (e.g., radio-frequency, acoustic (e.g., underwater), infra-red), or hybrids of these. The journal welcomes applied contributions reporting on novel experiences and experiments with actual systems.