Releasing the Power of In-Network Aggregation With Aggregator-Aware Routing Optimization

IF 3.0 · Zone 3 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)
Shouxi Luo;Xiaoyu Yu;Ke Li;Huanlai Xing
IEEE/ACM Transactions on Networking, vol. 32, no. 5, pp. 4488-4502, published 2024-07-08. DOI: 10.1109/TNET.2024.3423380. https://ieeexplore.ieee.org/document/10589457/
Citations: 0

Abstract

By offloading part of the aggregation computation from logically centralized parameter servers to network devices such as programmable switches, In-Network Aggregation (INA) is a general, effective, and widely used approach to reducing network load and thereby alleviating the communication bottlenecks of large-scale distributed training. Since INA takes effect if and only if the associated traffic passes through the same in-network aggregator, the key to exploiting INA lies in routing control. However, existing proposals fall short here and are far from optimal, because they select routes for INA-supported traffic without comprehensively considering the characteristics, limitations, and requirements of the network environment, the aggregator hardware, and the distributed training jobs. To fill this gap, we systematically establish a mathematical model that formulates i) the up-down routing constraints of Clos datacenter networks, ii) the limitations imposed by the pipeline hardware structure of modern programmable switches, and iii) the various aggregator-aware routing optimization goals required by distributed training tasks under different parallelism strategies. Based on this model, we develop ARO, an Aggregator-aware Routing Optimization solution for INA-accelerated distributed training applications. For efficiency, ARO incorporates a suite of search-space pruning designs that exploit the model's characteristics, yielding a tens-of-times speedup in solving time with trivial performance loss. Extensive experiments show that ARO finds near-optimal results for large-scale routing optimization in tens of seconds, achieving $1.8\sim 4.0\times$ higher throughput than the state-of-the-art solution.
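The abstract's central premise, that INA only pays off when all of a job's flows converge on the same aggregator, can be made concrete with a toy model. The sketch below is not ARO and not the paper's baseline; it is a hypothetical illustration under stated assumptions: a two-tier leaf-spine Clos where every cross-leaf flow takes an up-down path leaf → spine → leaf, unit-size gradient flows from workers to one parameter server per job, at most one aggregator per job placed on a spine with a free "slot" (a crude stand-in for the switch pipeline resource limits the paper models), and no aggregation at leaf switches. The names `Job`, `route_jobs`, `ps_ingress`, and `slots_per_spine` are invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical training job: each worker pushes one unit of gradient to a single PS."""
    name: str
    ps_leaf: int                                            # leaf switch hosting the PS
    worker_leaves: list[int] = field(default_factory=list)  # leaf switch of each worker


def up(leaf: int, spine: int) -> tuple[str, str]:
    return (f"leaf{leaf}", f"spine{spine}")


def down(spine: int, leaf: int) -> tuple[str, str]:
    return (f"spine{spine}", f"leaf{leaf}")


def route_jobs(jobs, num_spines, slots_per_spine, aggregator_aware=True):
    """Greedy toy router (assumption-laden sketch, NOT the ARO algorithm).

    Flows that traverse the job's aggregator spine are merged into a single
    unit on that spine's downlink; flows on any other spine pass unmerged.
    """
    load = defaultdict(int)                # directed link -> traffic units
    free = [slots_per_spine] * num_spines  # remaining aggregator slots per spine
    rr = 0                                 # round-robin cursor (ECMP-like spreading)
    # Serve jobs with the most cross-leaf workers first; they gain the most from INA.
    for job in sorted(jobs, key=lambda j: -sum(leaf != j.ps_leaf for leaf in j.worker_leaves)):
        cross = [leaf for leaf in job.worker_leaves if leaf != job.ps_leaf]
        if not cross:
            continue  # all workers share the PS leaf, so no spine traffic
        # Place the aggregator on the free spine with the lightest downlink to the PS leaf.
        candidates = [s for s in range(num_spines) if free[s] > 0]
        agg = (min(candidates, key=lambda s: load.get(down(s, job.ps_leaf), 0))
               if candidates else None)
        if agg is not None:
            free[agg] -= 1
        merged = False
        for leaf in cross:
            if aggregator_aware and agg is not None:
                spine = agg                # steer every flow of the job through its aggregator
            else:
                spine = rr % num_spines    # aggregator-oblivious spreading
                rr += 1
            load[up(leaf, spine)] += 1     # leaves do not aggregate in this model
            if spine == agg:
                if not merged:             # everything reaching the aggregator leaves as one unit
                    load[down(spine, job.ps_leaf)] += 1
                    merged = True
            else:
                load[down(spine, job.ps_leaf)] += 1
    return dict(load)


def ps_ingress(load, num_spines, ps_leaf):
    """Total traffic units delivered from the spine layer to the PS leaf."""
    return sum(load.get(down(s, ps_leaf), 0) for s in range(num_spines))


if __name__ == "__main__":
    jobs = [Job("job-a", ps_leaf=0, worker_leaves=[1, 2, 3, 4])]
    aware = route_jobs(jobs, num_spines=2, slots_per_spine=1)
    oblivious = route_jobs(jobs, num_spines=2, slots_per_spine=1, aggregator_aware=False)
    print(ps_ingress(aware, 2, 0), ps_ingress(oblivious, 2, 0))  # prints "1 3"
    print(max(aware.values()), max(oblivious.values()))          # prints "1 2"
```

In this toy instance both variants reserve the same aggregator on spine 0, yet only the aggregator-aware variant routes all four flows through it, so the PS leaf receives one merged unit instead of three and the busiest link carries half the traffic. The greedy placement only serves to make the "same aggregator" condition tangible; per the abstract, ARO instead solves an optimization model covering Clos up-down routing, switch pipeline limits, and parallelism-specific goals, with search-space pruning for speed.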
Source journal: IEEE/ACM Transactions on Networking (Engineering & Technology: Telecommunications)
CiteScore: 8.20
Self-citation rate: 5.40%
Articles published: 246
Review time: 4-8 weeks
Journal description: The IEEE/ACM Transactions on Networking’s high-level objective is to publish high-quality, original research results derived from theoretical or experimental exploration of the area of communication/computer networking, covering all sorts of information transport networks over all sorts of physical layer technologies, both wireline (all kinds of guided media: e.g., copper, optical) and wireless (e.g., radio-frequency, acoustic (e.g., underwater), infra-red), or hybrids of these. The journal welcomes applied contributions reporting on novel experiences and experiments with actual systems.