Traffic-Aware In-Network Aggregation Placement for Multi-Tenant Distributed Machine Learning

H. Kim, Hochan Lee, Sangheon Pack
DOI: 10.1109/ICCCN58024.2023.10230140
Published in: 2023 32nd International Conference on Computer Communications and Networks (ICCCN), July 2023
Citations: 0

Abstract

Distributed machine learning is an effective way to alleviate the intensive computation cost of training; however, it suffers from network bottlenecks when gathering local results. The recent advent of programmable data planes has opened a new avenue, in-network aggregation, which performs gradient aggregation inside the network, resolving network bottlenecks and further accelerating distributed machine learning. However, because current programmable data planes are resource-constrained, installing in-network aggregation functionality throughout the network would impose an unacceptable burden, so deployment must be planned carefully. In this paper, we consider the problem of deploying in-network aggregation functionality so as to minimize the total network traffic in multi-tenant distributed machine learning. Since the formulated problem is an integer linear program, which is NP-hard, we propose a traffic-aware placement of in-network aggregation (TAPINA) algorithm with lower complexity and near-optimal performance. TAPINA decides the aggregation points of multiple tenants sequentially, in descending order of their expected traffic, and reuses aggregation points already selected by other tenants to reduce the overall deployment cost. Simulation results demonstrate that TAPINA achieves near-optimal performance, reducing traffic by up to 20% compared to the state-of-the-art algorithm in most cases.
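The greedy strategy the abstract describes — rank tenants by expected traffic, place each tenant's aggregation point to minimize its hop-weighted traffic, and break ties in favor of switches other tenants already use — can be sketched as follows. This is an illustrative simplification under assumed inputs (a hop-count graph, a per-switch tenant capacity, and a `traffic` weight per tenant), not the paper's exact algorithm or cost model.

```python
from collections import deque

def hop_distances(adj, src):
    # BFS hop distances from src over an undirected adjacency dict.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def tapina_sketch(adj, switches, tenants, capacity):
    """Greedy placement in the spirit of TAPINA (hypothetical sketch).

    tenants: list of dicts with 'workers' (host list), 'ps' (parameter
             server host), and 'traffic' (expected gradient volume).
    capacity: max tenants one programmable switch can serve.
    Returns {tenant_index: chosen_switch or None}.
    """
    placement = {}
    load = {s: 0 for s in switches}
    # Step 1: process tenants in descending order of expected traffic.
    order = sorted(range(len(tenants)), key=lambda i: -tenants[i]['traffic'])
    for i in order:
        t = tenants[i]
        dists = {h: hop_distances(adj, h) for h in t['workers'] + [t['ps']]}

        def cost(s):
            # Each worker ships its gradient to s; s forwards one
            # aggregated gradient to the parameter server.
            hops = sum(dists[w][s] for w in t['workers']) + dists[t['ps']][s]
            return t['traffic'] * hops

        feasible = [s for s in switches if load[s] < capacity]
        if not feasible:
            placement[i] = None  # no switch left: aggregate at the PS itself
            continue
        # Step 2: pick the cheapest switch, preferring one already reused
        # by another tenant when costs tie (load == 0 sorts last).
        best = min(feasible, key=lambda s: (cost(s), load[s] == 0))
        placement[i] = best
        load[best] += 1
    return placement
```

On a small dumbbell topology (two workers behind switch `s1`, the parameter server behind `s2`), the sketch aggregates at `s1`, since forwarding one combined gradient across the `s1`-`s2` link is cheaper than sending both workers' gradients across it.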