Distributed machine learning based link allocation strategy *

Yi Yang, Mingkang Song, Jianming Zhou, Peng Dai, Tenghui Ke, Weidong Li, Zhengguan Wu, Xiayan Zheng, Xijin Li
{"title":"Distributed machine learning based link allocation strategy *","authors":"Yi Yang, Mingkang Song, Jianming Zhou, Peng Dai, Tenghui Ke, Weidong Li, Zhengguan Wu, Xiayan Zheng, Xijin Li","doi":"10.1109/ICSS55994.2022.00044","DOIUrl":null,"url":null,"abstract":"In the field of machine learning, a machine learning system with multiple nodes is usually used, and each node is used to perform a machine learning distributed training process for a part of the data that is allocated to it and provide a server by performing the machine learning distributed training process. The obtained training result, its machine learning data needs to be transmitted through the network. This paper proposes a link allocation method for distributed machine learning. For machine learning computing nodes distributed across domains, due to inconsistencies in link distance, node performance, and link load, the traffic distribution between computing nodes is unbalanced. Aiming at the complex computing requirements of distributed machine learning, a link pre-allocation method is proposed, which establishes a central server-link-node topology map, integrates link resources, and determines the logical distance of nodes. For the synchronously distributed machine learning training set, preallocate transmission link resources and initiate transmission according to the remaining storage capacity of nodes. In order to improve the network utilization efficiency in the process of machine learning, it can break through the influence of large network transmission delay on the efficiency of distributed machine learning.","PeriodicalId":327964,"journal":{"name":"2022 International Conference on Service Science (ICSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Service Science (ICSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSS55994.2022.00044","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In machine learning, systems with multiple nodes are commonly used: each node carries out part of the distributed training process on the portion of the data assigned to it and returns its training result to a server, so the machine learning data must be transmitted over the network. This paper proposes a link allocation method for distributed machine learning. For machine learning computing nodes distributed across domains, inconsistencies in link distance, node performance, and link load lead to an unbalanced traffic distribution between computing nodes. To address the complex computing requirements of distributed machine learning, a link pre-allocation method is proposed: it establishes a central server-link-node topology map, integrates link resources, and determines the logical distance between nodes. For a synchronously distributed machine learning training set, transmission link resources are pre-allocated and transmission is initiated according to the remaining storage capacity of the nodes. This improves network utilization efficiency during machine learning and mitigates the impact of large network transmission delays on the efficiency of distributed machine learning.
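The abstract outlines three steps: build a central server-link-node topology, derive a logical distance for each node, and pre-allocate links in an order driven by remaining storage capacity. The sketch below illustrates one plausible reading of that pipeline; it is an assumption, not the paper's implementation, and all names (Link, Node, logical_distance, preallocate) are hypothetical.

```python
# Illustrative sketch of the link pre-allocation idea, assuming a simple
# graph model of the server-link-node topology. Not the authors' code.
from dataclasses import dataclass
import heapq


@dataclass
class Link:
    src: str
    dst: str
    distance: float   # link distance (assumed metric)
    load: float        # current link load in [0, 1] (assumed metric)


@dataclass
class Node:
    name: str
    remaining_storage: float   # free storage capacity, e.g. in GB


def logical_distance(server: str, node: str, links: list[Link]) -> float:
    """Shortest weighted path from the central server to a node, where each
    link is weighted by its distance scaled up by its current load."""
    graph: dict[str, list[tuple[str, float]]] = {}
    for l in links:
        w = l.distance * (1.0 + l.load)
        graph.setdefault(l.src, []).append((l.dst, w))
        graph.setdefault(l.dst, []).append((l.src, w))
    dist = {server: 0.0}
    heap = [(0.0, server)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist.get(node, float("inf"))


def preallocate(server: str, nodes: list[Node], links: list[Link]) -> list[str]:
    """Order nodes for transmission: largest remaining storage first,
    breaking ties by the smaller logical distance to the central server."""
    return [
        n.name
        for n in sorted(
            nodes,
            key=lambda n: (-n.remaining_storage,
                           logical_distance(server, n.name, links)),
        )
    ]
```

Under this reading, transmission is initiated for the nodes at the head of the returned order first, so nodes with ample free storage and short logical paths to the server are served before congested or distant ones.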