Software-defined cloud–optical networks for long-haul geographically distributed machine learning

IF 4; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Meng Lian, Yongli Zhao, Yike Jiang, Tingting Bao, Yuan Cao, Jie Zhang
Journal of Optical Communications and Networking, vol. 17, no. 5, pp. 363-377, published 2025-04-08 (journal article)
DOI: 10.1364/JOCN.553555
https://ieeexplore.ieee.org/document/10955384/
Citations: 0

Abstract

Optical networks enable long-haul geographically distributed machine learning (GDML) by connecting multiple data centers (DCs), offering a way to overcome the limitations of training large models within a single DC. However, effective coordination is hindered by limited resource sharing between cloud and network entities. In this work, we propose a software-defined cloud–optical network (SD-CON) architecture. Domain controllers in SD-CON jointly abstract cloud and network resources, while a hyper-domain controller establishes cloud–network service function chains (CN-SFCs) to strengthen cloud–network collaboration. Additionally, we introduce a task scheduling algorithm with a multi-candidate parameter server (MPS) to optimize the CN-SFCs. A 1000 km GDML experiment on the China Environment for Network Innovation demonstrates that SD-CON allocates cloud and network resources rapidly (~5.7 s latency), improves task success rates (over 23.111%), and enhances resource utilization compared with the baselines.
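The abstract does not spell out the MPS scheduling algorithm. As a rough illustration of why letting the parameter server (PS) be placed at one of several candidate DCs can raise the task success rate, here is a minimal Python sketch under hypothetical assumptions: a task fixes its worker DCs (where the data lives), needs a given number of GPUs per worker and at the PS, and reserves a fixed bandwidth on a shortest-distance path from each worker to the PS. The names `Task`, `try_ps`, `schedule`, the toy topology, and the greedy per-worker routing are all illustrative inventions, not the authors' method.

```python
import heapq
from dataclasses import dataclass

# Hypothetical task model: the abstract does not define these fields.
@dataclass
class Task:
    workers: list[str]        # DCs that hold the training data and host workers
    gpus_per_worker: int      # GPUs each worker DC must reserve
    ps_gpus: int              # GPUs the parameter-server DC must reserve
    bw_gbps: float            # bandwidth reserved on every worker-to-PS path
    ps_candidates: list[str]  # DCs allowed to host the parameter server


def _key(links, a, b):
    # Links are undirected; the dict stores each one under a single key order.
    return (a, b) if (a, b) in links else (b, a)


def shortest_feasible_path(links, src, dst, bw):
    """Dijkstra on distance (km), using only links with at least `bw` Gb/s free."""
    adj = {}
    for (u, v), (km, free) in links.items():
        if free >= bw:
            adj.setdefault(u, []).append((v, km))
            adj.setdefault(v, []).append((u, km))
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, km in adj.get(u, []):
            if d + km < dist.get(v, float("inf")):
                dist[v], prev[v] = d + km, u
                heapq.heappush(heap, (d + km, v))
    return None  # dst unreachable under the bandwidth constraint


def try_ps(task, gpus, links, ps):
    """Greedily build one chain with `ps` as the PS; return (total_km, plan) or None."""
    gpus = dict(gpus)                                # work on copies so a failed
    links = {k: list(v) for k, v in links.items()}   # attempt leaves inputs intact
    if gpus.get(ps, 0) < task.ps_gpus:
        return None
    gpus[ps] -= task.ps_gpus
    total_km, paths = 0.0, {}
    for w in task.workers:
        if gpus.get(w, 0) < task.gpus_per_worker:
            return None
        gpus[w] -= task.gpus_per_worker
        found = shortest_feasible_path(links, w, ps, task.bw_gbps)
        if found is None:
            return None
        km, path = found
        for a, b in zip(path, path[1:]):             # reserve bandwidth hop by hop
            links[_key(links, a, b)][1] -= task.bw_gbps
        total_km += km
        paths[w] = path
    return total_km, {"ps": ps, "paths": paths, "gpus": gpus, "links": links}


def schedule(task, gpus, links):
    """Multi-candidate PS: try every candidate and keep the cheapest feasible plan."""
    best = None
    for ps in task.ps_candidates:
        result = try_ps(task, gpus, links, ps)
        if result is not None and (best is None or result[0] < best[0]):
            best = result
    return best


# Illustrative 4-DC ring; the numbers are made up, not taken from the paper.
gpus = {"A": 2, "B": 8, "C": 8, "D": 8}
links = {("A", "B"): [300, 100], ("B", "C"): [400, 100],
         ("C", "D"): [300, 100], ("A", "D"): [500, 100]}  # [km, free Gb/s]
task = Task(workers=["B", "D"], gpus_per_worker=4, ps_gpus=4,
            bw_gbps=40, ps_candidates=["A", "C"])

single = schedule(Task(["B", "D"], 4, 4, 40, ["A"]), gpus, links)
multi = schedule(task, gpus, links)
print(single)  # None: DC A lacks the GPUs to host the PS
print(multi[0], multi[1]["ps"], multi[1]["paths"])
# 700.0 C {'B': ['B', 'C'], 'D': ['D', 'C']}
```

In this toy run, a fixed single-PS baseline (candidate A only) rejects the task, while evaluating both candidates finds a feasible chain through DC C. The greedy worker-by-worker routing is a deliberate simplification; a real CN-SFC scheduler would likely optimize placement and routing jointly.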
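Source journal: Journal of Optical Communications and Networking
CiteScore: 9.40
Self-citation rate: 16.00%
Articles published: 104
Review time: 4 months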
About the journal: The scope of the Journal includes advances in the state-of-the-art of optical networking science, technology, and engineering. Both theoretical contributions (including new techniques, concepts, analyses, and economic studies) and practical contributions (including optical networking experiments, prototypes, and new applications) are encouraged. Subareas of interest include the architecture and design of optical networks, optical network survivability and security, software-defined optical networking, elastic optical networks, data and control plane advances, network management related innovation, and optical access networks. Enabling technologies and their applications are suitable topics only if the results are shown to directly impact optical networking beyond simple point-to-point networks.