IEEE Transactions on Network and Service Management: Latest Articles

GreenShield: Optimizing Firewall Configuration for Sustainable Networks
IF 4.7, Zone 2, Computer Science
IEEE Transactions on Network and Service Management Pub Date : 2024-08-30 DOI: 10.1109/TNSM.2024.3452150
Daniele Bringhenti;Fulvio Valenza
{"title":"GreenShield: Optimizing Firewall Configuration for Sustainable Networks","authors":"Daniele Bringhenti;Fulvio Valenza","doi":"10.1109/TNSM.2024.3452150","DOIUrl":"10.1109/TNSM.2024.3452150","url":null,"abstract":"Sustainability is an increasingly critical design feature for modern computer networks. However, green objectives related to energy savings are affected by the application of approximate cybersecurity management techniques. In particular, their impact is evident in distributed firewall configuration, where traditional manual approaches create redundant architectures, leading to avoidable power consumption. This issue has not been addressed by the approaches proposed in literature to automate firewall configuration so far, because their optimization is not focused on network sustainability. Therefore, this paper presents GreenShield as a possible solution that combines security and green-oriented optimization for firewall configuration. Specifically, GreenShield minimizes the power consumption related to firewalls activated in the network while ensuring that the security requested by the network administrator is guaranteed, and the one due to traffic processing by making firewalls to block undesired traffic as near as possible to the sources. 
The framework implementing GreenShield has undergone experimental tests to assess the provided optimization and its scalability performance.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6909-6923"},"PeriodicalIF":4.7,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10660559","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
OptCDU: Optimizing the Computing Data Unit Size for COIN
IF 4.7, Zone 2, Computer Science
IEEE Transactions on Network and Service Management Pub Date : 2024-08-30 DOI: 10.1109/TNSM.2024.3452485
Huanzhuo Wu;Jia He;Jiakang Weng;Giang T. Nguyen;Martin Reisslein;Frank H. P. Fitzek
{"title":"OptCDU: Optimizing the Computing Data Unit Size for COIN","authors":"Huanzhuo Wu;Jia He;Jiakang Weng;Giang T. Nguyen;Martin Reisslein;Frank H. P. Fitzek","doi":"10.1109/TNSM.2024.3452485","DOIUrl":"10.1109/TNSM.2024.3452485","url":null,"abstract":"Computing in the Network (COIN) has the potential to reduce the data traffic and thus the end-to-end latencies for data-rich services. Existing COIN studies have neglected the impact of the size of the data unit that the network nodes compute on. However, similar to the impact of the protocol data unit (packet) size in conventional store-and-forward packet-switching networks, the Computing Data Unit (CDU) size is an elementary parameter that strongly influences the COIN dynamics. We model the end-to-end service time consisting of the network transport delays (for data transmission and link propagation), the loading delays of the data into the computing units, and the computing delays in the network nodes. We derive the optimal CDU size that minimizes the end-to-end service time with gradient descent. We evaluate the impact of the CDU sizing on the amount of data transmitted over the network links and the end-to-end service time for computing the convolutional neural network (CNN) based Yoho and a Deep Neural Network (DNN) based Multi-Layer Perceptron (MLP). We distribute the Yoho and MLP neural modules over up to five network nodes. Our emulation evaluations indicate that COIN strongly reduces the amount of network traffic after the first few computing nodes. Also, the CDU size optimization has a strong impact on the end-to-end service time; whereby, CDU sizes that are too small or too large can double the service time. 
Our emulations validate that our gradient descent minimization correctly identifies the optimal CDU size.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6095-6111"},"PeriodicalIF":4.7,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multipartite Entanglement Distribution in the Quantum Internet: Knowing When to Stop!
IF 4.7, Zone 2, Computer Science
IEEE Transactions on Network and Service Management Pub Date : 2024-08-30 DOI: 10.1109/TNSM.2024.3452326
Angela Sara Cacciapuoti;Jessica Illiano;Michele Viscardi;Marcello Caleffi
{"title":"Multipartite Entanglement Distribution in the Quantum Internet: Knowing When to Stop!","authors":"Angela Sara Cacciapuoti;Jessica Illiano;Michele Viscardi;Marcello Caleffi","doi":"10.1109/TNSM.2024.3452326","DOIUrl":"10.1109/TNSM.2024.3452326","url":null,"abstract":"Multipartite entanglement distribution is a key functionality of the Quantum Internet. However, quantum entanglement is very fragile, easily degraded by decoherence, which strictly constraints the time horizon within the distribution has to be completed. This, coupled with the quantum noise irremediably impinging on the channels utilized for entanglement distribution, may imply the need to attempt the distribution process multiple times before the targeted network nodes successfully share the desired entangled state. And there is no guarantee that this is accomplished within the time horizon dictated by the coherence times. As a consequence, in noisy scenarios requiring multiple distribution attempts, it may be convenient to stop the distribution process early. In this paper, we take steps in the direction of knowing when to stop the entanglement distribution by developing a theoretical framework, able to capture the quantum noise effects. Specifically, we first prove that the entanglement distribution process can be modeled as a Markov decision process. Then, we prove that the optimal decision policy exhibits attractive features, which we exploit to reduce the computational complexity. 
The developed framework provides quantum network designers with flexible tools to optimally engineer the design parameters of the entanglement distribution process.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6041-6058"},"PeriodicalIF":4.7,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10660502","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Fairness-Aware VNF Mapping and Scheduling in Satellite Edge Networks for Mission-Critical Applications
IF 4.7, Zone 2, Computer Science
IEEE Transactions on Network and Service Management Pub Date : 2024-08-29 DOI: 10.1109/TNSM.2024.3452031
Haftay Gebreslasie Abreha;Houcine Chougrani;Ilora Maity;Youssouf Drif;Christos Politis;Symeon Chatzinotas
{"title":"Fairness-Aware VNF Mapping and Scheduling in Satellite Edge Networks for Mission-Critical Applications","authors":"Haftay Gebreslasie Abreha;Houcine Chougrani;Ilora Maity;Youssouf Drif;Christos Politis;Symeon Chatzinotas","doi":"10.1109/TNSM.2024.3452031","DOIUrl":"10.1109/TNSM.2024.3452031","url":null,"abstract":"Satellite Edge Computing (SEC) is seen as a promising solution for deploying network functions in orbit to provide ubiquitous services with low latency and bandwidth. Software Defined Networks (SDN) and Network Function Virtualization (NFV) enable SEC to manage and deploy services more flexibly. In this paper, we study a dynamic and topology-aware VNF mapping and scheduling strategy within an SDN/NFV-enabled SEC infrastructure. Our focus is on meeting the stringent requirements of mission-critical (MC) applications, recognizing their significance in both satellite-to-satellite and edge-to-satellite communications while ensuring service delay margin fairness across various time-sensitive service requests. We formulate the VNF mapping and scheduling problem as an Integer Nonlinear Programming problem (INLP), with the objective of minimax fairness among specified requests while considering dynamic satellite network topology, traffic, and resource constraints. We then propose two algorithms for solving the INLP problem: Fairness-Aware Greedy Algorithm for Dynamic VNF Mapping and Scheduling (FAGD_MASC) and Fairness-Aware Simulated Annealing-Based Algorithm for Dynamic VNF Mapping and Scheduling (FASD_MASC) which are suitable for low and high service arrival rates, respectively. 
Our extensive simulations demonstrate that both FAGD_MASC and FASD_MASC approaches are very close to the optimization-based solution and outperform the benchmark solution in terms of service acceptance rates.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6716-6730"},"PeriodicalIF":4.7,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10659145","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Data Aggregation Management With Self-Sovereign Identity in Decentralized Networks
IF 4.7, Zone 2, Computer Science
IEEE Transactions on Network and Service Management Pub Date : 2024-08-29 DOI: 10.1109/TNSM.2024.3451995
Yepeng Ding;Junwei Yu;Shaowen Li;Hiroyuki Sato;Maro G. Machizawa
{"title":"Data Aggregation Management With Self-Sovereign Identity in Decentralized Networks","authors":"Yepeng Ding;Junwei Yu;Shaowen Li;Hiroyuki Sato;Maro G. Machizawa","doi":"10.1109/TNSM.2024.3451995","DOIUrl":"10.1109/TNSM.2024.3451995","url":null,"abstract":"Data aggregation management is paramount in data-driven distributed systems. Conventional solutions premised on centralized networks grapple with security challenges concerning authenticity, confidentiality, integrity, and privacy. Recently, distributed ledger technology has gained popularity for its decentralized nature to facilitate overcoming these challenges. Nevertheless, insufficient identity management introduces risks like impersonation and unauthorized access. In this paper, we propose Degator, a data aggregation management framework that leverages self-sovereign identity and functions in decentralized networks to address security concerns and mitigate identity-related risks. We formulate fully decentralized aggregation protocols for data persistence and acquisition in Degator. Degator is compatible with existing data persistence methods, and supports cost-effective data acquisition minimizing dependency on distributed ledgers. We also conduct a formal analysis to elucidate the mechanism of Degator to tackle current security challenges in conventional data aggregation management. 
Furthermore, we showcase the applicability of Degator through its application in the management of decentralized neuroscience data aggregation and demonstrate its scalability via performance evaluation.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6174-6189"},"PeriodicalIF":4.7,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10659216","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Is There a DDoS?: System+Application Variable Monitoring to Ascertain the Attack Presence
IF 4.7, Zone 2, Computer Science
IEEE Transactions on Network and Service Management Pub Date : 2024-08-29 DOI: 10.1109/TNSM.2024.3451613
Gunjan Kumar Saini;Gaurav Somani
{"title":"Is There a DDoS?: System+Application Variable Monitoring to Ascertain the Attack Presence","authors":"Gunjan Kumar Saini;Gaurav Somani","doi":"10.1109/TNSM.2024.3451613","DOIUrl":"10.1109/TNSM.2024.3451613","url":null,"abstract":"The state of the art has numerous contributions which focus on combating the DDoS attacks. We argue that the mitigation methods are only useful if the victim service or the mitigation method can ascertain the presence of a DDoS attack. In many of the past solutions, the authors decide the presence of DDoS using quick and dirty checks. However, precise mechanisms are still needed so that the accurate decisions about DDoS mitigation can be made. In this work, we propose a method for detecting the presence of DDoS attacks using system variables available at the server or victim server operating system. To achieve this, we propose a machine learning based detection model in which there are three steps involved. In the first step, we monitored 14 different systems and application variables/ characteristics with and without a variety of DDoS attacks. In the second step, we trained machine learning model with monitored data of all the selected variables. In the final step, our approach uses the artificial neural network (ANN) and random forest (RF) based approaches to detect the presence of DDoS attacks. 
Our presence identification approach gives a detection accuracy of 88%-95% for massive attacks, 65%-77% for mixed traffic having a mixture of low-rate attack and benign requests, 58%-60% for flashcrowd, 76%-81% for mixed traffic having a mixture of massive attack and benign traffic and 58%-64% for low rate attacks with a detection time of 4-5 seconds.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6899-6908"},"PeriodicalIF":4.7,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
QoE Estimation Across Different Cloud Gaming Services Using Transfer Learning
IF 4.7, Zone 2, Computer Science
IEEE Transactions on Network and Service Management Pub Date : 2024-08-28 DOI: 10.1109/TNSM.2024.3451300
Marcos Carvalho;Daniel Soares;Daniel Fernandes Macedo
{"title":"QoE Estimation Across Different Cloud Gaming Services Using Transfer Learning","authors":"Marcos Carvalho;Daniel Soares;Daniel Fernandes Macedo","doi":"10.1109/TNSM.2024.3451300","DOIUrl":"10.1109/TNSM.2024.3451300","url":null,"abstract":"Cloud Gaming (CG) has become one of the most important cloud-based services in recent years by providing games to different end-network devices, such as personal computers (wired network) and smartphones/tablets (mobile network). CG services stand challenging for network operators since this service demands rigorous network Quality of Services (QoS). Nevertheless, ensuring proper Quality of Experience (QoE) keeps the end-users engaged in the CG services. However, several factors influence users’ experience, such as context (i.e., game type/players) and the end-network type (wired/mobile). In this case, Machine Learning (ML) models have achieved the state-of-the-art on the end-users’ QoE estimation. Despite that, traditional ML models demand a larger amount of data and assume that the training and test have the same distribution, which can make the ML models hard to generalize to other scenarios from what was trained. This work employs Transfer Learning (TL) techniques to create QoE estimation over different cloud gaming services (wired/mobile) and contexts (game type/players). We improved our previous work by performing a subjective QoE assessment with real users playing new games on a mobile cloud gaming testbed. 
Results show that transfer learning can decrease the average MSE error by at least 34.7% compared to the source model (wired) performance on the mobile cloud gaming and to 81.5% compared with the model trained from scratch.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"5935-5946"},"PeriodicalIF":4.7,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Energy Efficient UAV-Assisted IoT Data Collection: A Graph-Based Deep Reinforcement Learning Approach
IF 4.7, Zone 2, Computer Science
IEEE Transactions on Network and Service Management Pub Date : 2024-08-28 DOI: 10.1109/TNSM.2024.3450964
Qianqian Wu;Qiang Liu;Wenliang Zhu;Zefan Wu
{"title":"Energy Efficient UAV-Assisted IoT Data Collection: A Graph-Based Deep Reinforcement Learning Approach","authors":"Qianqian Wu;Qiang Liu;Wenliang Zhu;Zefan Wu","doi":"10.1109/TNSM.2024.3450964","DOIUrl":"10.1109/TNSM.2024.3450964","url":null,"abstract":"With the advancements in technologies such as 5G, Unmanned Aerial Vehicles (UAVs) have exhibited their potential in various application scenarios, including wireless coverage, search operations, and disaster response. In this paper, we consider the utilization of a group of UAVs as aerial base stations (BS) to collect data from IoT sensor devices. The objective is to maximize the volume of collected data while simultaneously enhancing the geographical fairness among these points of interest, all within the constraints of limited energy resources. Therefore, we propose a deep reinforcement learning (DRL) method based on Graph Attention Networks (GAT), referred to as “GADRL”. GADRL utilizes graph convolutional neural networks to extract spatial correlations among multiple UAVs and makes decisions in a distributed manner under the guidance of DRL. Furthermore, we employ Long Short-Term Memory to establish memory units for storing and utilizing historical information. Numerical results demonstrate that GADRL consistently outperforms four baseline methods, validating its computational efficiency.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6082-6094"},"PeriodicalIF":4.7,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distributed Learning Framework for eMBB-URLLC Multiplexing in Open Radio Access Networks
IF 4.7, Zone 2, Computer Science
IEEE Transactions on Network and Service Management Pub Date : 2024-08-28 DOI: 10.1109/TNSM.2024.3451295
Madyan Alsenwi;Eva Lagunas;Symeon Chatzinotas
{"title":"Distributed Learning Framework for eMBB-URLLC Multiplexing in Open Radio Access Networks","authors":"Madyan Alsenwi;Eva Lagunas;Symeon Chatzinotas","doi":"10.1109/TNSM.2024.3451295","DOIUrl":"10.1109/TNSM.2024.3451295","url":null,"abstract":"Next-generation (NextG) cellular networks are expected to evolve towards virtualization and openness, incorporating reprogrammable components that facilitate intelligence and real-time analytics. This paper builds on these innovations to address the network slicing problem in multi-cell open radio access wireless networks, focusing on two key services: enhanced Mobile BroadBand (eMBB) and Ultra-Reliable Low Latency Communications (URLLC). A stochastic resource allocation problem is formulated with the goal of balancing the average eMBB data rate and its variance, while ensuring URLLC constraints. A distributed learning framework based on the Deep Reinforcement Learning (DRL) technique is developed following the Open Radio Access Networks (O-RAN) architectures to solve the formulated optimization problem. The proposed learning approach enables training a global machine learning model at a central cloud server and sharing it with edge servers for executions. Specifically, deep learning agents are distributed at network edge servers and embedded within the Near-Real-Time Radio access network Intelligent Controller (Near-RT RIC) to collect network information and perform online executions. A global deep learning model is trained by a central training engine embedded within the Non-Real-Time RIC (Non-RT RIC) at the central server using received data from edge servers. 
The performed simulation results validate the efficacy of the proposed algorithm in achieving URLLC constraints while maintaining the eMBB Quality of Service (QoS).","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 5","pages":"5718-5732"},"PeriodicalIF":4.7,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dynamic Flow Scheduling for DNN Training Workloads in Data Centers
IF 4.7, Zone 2, Computer Science
IEEE Transactions on Network and Service Management Pub Date : 2024-08-27 DOI: 10.1109/TNSM.2024.3450670
Xiaoyang Zhao;Chuan Wu;Xia Zhu
{"title":"Dynamic Flow Scheduling for DNN Training Workloads in Data Centers","authors":"Xiaoyang Zhao;Chuan Wu;Xia Zhu","doi":"10.1109/TNSM.2024.3450670","DOIUrl":"10.1109/TNSM.2024.3450670","url":null,"abstract":"Distributed deep learning (DL) training constitutes a significant portion of workloads in modern data centers that are equipped with high computational capacities, such as GPU servers. However, frequent tensor exchanges among workers during distributed deep neural network (DNN) training can result in heavy traffic in the data center network, leading to congestion at server NICs and in the switching network. Unfortunately, none of the existing DL communication libraries support active flow control to optimize tensor transmission performance, instead relying on passive adjustments to the congestion window or sending rate based on packet loss or delay. To address this issue, we propose a flow scheduler per host that dynamically tunes the sending rates of outgoing tensor flows from each server, maximizing network bandwidth utilization and expediting job training progress. Our scheduler comprises two main components: a monitoring module that interacts with state-of-the-art communication libraries supporting parameter server and all-reduce paradigms to track the training progress of DNN jobs, and a congestion control protocol that receives in-network feedback from traversing switches and computes optimized flow sending rates. For data centers where switches are not programmable, we provide a software solution that emulates switch behavior and interacts with the scheduler on servers. 
Experiments with real-world GPU testbed and trace-driven simulation demonstrate that our scheduler outperforms common rate control protocols and representative learning-based schemes in various settings.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6643-6657"},"PeriodicalIF":4.7,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0