{"title":"Diagnosing End-Host Network Bottlenecks in RDMA Servers","authors":"Kefei Liu;Jiao Zhang;Zhuo Jiang;Haoran Wei;Xiaolong Zhong;Lizhuang Tan;Tian Pan;Tao Huang","doi":"10.1109/TNET.2024.3416419","DOIUrl":"10.1109/TNET.2024.3416419","url":null,"abstract":"In RDMA (Remote Direct Memory Access) networks, end-host networks, including intra-host networks and RNICs (RDMA NICs), were long considered robust and have received little attention. However, as RNIC line rates rapidly increase to multiple hundreds of gigabits, the intra-host network becomes a potential performance bottleneck for network applications. Intra-host network bottlenecks can result in degraded intra-host bandwidth and increased intra-host latency. In addition, RNIC problems can result in connection failures and packet drops. Host network problems can severely degrade network performance, yet when they occur they can hardly be noticed due to the lack of a monitoring system. Furthermore, existing diagnostic mechanisms cannot efficiently diagnose host network problems. In this paper, we analyze the symptoms of host network problems based on our long-term troubleshooting experience and propose Hostping, the first monitoring and diagnostic system dedicated to host networks. The core idea of Hostping is to conduct 1) loopback tests between RNICs and endpoints within the host to measure intra-host latency and bandwidth, and 2) mutual probing between RNICs on a host to measure RNIC connectivity. We have deployed Hostping on thousands of servers in our distributed machine learning system. Not only can Hostping detect and diagnose host network problems we already knew of within minutes, but it also reveals eight problems we had not noticed before.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4302-4316"},"PeriodicalIF":3.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Distributional Reinforcement Learning-Based Adaptive Routing With Guaranteed Delay Bounds","authors":"Jianmin Liu;Dan Li;Yongjun Xu","doi":"10.1109/TNET.2024.3425652","DOIUrl":"10.1109/TNET.2024.3425652","url":null,"abstract":"Real-time applications that require timely data delivery over wireless multi-hop networks within specified deadlines are growing rapidly. Effective routing protocols that can guarantee real-time QoS are crucial, yet challenging, due to the unpredictable variations in end-to-end delay caused by unreliable wireless channels. In such conditions, the upper bound on the end-to-end delay, i.e., the worst-case end-to-end delay, should be guaranteed to fall within the deadline. However, existing routing protocols with guaranteed delay bounds cannot strictly guarantee real-time QoS because they assume that the worst-case end-to-end delay is known and ignore the impact of routing policies on the worst-case end-to-end delay. In this paper, we relax this assumption and propose DDRL-ARGB, an Adaptive Routing with Guaranteed delay Bounds using Deep Distributional Reinforcement Learning (DDRL). DDRL-ARGB adopts DDRL to jointly determine the worst-case end-to-end delay and learn routing policies. To accurately determine the worst-case end-to-end delay, DDRL-ARGB employs a quantile regression deep Q-network to learn the cumulative distribution of the end-to-end delay. To guarantee real-time QoS, DDRL-ARGB optimizes routing decisions under the constraint that the worst-case end-to-end delay stays within the deadline. To alleviate traffic congestion, DDRL-ARGB considers the network congestion status when making routing decisions. Extensive results show that DDRL-ARGB can accurately calculate the worst-case end-to-end delay and can strictly guarantee real-time QoS under a small tolerable violation probability, compared with two state-of-the-art routing protocols.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 6","pages":"4692-4706"},"PeriodicalIF":3.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Valuable is Your Data? Optimizing Client Recruitment in Federated Learning","authors":"Yichen Ruan;Xiaoxi Zhang;Carlee Joe-Wong","doi":"10.1109/TNET.2024.3422264","DOIUrl":"10.1109/TNET.2024.3422264","url":null,"abstract":"Federated learning allows distributed clients to train a shared machine learning model while preserving user privacy. In this framework, user devices (i.e., clients) perform local iterations of the learning algorithm on their data. These updates are periodically aggregated to form a shared model. Thus, a client represents the bundle of the user data, the device, and the user’s willingness to participate: since participating in federated learning requires clients to expend resources and reveal some information about their data, users may require some form of compensation to contribute to the training process. Recruiting more users generally results in higher accuracy, but slower completion times and higher cost. We present the first work to theoretically analyze the resulting performance tradeoffs in deciding which clients to recruit for the federated learning algorithm. Our framework accounts for both accuracy (training and testing) and efficiency (completion time and cost) metrics. We provide solutions to this NP-hard optimization problem and verify the value of client recruitment in experiments on synthetic and real-world data. The results of this work can serve as a guideline for the real-world deployment of federated learning and an initial investigation of the client recruitment problem.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4207-4221"},"PeriodicalIF":3.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asynchronous Decentralized Federated Learning for Heterogeneous Devices","authors":"Yunming Liao;Yang Xu;Hongli Xu;Min Chen;Lun Wang;Chunming Qiao","doi":"10.1109/TNET.2024.3424444","DOIUrl":"10.1109/TNET.2024.3424444","url":null,"abstract":"Data generated at the network edge can be processed locally by leveraging the emerging technology of Federated Learning (FL). However, non-IID local data leads to degraded model accuracy, and the heterogeneity of edge nodes inevitably slows down model training. Moreover, to avoid the potential communication bottleneck in parameter-server-based FL, we concentrate on Decentralized Federated Learning (DFL), which performs distributed model training in a Peer-to-Peer (P2P) manner. To address these challenges, we propose an asynchronous DFL system that incorporates neighbor selection and gradient push, termed AsyDFL. Specifically, we require each edge node to push gradients only to a subset of its neighbors for resource efficiency. We first give a theoretical convergence analysis of AsyDFL under this complicated non-IID and heterogeneous scenario, and further design a priority-based algorithm to dynamically select neighbors for each edge node so as to balance communication cost and model performance. We evaluate the performance of AsyDFL through extensive experiments on a physical platform with 30 NVIDIA Jetson edge devices. Evaluation results show that AsyDFL can reduce the communication cost by 57% and the completion time by about 35% for achieving the same test accuracy, and improve model accuracy by at least 6% under the non-IID scenario, compared to the baselines.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4535-4550"},"PeriodicalIF":3.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141722383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated PCA on Grassmann Manifold for IoT Anomaly Detection","authors":"Tung-Anh Nguyen;Long Tan Le;Tuan Dung Nguyen;Wei Bao;Suranga Seneviratne;Choong Seon Hong;Nguyen H. Tran","doi":"10.1109/TNET.2024.3423780","DOIUrl":"10.1109/TNET.2024.3423780","url":null,"abstract":"With the proliferation of the Internet of Things (IoT) and the rising interconnectedness of devices, network security faces significant challenges, especially from anomalous activities. While traditional machine learning-based intrusion detection systems (ML-IDS) effectively employ supervised learning methods, they possess limitations such as the requirement for labeled data and challenges with high dimensionality. Recent unsupervised ML-IDS approaches such as AutoEncoders and Generative Adversarial Networks (GAN) offer alternative solutions but pose challenges in deployment onto resource-constrained IoT devices and in interpretability. To address these concerns, this paper proposes a novel federated unsupervised anomaly detection framework – FedPCA – that leverages Principal Component Analysis (PCA) and the Alternating Direction Method of Multipliers (ADMM) to learn common representations of distributed non-i.i.d. datasets. Building on the FedPCA framework, we propose two algorithms, FedPE in Euclidean space and FedPG on Grassmann manifolds. Our approach enables real-time threat detection and mitigation at the device level, enhancing network resilience while ensuring privacy. Moreover, the proposed algorithms are accompanied by theoretical convergence rates even under a sub-sampling scheme, a novel result. Experimental results on the UNSW-NB15 and TON-IoT datasets show that our proposed methods offer performance in anomaly detection comparable to non-linear baselines, while providing significant improvements in communication and memory efficiency, underscoring their potential for securing IoT networks.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4456-4471"},"PeriodicalIF":3.0,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coalition Formation-Based Sub-Channel Allocation in Full-Duplex-Enabled mmWave IABN With D2D","authors":"Zhongyu Ma;Yajing Wang;Zijun Wang;Guangjie Han;Zhanjun Hao;Qun Guo","doi":"10.1109/TNET.2024.3423775","DOIUrl":"10.1109/TNET.2024.3423775","url":null,"abstract":"One of the key techniques for future wireless networks is the full-duplex-enabled millimeter wave integrated access and backhaul network underlaying device-to-device communication, a 3GPP-inspired comprehensive paradigm for higher spectral efficiency and lower latency. However, multi-user interference (MUI) and residual self-interference (RSI) are the major bottlenecks blocking the commercial application of the system. To this end, we investigate the sub-channel allocation problem for this networking paradigm. To maximize the overall achievable rate under consideration of MUI and RSI, the sub-channel allocation problem is first formulated as an integer nonlinear programming problem, for which searching for an optimal solution in polynomial time is intractable. Second, a coalition formation based sub-channel allocation (CFSA) algorithm is proposed, where the final partition of the sub-channel coalitions is iteratively formed by the concurrent link players according to two defined switching criteria. Third, the properties of the proposed CFSA algorithm are analyzed from the perspectives of Nash stability and uniform convergence. Fourth, the proposed CFSA algorithm is compared with reference algorithms through extensive simulations, and its advantages in effectiveness, convergence, and sub-optimality are demonstrated on the key indicators.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4503-4518"},"PeriodicalIF":3.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Aggregated Payment Channel Networks","authors":"Xiaoxue Zhang;Chen Qian","doi":"10.1109/TNET.2024.3423000","DOIUrl":"10.1109/TNET.2024.3423000","url":null,"abstract":"Payment channel networks (PCNs) have been designed and utilized to address the scalability challenge and throughput limitation of blockchains. They provide a high-throughput solution for blockchain-based payment systems. However, such “layer-2” blockchain solutions have their own problems: payment channels require a separate deposit for each channel between two users. This locks significant user funds into particular channels without the flexibility to move those funds across channels. In this paper, we propose the Aggregated Payment Channel Network (APCN), in which funds are managed flexibly on a per-user basis instead of a per-channel basis. To prevent user misbehavior such as double-spending, APCN includes mechanisms that make use of hardware trusted execution environments (TEEs) to control funds, balances, and payments. The distributed routing protocol in APCN also addresses the congestion problem to further improve resource utilization. Our prototype implementation and simulation results show that APCN achieves significant improvements in transaction success ratio with low routing latency, compared to even the most advanced PCN routing schemes.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4333-4348"},"PeriodicalIF":3.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Charting the Complexity Landscape of Compiling Packet Programs to Reconfigurable Switches","authors":"Balázs Vass;Erika R. Bérczi-Kovács;Ádám Fraknói;Costin Raiciu;Gábor Rétvári","doi":"10.1109/TNET.2024.3424337","DOIUrl":"10.1109/TNET.2024.3424337","url":null,"abstract":"P4 is a widely used domain-specific language for programmable data planes. A critical step in P4 compilation is finding a feasible and efficient mapping of the high-level P4 source code constructs to the physical resources exposed by the underlying hardware, while meeting data and control flow dependencies in the program. In this paper, we take a new look at the algorithmic aspects of this problem, with the motivation to understand the fundamental theoretical limits, obtain better P4 pipeline embeddings, and speed up practical P4 compilation times for RMT and dRMT target architectures. We report mixed results: we find that P4 compilation is computationally hard even in a severely relaxed formulation, and there is no polynomial-time approximation of arbitrary precision (unless <inline-formula> <tex-math>$\\mathcal {P}=\\mathcal {NP}$ </tex-math></inline-formula>), while the good news is that, despite its inherent complexity, P4 compilation is approximable in linear time with a small constant bound even for the most complex, nearly real-life models.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4519-4534"},"PeriodicalIF":3.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Releasing the Power of In-Network Aggregation With Aggregator-Aware Routing Optimization","authors":"Shouxi Luo;Xiaoyu Yu;Ke Li;Huanlai Xing","doi":"10.1109/TNET.2024.3423380","DOIUrl":"10.1109/TNET.2024.3423380","url":null,"abstract":"By offloading part of the aggregation computation from the logically central parameter servers to network devices like programmable switches, In-Network Aggregation (INA) is a general, effective, and widely used approach to reduce network load, alleviating the communication bottlenecks suffered by large-scale distributed training. Given that INA takes effect if and only if the associated traffic goes through the same in-network aggregator, the key to taking advantage of INA lies in routing control. However, existing proposals fall short in doing so and are far from optimal, since they select routes for INA-supported traffic without comprehensively considering the characteristics, limitations, and requirements of the network environment, the aggregator hardware, and the distributed training jobs. To fill the gap, in this paper, we systematically establish a mathematical model to formulate i) the up-down routing constraints of Clos datacenter networks, ii) the limitations raised by modern programmable switches’ pipeline hardware structure, and iii) the various aggregator-aware routing optimization goals required by distributed training tasks under different parallelism strategies. Based on the model, we develop ARO, an Aggregator-aware Routing Optimization solution for INA-accelerated distributed training applications. To be efficient, ARO involves a suite of search-space pruning designs that exploit the model’s characteristics, yielding a tens-of-times improvement in solving time with negligible performance loss. Extensive experiments show that ARO is able to find near-optimal results for large-scale routing optimization in tens of seconds, achieving <inline-formula> <tex-math>$1.8\\sim 4.0\\times $ </tex-math></inline-formula> higher throughput than the state-of-the-art solution.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4488-4502"},"PeriodicalIF":3.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear Bandits With Side Observations on Networks","authors":"Avik Kar;Rahul Singh;Fang Liu;Xin Liu;Ness B. Shroff","doi":"10.1109/TNET.2024.3422323","DOIUrl":"10.1109/TNET.2024.3422323","url":null,"abstract":"We investigate linear bandits in a network setting in the presence of side-observations across nodes in order to design recommendation algorithms for users connected via social networks. Users in social networks respond to their friends’ activity and, hence, provide information about each other’s preferences. In our model, when a learning algorithm recommends an article to a user, not only does it observe her response (e.g., an ad click) but also the side-observations, i.e., the responses of her neighbors if they were presented with the same article. We model these observation dependencies by a graph <inline-formula> <tex-math>$\\mathcal {G}$ </tex-math></inline-formula> in which nodes correspond to users and edges to social links. We derive a problem/instance-dependent lower bound on the regret of any consistent algorithm. We propose an optimization-based data-driven learning algorithm that utilizes the structure of <inline-formula> <tex-math>$\\mathcal {G}$ </tex-math></inline-formula> in order to make recommendations to users and show that it is asymptotically optimal, in the sense that its regret matches the lower bound as the number of rounds <inline-formula> <tex-math>$T\\to \\infty $ </tex-math></inline-formula>. We show that this asymptotically optimal regret is upper-bounded as <inline-formula> <tex-math>$O\\left ({|\\chi (\\mathcal {G})|\\log T}\\right)$ </tex-math></inline-formula>, where <inline-formula> <tex-math>$|\\chi (\\mathcal {G})|$ </tex-math></inline-formula> is the domination number of <inline-formula> <tex-math>$\\mathcal {G}$ </tex-math></inline-formula>. In contrast, a naive application of the existing learning algorithms results in <inline-formula> <tex-math>$O\\left ({N\\log T}\\right)$ </tex-math></inline-formula> regret, where N is the number of users.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4222-4237"},"PeriodicalIF":3.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}