{"title":"AoI, Timely-Throughput, and Beyond: A Theory of Second-Order Wireless Network Optimization","authors":"Daojing Guo;Khaled Nakhleh;I-Hong Hou;Sastry Kompella;Clement Kam","doi":"10.1109/TNET.2024.3432655","DOIUrl":"10.1109/TNET.2024.3432655","url":null,"abstract":"This paper introduces a new theoretical framework for optimizing second-order behaviors of wireless networks. Unlike existing techniques for network utility maximization, which only consider first-order statistics, this framework models every random process by its mean and temporal variance. The inclusion of temporal variance makes this framework well-suited for modeling Markovian fading wireless channels and emerging network performance metrics such as age-of-information (AoI) and timely-throughput. Using this framework, we sharply characterize the second-order capacity region of wireless access networks. We also propose a simple scheduling policy and prove that it can achieve every interior point in the second-order capacity region. To demonstrate the utility of this framework, we apply it to an unsolved network optimization problem where some clients wish to minimize AoI while others wish to maximize timely-throughput. We show that this framework accurately characterizes AoI and timely-throughput. Moreover, it leads to a tractable scheduling policy that outperforms other existing work.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 6","pages":"4707-4721"},"PeriodicalIF":3.0,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141775720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind Tag-Based Physical-Layer Authentication","authors":"Chen Wang;Mingrui Sha;Wei Xiong;Ning Xie;Rui Mao;Peichang Zhang;Lei Huang","doi":"10.1109/TNET.2024.3430980","DOIUrl":"10.1109/TNET.2024.3430980","url":null,"abstract":"In comparison with upper-layer authentication mechanisms, tag-based Physical-Layer Authentication (PLA) has attracted considerable research interest because of its high security and low complexity. This paper addresses two problems in prior tag-based PLA schemes: first, the extra overhead and vulnerability caused by broadcasting the tag parameter; second, the empirical setting of that parameter. To overcome these limitations, two new tag-based PLA schemes are proposed. Specifically, a blind tag-based PLA scheme (BTP) is presented to achieve accurate authentication without knowing the tag parameter of the legitimate transmitter, which not only saves communication overhead but also improves security. Then, an adaptive blind tag-based PLA scheme (ABTP) is proposed, which adaptively sets the tag parameter according to the wireless channel state to achieve a better balance among robustness, security, and compatibility. Rigorous theoretical analyses are provided for the two proposed schemes, and their performance is compared with that of prior schemes. The accuracy of the theoretical analyses is verified through simulation results. Finally, the advantages and disadvantages of the two proposed schemes are discussed, and recommendations are given for different scenarios.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 6","pages":"4735-4748"},"PeriodicalIF":3.0,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141775721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Re-Architecting Buffer Management in Lossless Ethernet","authors":"Hanlin Huang;Xinle Du;Tong Li;Haiyang Wang;Ke Xu;Mowei Wang;Huichen Dai","doi":"10.1109/TNET.2024.3430989","DOIUrl":"10.1109/TNET.2024.3430989","url":null,"abstract":"Converged Ethernet employs Priority-based Flow Control (PFC) to provide a lossless network. However, issues caused by PFC, including victim flow, congestion spreading, and deadlock, impede its large-scale deployment in production systems. Fine-grained experimental observations of switch buffer occupancy show that the root cause of these performance problems is a mismatch of sending rates between end-to-end congestion control and hop-by-hop flow control. Resolving this mismatch requires the switch to provide additional buffer space, which is not supported by the classic dynamic threshold (DT) policy in current shared-buffer commercial switches. In this paper, we propose Selective-PFC (SPFC), a practical buffer management scheme that handles this mismatch. Specifically, SPFC incrementally modifies DT by proactively detecting port traffic and adjusting buffer allocation accordingly to trigger PFC PAUSE frames selectively. Extensive case studies demonstrate that SPFC can reduce the number of PFC PAUSEs on non-bursty ports by up to 69.0%, and reduce the average flow completion time by up to 83.5% for large victim flows.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 6","pages":"4749-4764"},"PeriodicalIF":3.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141775722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Task Scheduling and Termination With Throughput Constraint","authors":"Qingsong Liu;Zhixuan Fang","doi":"10.1109/TNET.2024.3425617","DOIUrl":"10.1109/TNET.2024.3425617","url":null,"abstract":"We consider a task scheduling scenario in which the controller activates one of K task types at each time. Each task induces a random completion time, and a reward is obtained only after the task is completed. The statistics of the completion time and the reward distributions of all task types are unknown to the controller. The controller needs to learn to schedule tasks to maximize the accumulated reward within a given time horizon T. Motivated by practical scenarios, we require the designed policy to satisfy a system throughput constraint. In addition, we introduce an interruption mechanism to terminate ongoing tasks that last longer than certain deadlines. To address this scheduling problem, we model it as an online learning problem with deadline and throughput constraints. Then, we characterize the optimal offline policy and develop efficient online learning algorithms based on the Lyapunov method. We prove that our online learning algorithm achieves $O(\\sqrt{T})$ regret and zero constraint violation. We also conduct simulations to evaluate the performance of our developed learning algorithms.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 6","pages":"4629-4643"},"PeriodicalIF":3.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Device Sampling and Resource Optimization for Federated Learning in Cooperative Edge Networks","authors":"Su Wang;Roberto Morabito;Seyyedali Hosseinalipour;Mung Chiang;Christopher G. Brinton","doi":"10.1109/TNET.2024.3423673","DOIUrl":"10.1109/TNET.2024.3423673","url":null,"abstract":"The conventional federated learning (FedL) architecture distributes machine learning (ML) across worker devices by having them train local models that are periodically aggregated by a server. FedL ignores two important characteristics of contemporary wireless networks, however: (i) the network may contain heterogeneous communication/computation resources, and (ii) there may be significant overlaps in devices’ local data distributions. In this work, we develop a novel optimization methodology that jointly accounts for these factors via intelligent device sampling complemented by device-to-device (D2D) offloading. Our optimization methodology aims to select the best combination of sampled nodes and data offloading configuration to maximize FedL training accuracy while minimizing data processing and D2D communication resource consumption, subject to realistic constraints on the network topology and device capabilities. Theoretical analysis of the D2D offloading subproblem leads to new FedL convergence bounds and an efficient sequential convex optimizer. Using these results, we develop a sampling methodology based on graph convolutional networks (GCNs) which learns the relationship between network attributes, sampled nodes, and D2D data offloading to maximize FedL accuracy. Through evaluation on popular datasets and real-world network measurements from our edge testbed, we find that our methodology outperforms popular device sampling methodologies from the literature in terms of ML model performance, data processing overhead, and energy consumption.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4365-4381"},"PeriodicalIF":3.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing","authors":"Rui Li;Tao Ouyang;Liekang Zeng;Guocheng Liao;Zhi Zhou;Xu Chen","doi":"10.1109/TNET.2024.3421356","DOIUrl":"10.1109/TNET.2024.3421356","url":null,"abstract":"Collaborative Edge Computing (CEC) is an emerging paradigm that pools heterogeneous edge devices into a shared resource to compute nearby DNN inference tasks such as edge video analytics. Nevertheless, existing works mainly focus on workload routing among edge devices, a key knob for improving network utility in CEC, with the aim of minimizing routing cost, leaving joint workload allocation and routing optimization from a system perspective an open question. To this end, this paper presents a holistic, learned optimization for CEC that maximizes the total network utility in an online manner, even though the utility functions of task input rates are unknown a priori. In particular, we characterize the CEC system with a flow model and formulate an online learning problem in the form of cross-layer optimization. We propose a nested-loop algorithm to solve workload allocation and distributed routing iteratively, using the tools of gradient sampling and online mirror descent. To improve the convergence rate over the nested-loop version, we further devise a single-loop algorithm. Rigorous analysis is provided to show its inherent convexity, efficient convergence, and algorithmic optimality. Finally, extensive numerical simulations demonstrate the superior performance of our solutions.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4414-4426"},"PeriodicalIF":3.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagnosing End-Host Network Bottlenecks in RDMA Servers","authors":"Kefei Liu;Jiao Zhang;Zhuo Jiang;Haoran Wei;Xiaolong Zhong;Lizhuang Tan;Tian Pan;Tao Huang","doi":"10.1109/TNET.2024.3416419","DOIUrl":"10.1109/TNET.2024.3416419","url":null,"abstract":"In RDMA (Remote Direct Memory Access) networks, end-host networks, including intra-host networks and RNICs (RDMA NICs), have been considered robust and have received little attention. However, as RNIC line rates rapidly increase to several hundred gigabits per second, the intra-host network becomes a potential performance bottleneck for network applications. Intra-host network bottlenecks can result in degraded intra-host bandwidth and increased intra-host latency. In addition, RNIC network problems can result in connection failures and packet drops. Host network problems can severely degrade network performance. However, when host network problems occur, they are hard to notice due to the lack of a monitoring system. Furthermore, existing diagnostic mechanisms cannot efficiently diagnose host network problems. In this paper, we analyze the symptoms of host network problems based on our long-term troubleshooting experience and propose Hostping, the first monitoring and diagnostic system dedicated to host networks. The core idea of Hostping is to conduct 1) loopback tests between RNICs and endpoints within the host to measure intra-host latency and bandwidth, and 2) mutual probing between RNICs on a host to measure RNIC connectivity. We have deployed Hostping on thousands of servers in our distributed machine learning system. Hostping not only detects and diagnoses already-known host network problems within minutes but has also revealed eight problems we had not noticed before.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4302-4316"},"PeriodicalIF":3.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Distributional Reinforcement Learning-Based Adaptive Routing With Guaranteed Delay Bounds","authors":"Jianmin Liu;Dan Li;Yongjun Xu","doi":"10.1109/TNET.2024.3425652","DOIUrl":"10.1109/TNET.2024.3425652","url":null,"abstract":"Real-time applications that require timely data delivery over wireless multi-hop networks within specified deadlines are increasingly common. Effective routing protocols that can guarantee real-time QoS are crucial, yet challenging to design, due to the unpredictable variations in end-to-end delay caused by unreliable wireless channels. In such conditions, the upper bound on the end-to-end delay, i.e., the worst-case end-to-end delay, should be guaranteed to fall within the deadline. However, existing routing protocols with guaranteed delay bounds cannot strictly guarantee real-time QoS because they assume that the worst-case end-to-end delay is known and ignore the impact of routing policies on determining it. In this paper, we relax this assumption and propose DDRL-ARGB, an Adaptive Routing with Guaranteed delay Bounds using Deep Distributional Reinforcement Learning (DDRL). DDRL-ARGB adopts DDRL to jointly determine the worst-case end-to-end delay and learn routing policies. To accurately determine the worst-case end-to-end delay, DDRL-ARGB employs a quantile regression deep Q-network to learn the cumulative distribution of end-to-end delay. To guarantee real-time QoS, DDRL-ARGB optimizes routing decisions under the constraint that the worst-case end-to-end delay stays within the deadline. To alleviate traffic congestion, DDRL-ARGB considers the network congestion status when making routing decisions. Extensive results show that, compared with two state-of-the-art routing protocols, DDRL-ARGB can accurately calculate the worst-case end-to-end delay and can strictly guarantee real-time QoS with a small tolerated violation probability.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 6","pages":"4692-4706"},"PeriodicalIF":3.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asynchronous Decentralized Federated Learning for Heterogeneous Devices","authors":"Yunming Liao;Yang Xu;Hongli Xu;Min Chen;Lun Wang;Chunming Qiao","doi":"10.1109/TNET.2024.3424444","DOIUrl":"10.1109/TNET.2024.3424444","url":null,"abstract":"Data generated at the network edge can be processed locally by leveraging the emerging technology of Federated Learning (FL). However, non-IID local data degrade model accuracy, and the heterogeneity of edge nodes inevitably slows down model training. Moreover, to avoid the potential communication bottleneck in parameter-server-based FL, we concentrate on Decentralized Federated Learning (DFL), which performs distributed model training in a Peer-to-Peer (P2P) manner. To address these challenges, we propose an asynchronous DFL system that incorporates neighbor selection and gradient push, termed AsyDFL. Specifically, we require each edge node to push gradients only to a subset of neighbors for resource efficiency. Herein, we first give a theoretical convergence analysis of AsyDFL under the complicated non-IID and heterogeneous scenario, and further design a priority-based algorithm to dynamically select neighbors for each edge node so as to achieve a trade-off between communication cost and model performance. We evaluate the performance of AsyDFL through extensive experiments on a physical platform with 30 NVIDIA Jetson edge devices. Evaluation results show that, compared to the baselines, AsyDFL can reduce the communication cost by 57% and the completion time by about 35% for achieving the same test accuracy, and improve model accuracy by at least 6% under the non-IID scenario.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4535-4550"},"PeriodicalIF":3.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141722383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated PCA on Grassmann Manifold for IoT Anomaly Detection","authors":"Tung-Anh Nguyen;Long Tan Le;Tuan Dung Nguyen;Wei Bao;Suranga Seneviratne;Choong Seon Hong;Nguyen H. Tran","doi":"10.1109/TNET.2024.3423780","DOIUrl":"10.1109/TNET.2024.3423780","url":null,"abstract":"With the proliferation of the Internet of Things (IoT) and the rising interconnectedness of devices, network security faces significant challenges, especially from anomalous activities. While traditional machine learning-based intrusion detection systems (ML-IDS) effectively employ supervised learning methods, they have limitations such as the requirement for labeled data and difficulties with high dimensionality. Recent unsupervised ML-IDS approaches such as AutoEncoders and Generative Adversarial Networks (GANs) offer alternative solutions but are difficult to deploy on resource-constrained IoT devices and to interpret. To address these concerns, this paper proposes a novel federated unsupervised anomaly detection framework, FedPCA, that leverages Principal Component Analysis (PCA) and the Alternating Direction Method of Multipliers (ADMM) to learn common representations of distributed non-i.i.d. datasets. Building on the FedPCA framework, we propose two algorithms, FedPE in Euclidean space and FedPG on Grassmann manifolds. Our approach enables real-time threat detection and mitigation at the device level, enhancing network resilience while ensuring privacy. Moreover, the proposed algorithms are accompanied by theoretical convergence rates even under a sub-sampling scheme, a novel result. Experimental results on the UNSW-NB15 and TON-IoT datasets show that our proposed methods offer anomaly detection performance comparable to non-linear baselines, while providing significant improvements in communication and memory efficiency, underscoring their potential for securing IoT networks.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4456-4471"},"PeriodicalIF":3.0,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}