Toward Full-Coverage and Low-Overhead Profiling of Network-Stack Latency
Authors: Xiang Chen; Hongyan Liu; Wenbin Zhang; Qun Huang; Dong Zhang; Haifeng Zhou; Xuan Liu; Chunming Wu
Journal: IEEE/ACM Transactions on Networking, vol. 32, no. 5, pp. 4441-4455
Published: 2024-07-03
DOI: 10.1109/TNET.2024.3421327
URL: https://ieeexplore.ieee.org/document/10583922/
In modern data center networks (DCNs), network-stack processing accounts for a large portion of the end-to-end latency of TCP flows. Profiling network-stack latency anomalies is therefore considered a crucial part of DCN performance diagnosis and troubleshooting. In particular, such profiling requires full coverage (i.e., profiling every TCP packet) and low overhead (i.e., profiling should avoid high CPU consumption on end-hosts). However, existing solutions rely on system calls or tracepoints on end-hosts to implement network-stack latency profiling, leading to either low coverage or high overhead. We propose Torp, a framework that offers full-coverage and low-overhead profiling of network-stack latency. Our key idea is to offload as much of the profiling as possible from costly system calls or tracepoints to the Torp agent built on eBPF modules, and further to include a Torp handler on the ToR switch to accelerate the remaining profiling operations. Torp efficiently coordinates the ToR switch and the Torp agent on end-hosts to jointly execute the entire latency profiling task. We have implemented Torp on $32\times 100$ Gbps Tofino switches. Testbed experiments indicate that Torp achieves full coverage and orders of magnitude lower host-side overhead compared to other solutions.
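The full-coverage requirement can be illustrated with a small sketch. It is not Torp's implementation: the record layout (`PacketRecord`), the two timestamps per packet, the median-based threshold, and the fixed-interval sampling are all assumptions made for illustration only. It shows how profiling every packet catches a single latency anomaly that a sampled profile can miss.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class PacketRecord:
    """Hypothetical per-packet record; field names are illustrative."""
    flow_id: str
    stack_entry_ns: int  # time the packet entered the network stack
    stack_exit_ns: int   # time the packet left the network stack


def stack_latency_ns(rec: PacketRecord) -> int:
    """Network-stack latency of one packet, in nanoseconds."""
    return rec.stack_exit_ns - rec.stack_entry_ns


def find_anomalies(records, factor=3.0):
    """Return records whose latency exceeds factor * median latency.

    The median-based threshold is an assumption, not the paper's criterion.
    """
    latencies = [stack_latency_ns(r) for r in records]
    if not latencies:
        return []
    baseline = median(latencies)
    return [r for r, lat in zip(records, latencies) if lat > factor * baseline]


def sample_every(records, n):
    """Low-coverage profiling: keep only every n-th packet."""
    return records[::n]


# Ten packets with 20 us of stack latency each, except packet 7 (500 us).
records = [PacketRecord("flow-a", i * 1000, i * 1000 + 20_000) for i in range(10)]
records[7] = PacketRecord("flow-a", 7_000, 7_000 + 500_000)

full = find_anomalies(records)                      # profiles every packet
sampled = find_anomalies(sample_every(records, 4))  # profiles packets 0, 4, 8

print(len(full), len(sampled))
```

In this toy trace the anomalous packet falls between sampling points, so only the full-coverage pass reports it.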
About the journal:
The IEEE/ACM Transactions on Networking’s high-level objective is to publish high-quality, original research results derived from theoretical or experimental exploration of the area of communication/computer networking, covering all sorts of information transport networks over all sorts of physical layer technologies, both wireline (all kinds of guided media: e.g., copper, optical) and wireless (e.g., radio-frequency, acoustic (e.g., underwater), infra-red), or hybrids of these. The journal welcomes applied contributions reporting on novel experiences and experiments with actual systems.