Understanding host network stack overheads

Qizhe Cai, Shubham Chaudhary, Midhul Vuppalapati, Jaehyun Hwang, R. Agarwal
{"title":"了解主机网络堆栈开销","authors":"Qizhe Cai, Shubham Chaudhary, Midhul Vuppalapati, Jaehyun Hwang, R. Agarwal","doi":"10.1145/3452296.3472888","DOIUrl":null,"url":null,"abstract":"Traditional end-host network stacks are struggling to keep up with rapidly increasing datacenter access link bandwidths due to their unsustainable CPU overheads. Motivated by this, our community is exploring a multitude of solutions for future network stacks: from Linux kernel optimizations to partial hardware offload to clean-slate userspace stacks to specialized host network hardware. The design space explored by these solutions would benefit from a detailed understanding of CPU inefficiencies in existing network stacks. This paper presents measurement and insights for Linux kernel network stack performance for 100Gbps access link bandwidths. Our study reveals that such high bandwidth links, coupled with relatively stagnant technology trends for other host resources (e.g., CPU speeds and capacity, cache sizes, NIC buffer sizes, etc.), mark a fundamental shift in host network stack bottlenecks. For instance, we find that a single core is no longer able to process packets at line rate, with data copy from kernel to application buffers at the receiver becoming the core performance bottleneck. In addition, increase in bandwidth-delay products have outpaced the increase in cache sizes, resulting in inefficient DMA pipeline between the NIC and the CPU. Finally, we find that traditional loosely-coupled design of network stack and CPU schedulers in existing operating systems becomes a limiting factor in scaling network stack performance across cores. Based on insights from our study, we discuss implications to design of future operating systems, network protocols, and host hardware.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"59","resultStr":"{\"title\":\"Understanding host network stack overheads\",\"authors\":\"Qizhe Cai, Shubham Chaudhary, Midhul Vuppalapati, Jaehyun Hwang, R. Agarwal\",\"doi\":\"10.1145/3452296.3472888\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Traditional end-host network stacks are struggling to keep up with rapidly increasing datacenter access link bandwidths due to their unsustainable CPU overheads. Motivated by this, our community is exploring a multitude of solutions for future network stacks: from Linux kernel optimizations to partial hardware offload to clean-slate userspace stacks to specialized host network hardware. The design space explored by these solutions would benefit from a detailed understanding of CPU inefficiencies in existing network stacks. This paper presents measurement and insights for Linux kernel network stack performance for 100Gbps access link bandwidths. Our study reveals that such high bandwidth links, coupled with relatively stagnant technology trends for other host resources (e.g., CPU speeds and capacity, cache sizes, NIC buffer sizes, etc.), mark a fundamental shift in host network stack bottlenecks. For instance, we find that a single core is no longer able to process packets at line rate, with data copy from kernel to application buffers at the receiver becoming the core performance bottleneck. 
In addition, increase in bandwidth-delay products have outpaced the increase in cache sizes, resulting in inefficient DMA pipeline between the NIC and the CPU. Finally, we find that traditional loosely-coupled design of network stack and CPU schedulers in existing operating systems becomes a limiting factor in scaling network stack performance across cores. Based on insights from our study, we discuss implications to design of future operating systems, network protocols, and host hardware.\",\"PeriodicalId\":20487,\"journal\":{\"name\":\"Proceedings of the 2021 ACM SIGCOMM 2021 Conference\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"59\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 ACM SIGCOMM 2021 Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3452296.3472888\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3452296.3472888","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 59

Abstract

Traditional end-host network stacks are struggling to keep up with rapidly increasing datacenter access link bandwidths due to their unsustainable CPU overheads. Motivated by this, our community is exploring a multitude of solutions for future network stacks: from Linux kernel optimizations to partial hardware offload to clean-slate userspace stacks to specialized host network hardware. The design space explored by these solutions would benefit from a detailed understanding of CPU inefficiencies in existing network stacks. This paper presents measurement and insights for Linux kernel network stack performance for 100Gbps access link bandwidths. Our study reveals that such high bandwidth links, coupled with relatively stagnant technology trends for other host resources (e.g., CPU speeds and capacity, cache sizes, NIC buffer sizes, etc.), mark a fundamental shift in host network stack bottlenecks. For instance, we find that a single core is no longer able to process packets at line rate, with data copy from kernel to application buffers at the receiver becoming the core performance bottleneck. In addition, increase in bandwidth-delay products have outpaced the increase in cache sizes, resulting in inefficient DMA pipeline between the NIC and the CPU. Finally, we find that traditional loosely-coupled design of network stack and CPU schedulers in existing operating systems becomes a limiting factor in scaling network stack performance across cores. Based on insights from our study, we discuss implications to design of future operating systems, network protocols, and host hardware.
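As a concrete reference point for the receiver-side data-copy bottleneck described above, the sketch below is a minimal single-connection TCP sink in C. It is not the paper's measurement harness; the port number and the 64 KB application buffer are arbitrary, illustrative choices. Its purpose is only to show where the kernel-to-application copy happens on the receive path: every recv() call copies payload out of kernel socket buffers into the user buffer, and it is this copy that the study identifies as the dominant single-core cost at 100 Gbps.

```c
/* Minimal single-core TCP sink: illustrates where the kernel-to-user
 * data copy happens on the receive path. A sketch for discussion, not
 * the paper's tooling; the port and buffer size are arbitrary choices. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define PORT     9000
#define BUF_SIZE (64 * 1024)

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0) { perror("socket"); return 1; }

    int one = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(PORT);

    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listen_fd, 1) < 0) {
        perror("bind/listen");
        return 1;
    }

    int conn_fd = accept(listen_fd, NULL, NULL);
    if (conn_fd < 0) { perror("accept"); return 1; }

    char *buf = malloc(BUF_SIZE);
    if (!buf) { perror("malloc"); return 1; }

    unsigned long long total = 0;
    ssize_t n;

    /* Each recv() pulls data out of kernel socket buffers into the
     * application buffer. This kernel-to-user copy (done on the Linux
     * receive path, e.g. under tcp_recvmsg) is the receiver-side cost
     * the abstract highlights at 100 Gbps. */
    while ((n = recv(conn_fd, buf, BUF_SIZE, 0)) > 0)
        total += (unsigned long long)n;

    printf("received %llu bytes\n", total);
    free(buf);
    close(conn_fd);
    close(listen_fd);
    return 0;
}
```

Pinning such a sink to one core (e.g., with taskset -c 0) and profiling it with perf top while a remote sender streams data at line rate is one way to observe the copy routine dominating the receiver's CPU profile.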
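The cache-versus-bandwidth-delay-product observation can also be made concrete with a back-of-the-envelope calculation; the 40 µs RTT below is an assumed, illustrative value, not a figure from the paper:

\[
\mathrm{BDP} = \text{bandwidth} \times \mathrm{RTT}
             = 10^{11}\ \tfrac{\text{bit}}{\text{s}} \times 4 \times 10^{-5}\ \text{s}
             = 4 \times 10^{6}\ \text{bit} \approx 500\ \text{KB}.
\]

With even a handful of such flows, the in-flight data that the NIC DMAs into host memory can exceed the small slice of the last-level cache available for DMA writes (e.g., under Intel DDIO), so packet payloads are evicted to DRAM before the application copies them out; this is the inefficient NIC-to-CPU DMA pipeline the abstract refers to.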