{"title":"CocoSketch","authors":"Yinda Zhang, Zaoxing Liu, Ruixin Wang, Tong Yang, Jizhou Li, Ruijie Miao, Peng Liu, Ruwen Zhang, Junchen Jiang","doi":"10.1145/3452296.3472892","DOIUrl":"https://doi.org/10.1145/3452296.3472892","url":null,"abstract":"Sketch-based measurement has emerged as a promising alternative to the traditional sampling-based network measurement approaches due to its high accuracy and resource efficiency. While there have been various designs around sketches, they focus on measuring one particular flow key, and it is infeasible to support many keys based on these sketches. In this work, we take a significant step towards supporting arbitrary partial key queries, where we only need to specify a full range of possible flow keys that are of interest before measurement starts, and in query time, we can extract the information of any key in that range. We design CocoSketch, which casts arbitrary partial key queries to the subset sum estimation problem and makes the theoretical tools for subset sum estimation practical. To realize desirable resource-accuracy tradeoffs in software and hardware platforms, we propose two techniques: (1) stochastic variance minimization to significantly reduce per-packet update delay, and (2) removing circular dependencies in the per-packet update logic to make the implementation hardware-friendly. We implement CocoSketch on four popular platforms (CPU, Open vSwitch, P4, and FPGA) and show that compared to baselines that use traditional single-key sketches, CocoSketch improves average packet processing throughput by 27.2x and accuracy by 10.4x when measuring six flow keys.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81500936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"1Pipe","authors":"Bojie Li, Gefei Zuo, Wei Bai, Lintao Zhang","doi":"10.1145/3452296.3472909","DOIUrl":"https://doi.org/10.1145/3452296.3472909","url":null,"abstract":"This paper proposes 1Pipe, a novel communication abstraction that enables different receivers to process messages from senders in a consistent total order. More precisely, 1Pipe provides both unicast and scattering (i.e., a group of messages to different destinations) in a causally and totally ordered manner. 1Pipe provides a best effort service that delivers each message at most once, as well as a reliable service that guarantees delivery and provides restricted atomic delivery for each scattering. 1Pipe can simplify and accelerate many distributed applications, e.g., transactional key-value stores, log replication, and distributed data structures. We propose a scalable and efficient method to implement 1Pipe inside data centers. To achieve total order delivery in a scalable manner, 1Pipe separates the bookkeeping of order information from message forwarding, and distributes the work to each switch and host. 1Pipe aggregates order information using in-network computation at switches. This forms the “control plane” of the system. On the “data plane”, 1Pipe forwards messages in the network as usual and reorders them at the receiver based on the order information. Evaluation on a 32-server testbed shows that 1Pipe achieves scalable throughput (80M messages per second per host) and low latency (10𝜇s) with little CPU and network overhead. 1Pipe achieves linearly scalable throughput and low latency in transactional key-value store, TPC-C, remote data structures, and replication that outperforms traditional designs by 2∼20x.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73621185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ACC: automatic ECN tuning for high-speed datacenter networks","authors":"Siyu Yan, Xiaoliang Wang, Xiaolong Zheng, Yinben Xia, Derui Liu, Weishan Deng","doi":"10.1145/3452296.3472927","DOIUrl":"https://doi.org/10.1145/3452296.3472927","url":null,"abstract":"For the widely deployed ECN-based congestion control schemes, the marking threshold is the key to deliver high bandwidth and low latency. However, due to traffic dynamics in the high-speed production networks, it is difficult to maintain persistent performance by using the static ECN setting. To meet the operational challenge, in this paper we report the design and implementation of an automatic run-time optimization scheme, ACC, which leverages the multi-agent reinforcement learning technique to dynamically adjust the marking threshold at each switch. The proposed approach works in a distributed fashion and combines offline and online training to adapt to dynamic traffic patterns. It can be easily deployed based on the common features supported by major commodity switching chips. Both testbed experiments and large-scale simulations have shown that ACC achieves low flow completion time (FCT) for both mice flows and elephant flows at line-rate. Under heterogeneous production environments with 300 machines, compared with the well-tuned static ECN settings, ACC achieves up to 20% improvement on IOPS and 30% lower FCT for storage service. ACC has been applied in high-speed datacenter networks and significantly simplifies the network operations.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82069135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Mahimkar, Carlos Eduardo de Andrade, R. Sinha, Giritharan Rana
{"title":"A composition framework for change management","authors":"A. Mahimkar, Carlos Eduardo de Andrade, R. Sinha, Giritharan Rana","doi":"10.1145/3452296.3472901","DOIUrl":"https://doi.org/10.1145/3452296.3472901","url":null,"abstract":"Change management has been a long-standing challenge for network operations. The large scale and diversity of networks, their complex dependencies, and continuous evolution through technology and software updates combined with the risk of service impact create tremendous challenges to effectively manage changes. In this paper, we use data from a large service provider and experiences of their operations teams to highlight the need for quick and easy adaptation of change management capabilities and keep up with the continuous network changes. We propose a new framework CORNET (COmposition fRamework for chaNge managEmenT) with key ideas of modularization of changes into building blocks, flexible composition into change workflows, change plan optimization, change impact verification, and automated translation of high-level change management intent into low-level implementations and mathematical models. We demonstrate the effectiveness of CORNET using real-world data collected from 4G and 5G cellular networks and virtualized services such as VPN and SDWAN running in the cloud as well as experiments conducted on a testbed of virtualized network functions. We also share our operational experiences and lessons learned from successfully using CORNET within a large service provider network over the last three years.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82292515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jaehong Min, Ming G. Liu, Tapan Chugh, Chenxingyu Zhao, Andrew Wei, I. Doh, A. Krishnamurthy
{"title":"Gimbal","authors":"Jaehong Min, Ming G. Liu, Tapan Chugh, Chenxingyu Zhao, Andrew Wei, I. Doh, A. Krishnamurthy","doi":"10.1145/3452296.3472940","DOIUrl":"https://doi.org/10.1145/3452296.3472940","url":null,"abstract":"Emerging SmartNIC-based disaggregated NVMe storage has become a promising storage infrastructure due to its competitive IO performance and low cost. These SmartNIC JBOFs are shared among multiple co-resident applications, and there is a need for the platform to ensure fairness, QoS, and high utilization. Unfortunately, given the limited computing capability of the SmartNICs and the non-deterministic nature of NVMe drives, it is challenging to provide such support on today's SmartNIC JBOFs. This paper presents Gimbal, a software storage switch that orchestrates IO traffic between Ethernet ports and NVMe drives for co-located tenants. It enables efficient multi-tenancy on SmartNIC JBOFs using the following techniques: a delay-based SSD congestion control algorithm, dynamic estimation of SSD write costs, a fair scheduler that operates at the granularity of a virtual slot, and an end-to-end credit-based flow control channel. Our prototyped system not only achieves up to x6.6 better utilization and 62.6% less tail latency but also improves the fairness for complex workloads. It also improves a commercial key-value store performance in a multi-tenant environment with x1.7 better throughput and 35.0% less tail latency on average.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80761101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qizhe Cai, Shubham Chaudhary, Midhul Vuppalapati, Jaehyun Hwang, R. Agarwal
{"title":"Understanding host network stack overheads","authors":"Qizhe Cai, Shubham Chaudhary, Midhul Vuppalapati, Jaehyun Hwang, R. Agarwal","doi":"10.1145/3452296.3472888","DOIUrl":"https://doi.org/10.1145/3452296.3472888","url":null,"abstract":"Traditional end-host network stacks are struggling to keep up with rapidly increasing datacenter access link bandwidths due to their unsustainable CPU overheads. Motivated by this, our community is exploring a multitude of solutions for future network stacks: from Linux kernel optimizations to partial hardware offload to clean-slate userspace stacks to specialized host network hardware. The design space explored by these solutions would benefit from a detailed understanding of CPU inefficiencies in existing network stacks. This paper presents measurement and insights for Linux kernel network stack performance for 100Gbps access link bandwidths. Our study reveals that such high bandwidth links, coupled with relatively stagnant technology trends for other host resources (e.g., CPU speeds and capacity, cache sizes, NIC buffer sizes, etc.), mark a fundamental shift in host network stack bottlenecks. For instance, we find that a single core is no longer able to process packets at line rate, with data copy from kernel to application buffers at the receiver becoming the core performance bottleneck. In addition, increase in bandwidth-delay products have outpaced the increase in cache sizes, resulting in inefficient DMA pipeline between the NIC and the CPU. Finally, we find that traditional loosely-coupled design of network stack and CPU schedulers in existing operating systems becomes a limiting factor in scaling network stack performance across cores. Based on insights from our study, we discuss implications to design of future operating systems, network protocols, and host hardware.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86539323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tian Pan, Nianbing Yu, Chenhao Jia, Jianwen Pi, Liang Xu, Yisong Qiao, Zhiguo Li, Kun Liu, Jie Lu, Jianyuan Lu, Enge Song, Jiao Zhang, Tao Huang, Shunmin Zhu
{"title":"Sailfish","authors":"Tian Pan, Nianbing Yu, Chenhao Jia, Jianwen Pi, Liang Xu, Yisong Qiao, Zhiguo Li, Kun Liu, Jie Lu, Jianyuan Lu, Enge Song, Jiao Zhang, Tao Huang, Shunmin Zhu","doi":"10.1145/3452296.3472889","DOIUrl":"https://doi.org/10.1145/3452296.3472889","url":null,"abstract":"The cloud gateway is essential in the public cloud as the central hub of cloud traffic. We show that horizontal scaling of software gateways, once sustainable for years, is no longer future-proof facing the massive scale and rapid growth of today's cloud. The root cause is the stagnant performance of the CPU core, which is prone to be overloaded by heavy hitters as traffic growth goes far beyond Moore's law. To address this, we propose emph{Sailfish}, a cloud-scale multi-tenant multi-service gateway accelerated by programmable switches. The new challenge is that large forwarding tables due to multi-tenancy cannot be fit into the limited on-chip memories. To this end, we devise a multi-pronged approach with (1) hardware/software co-design for table sharing, (2) horizontal table splitting among gateway clusters, (3) pipeline-aware table compression for a single node. Compared with the x86 gateway of a similar price, Sailfish reduces latency by 95% (2μs), improves throughput by more than 20x in bps (3.2Tbps) and 71x in pps (1.8Gpps) with packet length < 256B. Sailfish has been deployed in Alibaba Cloud for more than two years. It is the first P4-based cloud gateway in the industry, of which a single cluster carries dozens of Tbps traffic, withstanding peak-hour traffic in large online shopping festivals.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81637277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Congestion detection in lossless networks","authors":"Yiran Zhang, Yifan Liu, Qingkai Meng, Fengyuan Ren","doi":"10.1145/3452296.3472899","DOIUrl":"https://doi.org/10.1145/3452296.3472899","url":null,"abstract":"Congestion detection is the cornerstone of end-to-end congestion control. Through in-depth observations and understandings, we reveal that existing congestion detection mechanisms in mainstream lossless networks (i.e., Converged Enhanced Ethernet and InfiniBand) are improper, due to failing to cognize the interaction between hop-by-hop flow controls and congestion detection behaviors in switches. We define ternary states of switch ports and present Ternary Congestion Detection (TCD) for mainstream lossless networks. Testbed and extensive simulations demonstrate that TCD can detect congestion ports accurately and identify flows contributing to congestion as well as flows only affected by hop-by-hop flow controls. Meanwhile, we shed light on how to incorporate TCD with rate control. Case studies show that existing congestion control algorithms can achieve 3.3x and 2.0x better median and 99th-percentile FCT slowdown by combining with TCD.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85491846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"mmTag","authors":"M. Mazaheri, Alex K Chen, Omid Abari","doi":"10.1145/3452296.3472917","DOIUrl":"https://doi.org/10.1145/3452296.3472917","url":null,"abstract":"Recent advances in IoT, machine learning and cloud computing have placed a huge strain on wireless networks. In particular, many emerging applications require streaming rich content (such as videos) in real time, while they are constrained by energy sources. A wireless network which supports high data-rate while consuming low-power would be very attractive for these applications. Unfortunately, existing wireless networks do not satisfy this requirement. For example, WiFi backscatter and Bluetooth networks have very low power consumption, but their data-rate is very limited (less than a Mbps). On the other hand, modern WiFi and mmWave networks support high throughput, but have a high power consumption (more than a watt). To address this problem, we present mmTag, a novel mmWave backscatter network which enables low-power high-throughput wireless links for emerging applications. mmTag is a backscatter system which operates in the mmWave frequency bands. mmTag addresses the key challenges that prevent existing backscatter networks from operating at mmWave bands. We implemented mmTag and evaluated its performance empirically. Our results show that mmTag is capable of achieving 1 Gbps and 100 Mbps at 4.6 m and 8 m, respectively, while consuming only 2.4 nJ/bit.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85730133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RoS","authors":"J. Nolan, Kun Qian, Xinyu Zhang","doi":"10.1007/978-3-662-48986-4_301467","DOIUrl":"https://doi.org/10.1007/978-3-662-48986-4_301467","url":null,"abstract":"","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"98 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79038489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}