{"title":"LAVA: fine-grained 3D indoor wireless coverage for small IoT devices","authors":"R. I. Zelaya, W. Sussman, Jeremy Gummeson, K. Jamieson, Wenjun Hu","doi":"10.1145/3452296.3472890","DOIUrl":"https://doi.org/10.1145/3452296.3472890","url":null,"abstract":"Small IoT devices deployed in challenging locations suffer from uneven 3D coverage in complex environments. This work optimizes indoor coverage with LAVA, a Large Array of Vanilla Amplifiers. LAVA is a standard-agnostic cooperative mesh of elements, i.e., RF devices each consisting of several switched input and output antennas connected to fixed-gain amplifiers. Each LAVA element is further equipped with rudimentary power sensing to detect nearby transmissions. The elements report power readings to the LAVA control plane, which then infers active link sessions without explicitly interacting with the endpoint transmitter or receiver. With simple on-off control of amplifiers and antenna switching, LAVA boosts passing signals via multi hop amplify-and-forward. LAVA explores a middle ground between smart surfaces and physical-layer relays. Multi-hopping over short inter-hop distances exerts more control over the end-to-end trajectory, supporting fine-grained coverage and spatial reuse. Ceiling testbed results show throughput improvements to individual Wi-Fi links by 50% on average and up to 100% at 15 dBm transmit power (193% on average, up to 8x at 0 dBm). ZigBee links see up to 17 dB power gain. For pairs of co-channel concurrent links, LAVA provides average per-link throughput improvements of 517% at 0 dBm and 80% at 15 dBm.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88897101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lucid: a language for control in the data plane","authors":"J. Sonchack, Devon Loehr, J. Rexford, D. Walker","doi":"10.1145/3452296.3472903","DOIUrl":"https://doi.org/10.1145/3452296.3472903","url":null,"abstract":"Programmable switch hardware makes it possible to move fine-grained control logic inside the network data plane, improving performance for a wide range of applications. However, applications with integrated control are inherently hard to write in existing data-plane programming languages such as P4. This paper presents Lucid, a language that raises the level of abstraction for putting control functionality in the data plane. Lucid introduces abstractions that make it easy to write sophisticated data-plane applications with interleaved packet-handling and control logic, specialized type and syntax systems that prevent programmer bugs related to data-plane state, and an open-sourced compiler that translates Lucid programs into P4 optimized for the Intel Tofino. These features make Lucid general and easy to use, as we demonstrate by writing a suite of ten different data-plane applications in Lucid. Working prototypes take well under an hour to write, even for a programmer without prior Tofino experience, have around 10x fewer lines of code compared to P4, and compile efficiently to real hardware. In a stateful firewall written in Lucid, we find that moving control from a switch's CPU to its data-plane processor using Lucid reduces the latency of performance-sensitive operations by over 300X.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89675011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthesizing safe and efficient kernel extensions for packet processing","authors":"Qiongwen Xu, Michael D. Wong, Tanvi Wagle, S. Narayana, Anirudh Sivaraman","doi":"10.1145/3452296.3472929","DOIUrl":"https://doi.org/10.1145/3452296.3472929","url":null,"abstract":"Extended Berkeley Packet Filter (BPF) has emerged as a powerful method to extend packet-processing functionality in the Linux operating system. BPF allows users to write code in high-level languages (like C or Rust) and execute them at specific hooks in the kernel, such as the network device driver. To ensure safe execution of a user-developed BPF program in kernel context, Linux uses an in-kernel static checker. The checker allows a program to execute only if it can prove that the program is crash-free, always accesses memory within safe bounds, and avoids leaking kernel data. BPF programming is not easy. One, even modest-sized BPF programs are deemed too large to analyze and rejected by the kernel checker. Two, the kernel checker may incorrectly determine that a BPF program exhibits unsafe behaviors. Three, even small performance optimizations to BPF code (e.g., 5% gains) must be meticulously hand-crafted by expert developers. Traditional optimizing compilers for BPF are often inadequate since the kernel checker's safety constraints are incompatible with rule-based optimizations. We present K2, a program-synthesis-based compiler that automatically optimizes BPF bytecode with formal correctness and safety guarantees. K2 produces code with 6--26% reduced size, 1.36%--55.03% lower average packet-processing latency, and 0--4.75% higher throughput (packets per second per core) relative to the best clang-compiled program, across benchmarks drawn from Cilium, Facebook, and the Linux kernel. K2 incorporates several domain-specific techniques to make synthesis practical by accelerating equivalence-checking of BPF programs by 6 orders of magnitude.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89537857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two beams are better than one: towards reliable and high throughput mmWave links","authors":"I. Jain, Raghav Subbaraman, Dinesh Bharadia","doi":"10.1145/3452296.3472924","DOIUrl":"https://doi.org/10.1145/3452296.3472924","url":null,"abstract":"Millimeter-wave communication with high throughput and high reliability is poised to be a gamechanger for V2X and VR applications. However, mmWave links are notorious for low reliability since they suffer from frequent outages due to blockage and user mobility. We build mmReliable, a reliable mmWave system that implements multi-beamforming and user tracking to handle environmental vulnerabilities. It creates constructive multi-beam patterns and optimizes their angle, phase, and amplitude to maximize the signal strength at the receiver. Multi-beam links are reliable since they are resilient to occasional blockages of few constituent beams compared to a single-beam system. We implement mmReliable on a 28 GHz testbed with 400 MHz bandwidth, and a 64 element phased array supporting 5G NR waveforms. Rigorous indoor and outdoor experiments demonstrate that mmReliable achieves close to 100% reliability providing 2.3x improvement in the throughput-reliability product than single-beam systems.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87280501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-automated protocol disambiguation and code generation","authors":"Jane Yen, Tam'as L'evai, Qinyuan Ye, Xiang Ren, R. Govindan, B. Raghavan","doi":"10.1145/3452296.3472910","DOIUrl":"https://doi.org/10.1145/3452296.3472910","url":null,"abstract":"For decades, Internet protocols have been specified using natural language. Given the ambiguity inherent in such text, it is not surprising that protocol implementations have long exhibited bugs. In this paper, we apply natural language processing (NLP) to effect semi-automated generation of protocol implementations from specification text. Our system, Sage, can uncover ambiguous or under-specified sentences in specifications; once these are clarified by the author of the protocol specification, Sage can generate protocol code automatically. Using Sage, we discover 5 instances of ambiguity and 6 instances of under-specification in the ICMP RFC; after fixing these, Sage is able to automatically generate code that interoperates perfectly with Linux implementations. We show that Sage generalizes to sections of BFD, IGMP, and NTP and identify additional conceptual components that Sage needs to support to generalize to complete, complex protocols like BGP and TCP.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90387998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient sparse collective communication and its application to accelerate distributed deep learning","authors":"Jiawei Fei, Chen-Yu Ho, Atal Narayan Sahu, M. Canini, Amedeo Sapio","doi":"10.1145/3452296.3472904","DOIUrl":"https://doi.org/10.1145/3452296.3472904","url":null,"abstract":"Efficient collective communication is crucial to parallel-computing applications such as distributed training of large-scale recommendation systems and natural language processing models. Existing collective communication libraries focus on optimizing operations for dense inputs, resulting in transmissions of many zeros when inputs are sparse. This counters current trends that see increasing data sparsity in large models. We propose OmniReduce, an efficient streaming aggregation system that exploits sparsity to maximize effective bandwidth use by sending only non-zero data blocks. We demonstrate that this idea is beneficial and accelerates distributed training by up to 8.2x. Even at 100 Gbps, OmniReduce delivers 1.4--2.9x better performance for network-bottlenecked DNNs.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82003610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hoplite: efficient and fault-tolerant collective communication for task-based distributed systems","authors":"Siyuan Zhuang, Zhuohan Li, Danyang Zhuo, Stephanie Wang, Eric Liang, Robert Nishihara, Philipp Moritz, I. Stoica","doi":"10.1145/3452296.3472897","DOIUrl":"https://doi.org/10.1145/3452296.3472897","url":null,"abstract":"Task-based distributed frameworks (e.g., Ray, Dask, Hydro) have become increasingly popular for distributed applications that contain asynchronous and dynamic workloads, including asynchronous gradient descent, reinforcement learning, and model serving. As more data-intensive applications move to run on top of task-based systems, collective communication efficiency has become an important problem. Unfortunately, traditional collective communication libraries (e.g., MPI, Horovod, NCCL) are an ill fit, because they require the communication schedule to be known before runtime and they do not provide fault tolerance. We design and implement Hoplite, an efficient and fault-tolerant collective communication layer for task-based distributed systems. Our key technique is to compute data transfer schedules on the fly and execute the schedules efficiently through fine-grained pipelining. At the same time, when a task fails, the data transfer schedule adapts quickly to allow other tasks to keep making progress. We apply Hoplite to a popular task-based distributed framework, Ray. We show that Hoplite speeds up asynchronous stochastic gradient descent, reinforcement learning, and serving an ensemble of machine learning models that are difficult to execute efficiently with traditional collective communication by up to 7.8x, 3.9x, and 3.3x, respectively.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80857852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}