{"title":"Achieving Sublinear Complexity under Constant T in T-interval Dynamic Networks","authors":"Ruomu Hou, Irvan Jahja, Yucheng Sun, Jiyan Wu, Haifeng Yu","doi":"10.1145/3490148.3538571","DOIUrl":"https://doi.org/10.1145/3490148.3538571","url":null,"abstract":"This paper considers standard T-interval dynamic networks, where the N nodes in the network proceed in lock-step rounds, and where the topology of the network can change arbitrarily from round to round, as determined by an adversary. The adversary promises that in every T consecutive rounds, the T (potentially different) topologies in those T rounds contain a common connected subgraph that spans all nodes. Within such a context, we propose novel algorithms for solving some fundamental distributed computing problems such as Count/Consensus/Max. Our algorithms are the first algorithms whose complexities do not contain an Ømega(N) term, under constant T values. Previous sublinear algorithms require significantly larger T values.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"89 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134128198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. A. Bender, Seth Gilbert, F. Kuhn, John Kuszmaul, M. Médard
{"title":"Contention Resolution for Coded Radio Networks","authors":"M. A. Bender, Seth Gilbert, F. Kuhn, John Kuszmaul, M. Médard","doi":"10.1145/3490148.3538573","DOIUrl":"https://doi.org/10.1145/3490148.3538573","url":null,"abstract":"Randomized backoff protocols, such as exponential backoff, are a powerful tool for managing access to a shared resource, often a wireless communication channel (e.g., [1]). For a wireless device to transmit successfully, it uses a backoff protocol to ensure exclusive access to the channel. Modern radios, however, do not need exclusive access to the channel to communicate; in particular, they have the ability to receive useful information even when more than one device transmits at the same time. These capabilities have now been exploited for many years by systems that rely on interference cancellation, physical layer network coding and analog network coding to improve efficiency. For example, Zigzag decoding [56] demonstrated how a base station can decode messages sent by multiple devices simultaneously. In this paper, we address the following question: Can we design a backoff protocol that is better than exponential backoff when exclusive channel access is not required. We define the Coded Radio Network Model, which generalizes traditional radio network models (e.g., [30]). We then introduce the Decodable Backoff Algorithm, a randomized backoff protocol that achieves an optimal throughput of 1 - o (1). (Throughput 1 is optimal, as simultaneous reception does not increase the channel capacity.) The algorithm breaks the constant throughput lower bound for traditional radio networks [47-49], showing the power of these new hardware capabilities.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133664904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Shortest Paths with Negative Edge Weights","authors":"Nairen Cao, Jeremy T. Fineman, Katina Russell","doi":"10.1145/3490148.3538583","DOIUrl":"https://doi.org/10.1145/3490148.3538583","url":null,"abstract":"This paper presents a parallel version of Goldberg's algorithm for the problem of single-source shortest paths with integer (including negatives) edge weights. Given an input graph with n vertices, m edges, and integer weights ≥-N, our algorithms solves the problem with Õ(m √n log N) work and n5/4+o(1) log N span, both with high probability. Our algorithm thus has work similar to Goldberg's algorithm while also achieving at least m1/4-o(1) parallelism. To generate our parallel version of Goldberg's algorithm, we solve two specific distance-limited shortest-path problems, both with work Õ(m) and span √L · n1/2+o(1), where L is the distance limit.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114495428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Average Awake Complexity of MIS and Matching","authors":"M. Ghaffari, Julian Portmann","doi":"10.1145/3490148.3538566","DOIUrl":"https://doi.org/10.1145/3490148.3538566","url":null,"abstract":"Chatterjee, Gmyr, and Pandurangan [PODC 2020] recently introduced the notion of awake complexity for distributed algorithms, which measures the number of rounds in which a node is awake. In the other rounds, the node is sleeping and performs no computation or communication. Measuring the number of awake rounds can be of significance in many settings of distributed computing, e.g., in sensor networks where energy consumption is of concern. In that paper, Chatterjee et al. provide an elegant randomized algorithm for the Maximal Independent Set (MIS) problem that achieves an O(1) node-averaged awake complexity. That is, the average awake time among the nodes is O(1) rounds. However, to achieve that, the algorithm sacrifices the more standard round complexity measure from the well-known O(łog n) bound of MIS, due to Luby [STOC'85], to O(łog^3.41 n) rounds. Our first contribution is to present a simple randomized distributed MIS algorithm that, with high probability, has O(1) node-averaged awake complexity and O(łog n) worst-case round complexity. Our second, and more technical contribution, is to show algorithms with the same O(1) node-averaged awake complexity and O(łog n) worst-case round complexity for 1+ε approximation of maximum matching and 2+ε approximation of minimum vertex cover, where ε denotes an arbitrary small positive constant.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"15 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125761526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A NUMA-Aware Recoverable Mutex Lock","authors":"Ahmed I. Fahmy, W. Golab","doi":"10.1145/3490148.3538594","DOIUrl":"https://doi.org/10.1145/3490148.3538594","url":null,"abstract":"The mutual exclusion (ME) problem has been of interest to the scientific community since it was first defined by Dijkstra. Various algorithms have been developed to solve the problem, like the MCS and CLH queue-based locks. The problem was generalized into the recoverable mutual exclusion (RME) problem by Golab and Ramaraju to accommodate the possibility of process crash failures. Since then, multiple RME algorithms have been presented in the literature that vary in design and performance. Furthermore, non-uniform memory access (NUMA) architecture has become mainstream in designing modern distributed systems, stimulating the development of NUMA-aware mutex locks. None of the existing NUMA-aware mutex locks are recoverable to the best of our knowledge. In addition, none of the transformation techniques in the literature, such as flat-combining and cohort-locking, is a black-box transformation. Precisely, each of the existing transformation techniques requires specific characteristics of, and possible modifications to, the underlying NUMA-oblivious lock. In this work, we propose the Recoverable Filter (RF) lock, a black-box transformation approach that exploits memory locality to transform a NUMA-oblivious recoverable mutex lock into a NUMA-aware one. Practical experiments are conducted using two existing RME algorithms, Golab and Hendler's (GH) and Jayanti, Jayanti, and Joshi's (JJJ). The two RME locks are transformed into NUMA-aware locks using the proposed RF and the existing cohort algorithms. Results show that, in multi-socket configurations, our transformation boosts the performance of the NUMA-oblivious RME locks by up to 45%. The RME locks transformed using the proposed RF lock are slower than their non-recoverable cohort variants by up to 9%. Outcomes demonstrate that the overhead of our algorithm is minimal when using a single socket. Moreover, a deeper empirical assessment shows that the gap in performance between GH and JJJ is due to the entry section of JJJ, not its exit section.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133109878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Cover Trees and their Applications","authors":"Yan Gu, Zachary Napier, Yihan Sun, Letong Wang","doi":"10.1145/3490148.3538581","DOIUrl":"https://doi.org/10.1145/3490148.3538581","url":null,"abstract":"The cover tree is the canonical data structure that efficiently maintains a dynamic set of points on a metric space and supports nearest and k-nearest neighbor searches. For most real-world datasets with reasonable distributions (constant expansion rate and bounded aspect ratio mathematically), single-point insertion, single-point deletion, and nearest neighbor search (NNS) only cost logarithmically to the size of the point set. Unfortunately, due to the complication and the use of depth-first traversal order in the cover tree algorithms, we were unaware of any parallel approaches for these cover tree algorithms. This paper shows highly parallel and work-efficient cover tree algorithms that can handle batch insertions (and thus construction) and batch deletions. Assuming constant expansion rate and bounded aspect ratio, inserting or deleting m points into a cover tree with n points takes O(m log n) expected work and polylogarithmic span with high probability. Our algorithms rely on some novel algorithmic insights. We model the insertion and deletion process as a graph and use a maximal independent set (MIS) to generate tree nodes without conflicts. We use three key ideas to guarantee work-efficiency: the prefix-doubling scheme, a careful design to limit the graph size on which we apply MIS, and a strategy to propagate information among different levels in the cover tree. We also use path-copying to make our parallel cover tree a persistent data structure, which is useful in several applications. Using our parallel cover trees, we show work-efficient (or near-work-efficient) and highly parallel solutions for a list of problems in computational geometry and machine learning, including Euclidean minimum spanning tree (EMST), single-linkage clustering, bichromatic closest pair (BCP), density-based clustering and its hierarchical version, and others. To the best of our knowledge, many of them are the first solutions to achieve work-efficiency and polylogarithmic span assuming constant expansion rate and bounded aspect ratio.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"377 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115173435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preparing for Disaster: Leveraging Precomputation to Efficiently Repair Graph Structures Upon Failures","authors":"Calvin C. Newport, N. Vaidya, A. Weaver","doi":"10.1145/3490148.3538564","DOIUrl":"https://doi.org/10.1145/3490148.3538564","url":null,"abstract":"Distributed algorithms for constructing structures such as a maximal independent set (MIS) or maximal matching (MM) are well-studied in standard message-passing network models. In this paper, we consider a natural variant of this problem in which we begin with an instance of the graph structure and partition our algorithm execution that follows into two stages. During the first stage after the graph structure is calculated, some additional precomputation is done. In the second stage, an arbitrary collection of k nodes are crashed. The goal is to then repair the structure as efficiently as possible. We are interested in the circumstances under which the repair can be faster than the time required to build the structure from scratch, and focus, in particular, on trade-offs in which extra precomputation rounds during the first stage can be traded for faster repairs during the second.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122385675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zafar Ahmad, R. Chowdhury, Rathish Das, P. Ganapathi, Aaron Gregory, Yimin Zhu
{"title":"Brief Announcement: Faster Stencil Computations using Gaussian Approximations","authors":"Zafar Ahmad, R. Chowdhury, Rathish Das, P. Ganapathi, Aaron Gregory, Yimin Zhu","doi":"10.1145/3490148.3538558","DOIUrl":"https://doi.org/10.1145/3490148.3538558","url":null,"abstract":"Stencil computations are widely used to simulate the change of state of physical systems. The current best algorithm for performing aperiodic linear stencil computations on a d (≥ 1)-dimensional grid of size N for T timesteps does Θ(TN1-1/d+N Log N) work. We introduce novel techniques based on random walks and Gaussian approximations for an asymptotic improvement of this work bound for a class of linear stencils. We also improve the span (i.e., parallel running time on an unbounded number of processors) asymptotically from the current state of the art.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122042974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John E. Augustine, Soumyottam Chatterjee, Gopal Pandurangan
{"title":"A Fully-Distributed Scalable Peer-to-Peer Protocol for Byzantine-Resilient Distributed Hash Tables","authors":"John E. Augustine, Soumyottam Chatterjee, Gopal Pandurangan","doi":"10.1145/3490148.3538588","DOIUrl":"https://doi.org/10.1145/3490148.3538588","url":null,"abstract":"Performing computation in the presence of faulty and malicious nodes is a central problem in distributed computing. Over 35 years ago, Dwork, Peleg, Pippenger, and Upfal [STOC 1986, SICOMP 1988] studied the fundamental Byzantine agreement problem in sparse, bounded degree networks and presented the first protocol that achieved almost-everywhere agreement among good nodes. However, this protocol and several subsequent protocols including that of King, Saia, Sanwalani, and Vee [FOCS 2006] had the drawback that they were not fully-distributed - in those protocols, nodes are required to have initial knowledge of the entire network topology. This drawback makes such protocols not applicable to real-world communication networks such as peer-to-peer (P2P) networks, which are typically sparse and bounded degree and where nodes initially have only local knowledge of themselves and of their neighbors.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129712260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Analysis and Modelling of Concurrent Multi-access Data Structures","authors":"A. Rukundo, A. Atalar, P. Tsigas","doi":"10.1145/3490148.3538578","DOIUrl":"https://doi.org/10.1145/3490148.3538578","url":null,"abstract":"The major impediment to scaling concurrent data structures is memory contention when accessing shared data structure access-points, leading to thread serialisation, hindering parallelism. Aiming to address this challenge, significant amount of work in the literature has proposed multi-access techniques that improve concurrent data structure parallelism. However, there is little work on analysing and modelling the execution behaviour of concurrent multi-access data structures especially in a shared memory setting. In this paper, we analyse and model the general execution behaviour of concurrent multi-access data structures in the shared memory setting. We study and analyse the behaviour of the two popular random access patterns: shared (Remote) and exclusive (Local) access, and the behaviour of the two most commonly used atomic primitives for designing lock-free data structures: Compare and Swap, and, Fetch and Add. We model the concurrent multi-accesses by splitting the thread execution procedure into five logical sessions: i) side-work, ii) access-point search iii) access-point acquisition, iv) access-point data acquisition and v) access-point data operation. We model the acquisition of an access-point, as a system of closed queuing networks with parallel servers, and data acquisition in terms of where the data is located within the memory system. We evaluate our model on a set of concurrent data structure designs including a counter, a stack and a FIFO queue. The evaluation is carried out on two state of the art multi-core processors: Intel Xeon Phi CPU 7290 with 72 physical cores and Intel Xeon E5-2695 with 14 physical cores. Our model is able to predict the throughput performance of the given concurrent data structures with 80% to 100% accuracy on both architectures.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"207 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132106609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}