{"title":"Brief Announcement: Fast Concurrent Cuckoo Kick-Out Eviction Schemes for High-Density Tables","authors":"William Kuszmaul","doi":"10.1145/2935764.2935814","DOIUrl":"https://doi.org/10.1145/2935764.2935814","url":null,"abstract":"Cuckoo hashing guarantees constant-time lookups regardless of table density, making it a viable candidate for high-density tables. Cuckoo hashing insertions perform poorly at high table densities, however. In this paper, we mitigate this problem through the introduction of novel kick-out eviction algorithms. Experimentally, our algorithms reduce the number of bins viewed per insertion for high-density tables by as much as a factor of ten. We also implement an optimistic concurrency scheme for serializable multi-writer cuckoo hash tables (not using hardware transactional memory). For delete-light loads, one of our kick-out schemes avoids all competition between insertions with high probability, and significantly reduces transaction-abort frequency. This result is extended to arbitrary workloads using a new mechanism called a claim flag.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127590096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Equivalence Class Sorting: Algorithms, Lower Bounds, and Distribution-Based Analysis","authors":"William E. Devanny, M. Goodrich, Kristopher Jetviroj","doi":"10.1145/2935764.2935778","DOIUrl":"https://doi.org/10.1145/2935764.2935778","url":null,"abstract":"We study parallel comparison-based algorithms for finding all equivalence classes of a set of $n$ elements, where sorting according to some total order is not possible. Such scenarios arise, for example, in applications, such as in distributed computer security, where each of n agents are working to identify the private group to which they belong, with the only operation available to them being a zero-knowledge pairwise-comparison (which is sometimes called a \"secret handshake\") that reveals only whether two agents are in the same group or in different groups. We provide new parallel algorithms for this problem, as well as new lower bounds and distribution-based analysis.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124511836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Better Bounds for Coalescing-Branching Random Walks","authors":"M. Mitzenmacher, R. Rajaraman, Scott T. Roche","doi":"10.1145/2935764.2935791","DOIUrl":"https://doi.org/10.1145/2935764.2935791","url":null,"abstract":"Coalescing-branching random walks, or cobra walks for short, are a natural variant of random walks on graphs that can model the spread of disease through contacts or the spread of information in networks. In a k-cobra walk, at each time step a subset of the vertices are active; each active vertex chooses k random neighbors (sampled independently and uniformly with replacement) that become active at the next step, and these are the only active vertices at the next step. A natural quantity to study for cobra walks is the cover time, which corresponds to the expected time when all nodes have become infected or received the disseminated information. In this work, we extend previous results for cobra walks in multiple ways. We show that the cover time for the 2-cobra walk on an n-vertex d-dimensional grid is O(n1/d) (where the order notation hides constant factors that depend on d); previous work had shown the cover time was O(n1/d ⋅ polylog(n)). We show that the cover time for a 2-cobra walk on an n-vertex d-regular graph with conductance φG is O(d4 φG-2 log2 n), significantly generalizing a previous result that held only for expander graphs with sufficiently high expansion. And finally we show that the cover time for a 2-cobra walk on a graph with n vertices is always O(n11/4 log n); this is the first result showing that the bound of Θ(n3) for the worst-case cover time for random walks can be beaten using 2-cobra walks.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"111 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132227666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers","authors":"David Dinh, H. Simhadri, Yuan Tang","doi":"10.1145/2935764.2935797","DOIUrl":"https://doi.org/10.1145/2935764.2935797","url":null,"abstract":"The nested parallel (a.k.a. fork-join) model is widely used for writing parallel programs. However, the two composition constructs, i.e. \"||\" (parallel) and \";\" (serial), that comprise the nested-parallel model are insufficient in expressing \"partial dependencies\" in a program. We propose a new dataflow composition construct \"↝\" to express partial dependencies in algorithms in a processor- and cache-oblivious way, thus extending the Nested Parallel (NP) model to the Nested Dataflow (ND) model. We redesign several divide-and-conquer algorithms ranging from dense linear algebra to dynamic-programming in the ND model and prove that they all have optimal span while retaining optimal cache complexity. We propose the design of runtime schedulers that map ND programs to multicore processors with multiple levels of possibly shared caches (i.e, Parallel Memory Hierarchies) and prove guarantees on their ability to balance nodes across processors and preserve locality. For this, we adapt space-bounded (SB) schedulers for the ND model. We show that our algorithms have increased \"parallelizability\" in the ND model, and that SB schedulers can use the extra parallelizability to achieve asymptotically optimal bounds on cache misses and running time on a greater number of processors than in the NP model. The running time for the algorithms in this paper is O((∑i=0h-1 Q*(t;σ⋅ Mi)⋅ Ci)/p) on a p-processor machine, where Q* is the parallel cache complexity of task t, Ci is the cost of cache miss at level-i cache which is of size Mi, and σ∈(0,1) is a constant.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133890407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Randomized Approximate Nearest Neighbor Search with Limited Adaptivity","authors":"Mingmou Liu, Xiaoyin Pan, Yitong Yin","doi":"10.1145/2935764.2935776","DOIUrl":"https://doi.org/10.1145/2935764.2935776","url":null,"abstract":"We study the problem of approximate nearest neighbor search in $d$-dimensional Hamming space {0,1}d. We study the complexity of the problem in the famous cell-probe model, a classic model for data structures. We consider algorithms in the cell-probe model with limited adaptivity, where the algorithm makes k rounds of parallel accesses to the data structure for a given k. For any k ≥ 1, we give a simple randomized algorithm solving the approximate nearest neighbor search using k rounds of parallel memory accesses, with O(k(log d)1/k) accesses in total. We also give a more sophisticated randomized algorithm using O(k+(1/k log d)O(1/k)) memory accesses in k rounds for large enough k. Both algorithms use data structures of size polynomial in n, the number of points in the database. We prove an Ω(1/k(log d)1/k) lower bound for the total number of memory accesses required by any randomized algorithm solving the approximate nearest neighbor search within k ≤ (log log d)/(2 log log log d) rounds of parallel memory accesses on any data structures of polynomial size. This lower bound shows that our first algorithm is asymptotically optimal for any constant round k. And our second algorithm approaches the asymptotically optimal tradeoff between rounds and memory accesses, in a sense that the lower bound of memory accesses for any k1 rounds can be matched by the algorithm within k2=O(k1) rounds. In the extreme, for some large enough k=Θ((log log d)/(log log log d)), our second algorithm matches the Θ((log log d)/(log log log d)) tight bound for fully adaptive algorithms for approximate nearest neighbor search due to Chakrabarti and Regev.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126809362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Shortest Paths Using Radius Stepping","authors":"G. Blelloch, Yan Gu, Yihan Sun, Kanat Tangwongsan","doi":"10.1145/2935764.2935765","DOIUrl":"https://doi.org/10.1145/2935764.2935765","url":null,"abstract":"The single-source shortest path problem (SSSP) with nonnegative edge weights is notoriously difficult to solve efficiently in parallel---it is one of the graph problems said to suffer from the transitive-closure bottleneck. Yet, in practice, the Δ-stepping algorithm of Meyer and Sanders (J. Algorithms, 2003) often works efficiently but has no known theoretical bounds on general graphs. The algorithm takes a sequence of steps, each increasing the radius by a user-specified value Δ. Each step settles the vertices in its annulus but can take Θ(n) substeps, each requiring Θ(m) work (n vertices and m edges). Building on the success of Δ-stepping, this paper describes Radius Stepping, an algorithm with one of the best-known tradeoffs between work and depth bounds for SSSP with nearly-linear (~O(m)) work. The algorithm is a Δ-stepping-like algorithm but uses a variable instead of a fixed-size increase in radii, allowing us to prove a bound on the number of steps. In particular, by using what we define as a vertex k-radius, each step takes at most k+2 substeps. Furthermore, we define a (k, ρ)-graph property and show that if an undirected graph has this property, then the number of steps can be bounded by O(n/ρ log ρ L), for a total of O(kn/ρ log ρ L) substeps, each parallel. We describe how to preprocess a graph to have this property. Altogether, for an arbitrary input graph with n vertices and m edges, Radius Stepping, after preprocessing, takes O((m+nρ)log n) work and $O(n/ρ log n log (ρ L)) depth per source. The preprocessing step takes O(m log n + nρ2) work and O(ρlog ρ) depth, adding no more than O(nρ) edges.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"31 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128680363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymptotically Optimal Gathering on a Grid","authors":"Andreas Cord-Landwehr, M. Fischer, Daniel Jung, F. Heide","doi":"10.1145/2935764.2935789","DOIUrl":"https://doi.org/10.1145/2935764.2935789","url":null,"abstract":"In this paper, we solve the local gathering problem of a swarm of n indistinguishable, point-shaped robots on a two-dimensional grid in asymptotically optimal time O(n) in the fully synchronous FSYNC time model. Given an arbitrarily distributed (yet connected) swarm of robots, the gathering problem on the grid is to locate all robots within a 2 x 2-sized area that is not known beforehand. Two robots are connected if they are vertical or horizontal neighbors on the grid. The locality constraint means that no global control, no compass, no global communication and only local vision is available; hence, a robot can see its grid neighbors only up to a constant L1-distance, which also limits its movements. A robot can move to one of its eight neighboring grid cells and if two or more robots move to the same location they are merged to be only one robot. The locality constraint is the significant challenging issue here, since robot movements must not harm the (only globally checkable) swarm connectivity. For solving the gathering problem, we provide a synchronous algorithm -- executed by every robot -- which ensures that robots merge without breaking the swarm connectivity. In our model, robots can obtain a special state, which marks such a robot to be performing specific connectivity preserving movements in order to allow later merge operations of the swarm. Compared to the grid, for gathering in the Euclidean plane for the same robot and time model the best known upper bound is O(n2).","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130977254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Just Join for Parallel Ordered Sets","authors":"G. Blelloch, Daniel Ferizovic, Yihan Sun","doi":"10.1145/2935764.2935768","DOIUrl":"https://doi.org/10.1145/2935764.2935768","url":null,"abstract":"Ordered sets (and maps when data is associated with each key) are one of the most important and useful data types. The set-set functions union, intersection and difference are particularly useful in certain applications. Brown and Tarjan first described an algorithm for these functions, based on 2-3 trees, that meet the optimal Θ(m log (n/m+1)) time bounds in the comparison model (n and m ≤ n are the input sizes). Later Adams showed very elegant algorithms for the functions, and others, based on weight-balanced trees. They only require a single function that is specific to the balancing scheme---a function that joins two balanced trees---and hence can be applied to other balancing schemes. Furthermore the algorithms are naturally parallel. However, in the twenty-four years since, no one has shown that the algorithms, sequential or parallel are asymptotically work optimal. In this paper we show that Adams' algorithms are both work efficient and highly parallel (polylog span) across four different balancing schemes---AVL trees, red-black trees, weight balanced trees and treaps. To do this we use careful, but simple, algorithms for Join that maintain certain invariants, and our proof is (mostly) generic across the schemes. To understand how the algorithms perform in practice we have also implemented them (all code except Join is generic across the balancing schemes). Interestingly the implementations on all four balancing schemes and three set functions perform similarly in time and speedup (more than 45x on 64 cores). We also compare the performance of our implementation to other existing libraries and algorithms.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"13 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130856920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Metric Tree Embedding based on an Algebraic View on Moore-Bellman-Ford","authors":"Stephan Friedrichs, C. Lenzen","doi":"10.1145/2935764.2935777","DOIUrl":"https://doi.org/10.1145/2935764.2935777","url":null,"abstract":"A metric tree embedding of expected stretch α maps a weighted n-node graph G = (V, E, w) to a weighted tree T = (VT, ET, wT) with V ⊆ VT, and dist(v, w, G) ≤ dist(v, w, T) and E[dist(v, w, T)] ≤ α dist(v, w, G) for all v, w ∈ V. Such embeddings are highly useful for designing fast approximation algorithms, as many hard problems are easy to solve on tree instances. However, to date the best parallel polylog n depth algorithm that achieves an asymptotically optimal expected stretch of α ∈ Ω(log n) uses Ω(n2) work and requires a metric as input. In this paper, we show how to achieve the same guarantees using Ω(m1+ε) work, where $m$ is the number of edges of G and ε >0 is an arbitrarily small constant. Moreover, one may reduce the work further to Ω(m + n1+ε), at the expense of increasing the expected stretch α to Ω(ε-1 log n) using the spanner construction of Baswana and Sen as preprocessing step. Our main tool in deriving these parallel algorithms is an algebraic characterization of a generalization of the classic Moore-Bellman-Ford algorithm. We consider this framework, which subsumes a large variety of previous \"Moore-Bellman-Ford-flavored\" algorithms, to be of independent interest.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125328820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brief Announcement: Applications of Uniform Sampling: Densest Subgraph and Beyond","authors":"Hossein Esfandiari, M. Hajiaghayi, David P. Woodruff","doi":"10.1145/2935764.2935813","DOIUrl":"https://doi.org/10.1145/2935764.2935813","url":null,"abstract":"In this paper we provide a framework to analyze the effect of uniform sampling on graph optimization problems. Interestingly, we apply this framework to a general class of graph optimization problems that we call heavy subgraph problems, and show that uniform sampling preserves a 1-ε approximate solution to these problems. This class contains many interesting problems such as densest subgraph, directed densest subgraph, densest bipartite subgraph, d-max cut, and d-sum-max clustering. As an immediate impact of this result, one can use uniform sampling to solve these problems in streaming, turnstile or Map-Reduce settings. Indeed, our results by characterizing heavy subgraph problems address Open Problem 13 at the IITK Workshop on Algorithms for Data Streams in 2006 regarding the effects of subsampling, in the context of graph streams. Recently Bhattacharya et al. in STOC 2015 provide the first one pass algorithm for the densest subgraph problem in the streaming model with additions and deletions to its edges, i.e., for dynamic graph streams. They present a (0.5-ε)-approximation algorithm using ~O(n) space, where factors of ε and log(n) are suppressed in the ~O notation. In this paper we improve the (0.5-ε)-approximation algorithm of Bhattacharya et al. by providing a (1-ε)-approximation algorithm using ~O(n) space.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124664441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}