Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures: Latest Articles (Page 3)

Brief Announcement: Benchmarking Concurrent Priority Queues:

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2016-07-11 DOI: 10.1145/2935764.2935803

J. Gruber, J. Träff, Martin Wimmer

Citations: 3

Brief Announcement: A Tight Distributed Algorithm for All Pairs Shortest Paths and Applications

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2016-07-11 DOI: 10.1145/2935764.2935812

Qiang-Sheng Hua, Haoqiang Fan, Lixiang Qian, Ming Ai, Yangyang Li, Xuanhua Shi, Hai Jin

Abstract: Given an unweighted and undirected graph, this paper aims to give a tight distributed algorithm for computing the all pairs shortest paths (APSP) under synchronous communications and the CONGEST(B) model, where each node can only transfer B bits of information along each incident edge in a round. The best previous results for distributively computing APSP need O(N+D) time where N is the number of nodes and D is the diameter [1,2]. However, there is still a B factor gap from the lower bound Ω(N/B+D) [1]. In order to close this gap, we propose a multiplexing technique to push the parallelization of distributed BFS tree constructions to the limit such that we can solve APSP in O(N/B+D) time which meets the lower bound. This result also implies a Θ(N/B+D) time distributed algorithm for diameter. In addition, we extend our distributed algorithm to compute girth which is the length of the shortest cycle and clustering coefficient (CC) which is related to counting the number of triangles incident to each node. The time complexities for computing these two graph properties are also O(N/B+D).

Citations: 9
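
The APSP abstract above builds on running one BFS per node of an unweighted graph. As a rough illustration only, the following is a centralized, sequential baseline and not the paper's distributed CONGEST(B) algorithm or its multiplexing technique. The function names and the adjacency-list representation are assumptions made for this sketch, and the graph is assumed to be connected.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted, undirected graph.

    `adj` maps each node to a list of its neighbours (hypothetical layout).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def all_pairs_shortest_paths(adj):
    """Run one BFS per node; sequentially this costs O(N * (N + M))."""
    return {s: bfs_distances(adj, s) for s in adj}


def diameter(apsp):
    """Largest shortest-path distance; assumes the graph is connected."""
    return max(max(row.values()) for row in apsp.values())


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    table = all_pairs_shortest_paths(path)
    print(table[0][3], diameter(table))
```

The diameter falls out of the distance table directly, which mirrors the abstract's remark that an APSP bound also yields a bound for diameter.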

Fast and Robust Memory Reclamation for Concurrent Data Structures

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2016-07-11 DOI: 10.1145/2935764.2935790

Oana Balmau, R. Guerraoui, M. Herlihy, I. Zablotchi

Abstract: In concurrent systems without automatic garbage collection, it is challenging to determine when it is safe to reclaim memory, especially for lock-free data structures. Existing concurrent memory reclamation schemes are either fast but do not tolerate process delays, robust to delays but with high overhead, or both robust and fast but narrowly applicable. This paper proposes QSense, a novel concurrent memory reclamation technique. QSense is a hybrid technique with a fast path and a fallback path. In the common case (without process delays), a high-performing memory reclamation scheme is used (fast path). If process delays block memory reclamation through the fast path, a robust fallback path is used to guarantee progress. The fallback path uses hazard pointers, but avoids their notorious need for frequent and expensive memory fences. QSense is widely applicable, as we illustrate through several lock-free data structure algorithms. Our experimental evaluation shows that QSense has an overhead comparable to the fastest memory reclamation techniques, while still tolerating prolonged process delays.

Citations: 34
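
The abstract above mentions hazard pointers as the basis of the fallback path. The sketch below is a toy, single-threaded model of the general hazard-pointer idea only: a retired node is freed once no thread has published a pointer to it. It is not QSense, and it omits the atomics and memory fences a real implementation needs. The class name, method names, and `scan_threshold` parameter are all hypothetical.

```python
class HazardPointerDomain:
    """Toy model of hazard-pointer reclamation (illustrative assumptions only)."""

    def __init__(self, num_threads, scan_threshold=2):
        self.hazard = [None] * num_threads               # one published pointer per thread
        self.retired = [[] for _ in range(num_threads)]  # per-thread retire lists
        self.freed = []                                  # nodes reclaimed so far, in order
        self.scan_threshold = scan_threshold

    def protect(self, tid, node):
        """Thread `tid` announces that it may still dereference `node`."""
        self.hazard[tid] = node

    def clear(self, tid):
        """Thread `tid` is done with its protected node."""
        self.hazard[tid] = None

    def retire(self, tid, node):
        """Unlinked node: defer freeing until no hazard pointer refers to it."""
        self.retired[tid].append(node)
        if len(self.retired[tid]) >= self.scan_threshold:
            self.scan(tid)

    def scan(self, tid):
        """Free every retired node that no thread currently protects."""
        protected = {h for h in self.hazard if h is not None}
        still_retired = []
        for node in self.retired[tid]:
            if node in protected:
                still_retired.append(node)   # someone may still read it
            else:
                self.freed.append(node)      # safe to reclaim
        self.retired[tid] = still_retired


if __name__ == "__main__":
    domain = HazardPointerDomain(num_threads=2)
    domain.protect(1, "A")      # thread 1 is reading node A
    domain.retire(0, "A")       # thread 0 unlinks A, cannot free it yet
    domain.retire(0, "B")       # threshold reached, scan frees only B
    print(domain.freed, domain.retired[0])
```

In this model a delayed thread only pins the nodes it has published, which is the robustness property the abstract attributes to the fallback path.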

Parallel Approaches to the String Matching Problem on the GPU

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2016-07-11 DOI: 10.1145/2935764.2935800

Saman Ashkiani, N. Amenta, John Douglas Owens

Abstract: We design a family of parallel algorithms and GPU implementations for the exact string matching problem, based on Rabin-Karp (RK) randomized string matching. We describe and analyze three primary parallel approaches to binary string matching: cooperative (CRK), divide-and-conquer (DRK), and a novel hybrid of both (HRK). The CRK is most effective for large patterns (>8K characters), while the DRK approach is superior for shorter patterns. We then generalize the DRK to support any alphabet size without loss of performance. Our DRK method achieves up to a 64 GB/s processing rate on 8-character patterns from an 8-bit alphabet on an NVIDIA Tesla K40c GPU. We next demonstrate a novel parallel two-stage matching method (DRK-2S), which first skims the text for a smaller subset of the pattern and then verifies all potential matches in parallel. Our DRK-2S method is superior for pattern sizes up to 64k compared to the fastest CPU-based string matching implementations. With an 8-bit alphabet and up to 1k-character patterns, we get a geometric mean speedup of 4.81x against the best CPU methods, and can achieve a processing rate of at least 53 GB/s.

Citations: 9
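
The string-matching abstract above is built on Rabin-Karp fingerprinting. For orientation, here is a minimal sequential sketch of the textbook rolling-hash version. It is not the paper's CRK, DRK, HRK, or DRK-2S GPU methods. The `base` and `modulus` values are assumptions chosen for illustration, and each hash hit is verified by direct comparison so the reported matches are exact.

```python
def rabin_karp(text, pattern, base=256, modulus=1_000_000_007):
    """Return all start indices where `pattern` occurs in `text`.

    `base` and `modulus` are illustrative choices, not values from the paper.
    An empty pattern is treated as having no matches.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, modulus)   # weight of the window's leading character
    p_hash = 0
    t_hash = 0
    for i in range(m):                 # fingerprints of pattern and first window
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus

    matches = []
    for i in range(n - m + 1):
        # Compare characters only when fingerprints agree (rules out collisions).
        if t_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % modulus
    return matches


if __name__ == "__main__":
    print(rabin_karp("abracadabra", "abra"))
```

Because each window's fingerprint depends only on its own characters, windows can in principle be hashed independently, which is what makes the approach attractive for parallel hardware.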

Brief Announcement: Flexible Resource Allocation for Clouds and All-Optical Networks

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2016-07-11 DOI: 10.1145/2935764.2935806

Dmitriy A. Katz, B. Schieber, H. Shachnai

Abstract: Motivated by the cloud computing paradigm, and by key optimization problems in all-optical networks, we study two variants of the classic job interval scheduling problem, where a reusable resource is allocated to competing job intervals in a flexible manner. Each job, J_i, requires the use of up to r_max(i) units of the resource, with a profit of p_i ≥ 1 accrued for each allocated unit. The goal is to feasibly schedule a subset of the jobs so as to maximize the total profit. The resource can be allocated either in contiguous or non-contiguous blocks. These problems can be viewed as flexible variants of the well known storage allocation and bandwidth allocation problems. We show that the contiguous version is strongly NP-hard, already for instances where all jobs have the same profit and the same maximum resource requirement. We derive the best possible positive result for such instances, namely, a polynomial time approximation scheme (PTAS). We further show that the contiguous variant admits a (5/4+ε)-approximation algorithm, for any fixed ε > 0, on instances whose job intervals form a proper interval graph. At the heart of the algorithm lies a non-standard parameterization of the approximation ratio itself. For the non-contiguous case, we uncover an interesting relation to the paging problem that leads to a simple O(n log n) algorithm for uniform profit instances of n jobs.

Citations: 6

Brief Announcement: Dynamic Determinacy Race Detection for Task Parallelism with Futures

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2016-07-11 DOI: 10.1145/2935764.2935815

R. Surendran, Vivek Sarkar

Citations: 3

Parallel Algorithms for Asymmetric Read-Write Costs

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2016-07-11 DOI: 10.1145/2935764.2935767

N. Ben-David, G. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, Charles McGuffey, Julian Shun

Abstract: Motivated by the significantly higher cost of writing than reading in emerging memory technologies, we consider parallel algorithm design under such asymmetric read-write costs, with the goal of reducing the number of writes while preserving work-efficiency and low span. We present a nested-parallel model of computation that combines (i) small per-task stack-allocated memories with symmetric read-write costs and (ii) an unbounded heap-allocated shared memory with asymmetric read-write costs, and show how the costs in the model map efficiently onto a more concrete machine model under a work-stealing scheduler. We use the new model to design reduced write, work-efficient, low span parallel algorithms for a number of fundamental problems such as reduce, list contraction, tree contraction, breadth-first search, ordered filter, and planar convex hull. For the latter two problems, our algorithms are output-sensitive in that the work and number of writes decrease with the output size. We also present a reduced write, low span minimum spanning tree algorithm that is nearly work-efficient (off by the inverse Ackermann function). Our algorithms reveal several interesting techniques for significantly reducing shared memory writes in parallel algorithms without asymptotically increasing the number of shared memory reads.

Citations: 45

Universal Shape Formation for Programmable Matter

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2016-07-11 DOI: 10.1145/2935764.2935784

Zahra Derakhshandeh, R. Gmyr, A. Richa, C. Scheideler, Thim Strothmann

Abstract: We envision programmable matter consisting of systems of computationally limited devices (which we call particles) that are able to self-organize in order to achieve a desired collective goal without the need for central control or external intervention. Central problems for these particle systems are shape formation and coating problems. In this paper, we present a universal shape formation algorithm which takes an arbitrary shape composed of a constant number of equilateral triangles of unit size and lets the particles build that shape at a scale depending on the number of particles in the system. Our algorithm runs in O(√n) asynchronous execution rounds, where n is the number of particles in the system, provided we start from a well-initialized configuration of the particles. This is optimal in a sense that for any shape deviating from the initial configuration, any movement strategy would require Ω(√n) rounds in the worst case (over all asynchronous activations of the particles). Our algorithm relies only on local information (e.g., particles do not have ids, nor do they know n, or have any sort of global coordinate system), and requires only a constant-size memory per particle.

Citations: 73

On Computational Thinking, Inferential Thinking and Data Science

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2016-07-11 DOI: 10.1145/2935764.2935826

Michael I. Jordan

Abstract: The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level: in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as "runtime" in core statistical theory and the lack of a role for statistical concepts such as "risk" in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and ways to exploit parallelism so as to trade off the speed and accuracy of inference.

Citations: 4

Extending TM Primitives using Low Level Semantics

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2016-07-11 DOI: 10.1145/2935764.2935794

Mohamed M. Saad, R. Palmieri, Ahmed Hassan, B. Ravindran

Citations: 5