Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures最新文献

筛选
英文 中文
Brief Announcement: Benchmarking Concurrent Priority Queues: 简要公告:对并发优先队列进行基准测试:
J. Gruber, J. Träff, Martin Wimmer
{"title":"Brief Announcement: Benchmarking Concurrent Priority Queues:","authors":"J. Gruber, J. Träff, Martin Wimmer","doi":"10.1145/2935764.2935803","DOIUrl":"https://doi.org/10.1145/2935764.2935803","url":null,"abstract":"A number of concurrent, relaxed priority queues have recently been proposed and implemented. Results are commonly reported for a throughput benchmark that uses a uniform distribution of keys from a large integer range, a balanced mixture of operations, and mostly for single systems. We have conducted more extensive benchmarking of three recent, relaxed priority queues on four different types of systems with different key ranges and distributions. While we can show superior throughput and scalability for our own k-LSM priority queue for the uniform key distribution, the picture changes drastically for other distributions, both with respect to achieved throughput and relative merit of the priority queues. The throughput benchmark alone is thus not sufficient to characterize the performance of concurrent priority queues. Our priority queue, benchmark code and full set of results are publicly available to foster comparison.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127395042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Brief Announcement: A Tight Distributed Algorithm for All Pairs Shortest Paths and Applications 简短公告:全对最短路径的紧密分布式算法及其应用
Qiang-Sheng Hua, Haoqiang Fan, Lixiang Qian, Ming Ai, Yangyang Li, Xuanhua Shi, Hai Jin
{"title":"Brief Announcement: A Tight Distributed Algorithm for All Pairs Shortest Paths and Applications","authors":"Qiang-Sheng Hua, Haoqiang Fan, Lixiang Qian, Ming Ai, Yangyang Li, Xuanhua Shi, Hai Jin","doi":"10.1145/2935764.2935812","DOIUrl":"https://doi.org/10.1145/2935764.2935812","url":null,"abstract":"Given an unweighted and undirected graph, this paper aims to give a tight distributed algorithm for computing the all pairs shortest paths (APSP) under synchronous communications and the CONGEST(B) model, where each node can only transfer B bits of information along each incident edge in a round. The best previous results for distributively computing APSP need O(N+D) time where N is the number of nodes and D is the diameter [1,2]. However, there is still a B factor gap from the lower bound Ω(N/B+D) [1]. In order to close this gap, we propose a multiplexing technique to push the parallelization of distributed BFS tree constructions to the limit such that we can solve APSP in O(N/B+D) time which meets the lower bound. This result also implies a Θ(N/B+D) time distributed algorithm for diameter. In addition, we extend our distributed algorithm to compute girth which is the length of the shortest cycle and clustering coefficient (CC) which is related to counting the number of triangles incident to each node. The time complexities for computing these two graph properties are also O(N/B+D).","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"213 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122807638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
Fast and Robust Memory Reclamation for Concurrent Data Structures 面向并发数据结构的快速鲁棒内存回收
Oana Balmau, R. Guerraoui, M. Herlihy, I. Zablotchi
{"title":"Fast and Robust Memory Reclamation for Concurrent Data Structures","authors":"Oana Balmau, R. Guerraoui, M. Herlihy, I. Zablotchi","doi":"10.1145/2935764.2935790","DOIUrl":"https://doi.org/10.1145/2935764.2935790","url":null,"abstract":"In concurrent systems without automatic garbage collection, it is challenging to determine when it is safe to reclaim memory, especially for lock-free data structures. Existing concurrent memory reclamation schemes are either fast but do not tolerate process delays, robust to delays but with high overhead, or both robust and fast but narrowly applicable. This paper proposes QSense, a novel concurrent memory reclamation technique. QSense is a hybrid technique with a fast path and a fallback path. In the common case (without process delays), a high-performing memory reclamation scheme is used (fast path). If process delays block memory reclamation through the fast path, a robust fallback path is used to guarantee progress. The fallback path uses hazard pointers, but avoids their notorious need for frequent and expensive memory fences. QSense is widely applicable, as we illustrate through several lock-free data structure algorithms. Our experimental evaluation shows that QSense has an overhead comparable to the fastest memory reclamation techniques, while still tolerating prolonged process delays.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121395702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 34
Parallel Approaches to the String Matching Problem on the GPU GPU上字符串匹配问题的并行方法
Saman Ashkiani, N. Amenta, John Douglas Owens
{"title":"Parallel Approaches to the String Matching Problem on the GPU","authors":"Saman Ashkiani, N. Amenta, John Douglas Owens","doi":"10.1145/2935764.2935800","DOIUrl":"https://doi.org/10.1145/2935764.2935800","url":null,"abstract":"We design a family of parallel algorithms and GPU implementations for the exact string matching problem, based on Rabin-Karp (RK) randomized string matching. We describe and analyze three primary parallel approaches to binary string matching: cooperative (CRK), divide-and-conquer (DRK), and a novel hybrid of both (HRK). The CRK is most effective for large patterns (>8K characters), while the DRK approach is superior for shorter patterns. We then generalize the DRK to support any alphabet size without loss of performance. Our DRK method achieves up to a 64 GB/s processing rate on 8-character patterns from an 8-bit alphabet on an NVIDIA Tesla K40c GPU. We next demonstrate a novel parallel two-stage matching method (DRK-2S), which first skims the text for a smaller subset of the pattern and then verifies all potential matches in parallel. Our DRK-2S method is superior for pattern sizes up to 64k compared to the fastest CPU-based string matching implementations. With an 8-bit alphabet and up to 1k-character patterns, we get a geometric mean speedup of 4.81x against the best CPU methods, and can achieve a processing rate of at least 53 GB/s.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114923143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
Brief Announcement: Flexible Resource Allocation for Clouds and All-Optical Networks 简报:面向云和全光网络的灵活资源分配
Dmitriy A. Katz, B. Schieber, H. Shachnai
{"title":"Brief Announcement: Flexible Resource Allocation for Clouds and All-Optical Networks","authors":"Dmitriy A. Katz, B. Schieber, H. Shachnai","doi":"10.1145/2935764.2935806","DOIUrl":"https://doi.org/10.1145/2935764.2935806","url":null,"abstract":"Motivated by the cloud computing paradigm, and by key optimization problems in all-optical networks, we study two variants of the classic job interval scheduling problem, where a reusable resource is allocated to competing job intervals in a flexible manner. Each job, Ji, requires the use of up to rmax(i) units of the resource, with a profit of pi ≥ 1 accrued for each allocated unit. The goal is to feasibly schedule a subset of the jobs so as to maximize the total profit. The resource can be allocated either in contiguous or non-contiguous blocks. These problems can be viewed as flexible variants of the well known storage allocation and bandwidth allocation problems. We show that the contiguous version is strongly NP-hard, already for instances where all jobs have the same profit and the same maximum resource requirement. We derive the best possible positive result for such instances, namely, a polynomial time approximation scheme (PTAS). We further show that the contiguous variant admits a (5/4+ε)-approximation algorithm, for any fixed ε >0, on instances whose job intervals form a proper interval graph. At the heart of the algorithm lies a non-standard parameterization of the approximation ratio itself. For the non-contiguous case, we uncover an interesting relation to the paging problem that leads to a simple O(n log n) algorithm for uniform profit instances of n jobs.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128174357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Brief Announcement: Dynamic Determinacy Race Detection for Task Parallelism with Futures 简要公告:动态确定性竞争检测与未来任务并行
R. Surendran, Vivek Sarkar
{"title":"Brief Announcement: Dynamic Determinacy Race Detection for Task Parallelism with Futures","authors":"R. Surendran, Vivek Sarkar","doi":"10.1145/2935764.2935815","DOIUrl":"https://doi.org/10.1145/2935764.2935815","url":null,"abstract":"Existing dynamic determinacy race detectors for task-parallel programs are limited to programs with strict computation graphs, where a task can only wait for its descendant tasks to complete. In this paper, we present the first known determinacy race detector for non-strict computation graphs with futures. The space and time complexity of our algorithm are similar to those of the classical SP-bags algorithm, when using only structured parallel constructs such as spawn-sync and async-finish. In the presence of point-to-point synchronization using futures, the complexity of the algorithm increases by a factor determined by the number of future operations, which includes future task creation and future get operations. The experimental results show that the slowdown factor observed for our algorithm relative to the sequential version is in the range of 1.00x to 9.92x, which is very much in line with slowdowns experienced for fully strict computation graphs.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131951886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Parallel Algorithms for Asymmetric Read-Write Costs 非对称读写代价的并行算法
N. Ben-David, G. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, Charles McGuffey, Julian Shun
{"title":"Parallel Algorithms for Asymmetric Read-Write Costs","authors":"N. Ben-David, G. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, Charles McGuffey, Julian Shun","doi":"10.1145/2935764.2935767","DOIUrl":"https://doi.org/10.1145/2935764.2935767","url":null,"abstract":"Motivated by the significantly higher cost of writing than reading in emerging memory technologies, we consider parallel algorithm design under such asymmetric read-write costs, with the goal of reducing the number of writes while preserving work-efficiency and low span. We present a nested-parallel model of computation that combines (i) small per-task stack-allocated memories with symmetric read-write costs and (ii) an unbounded heap-allocated shared memory with asymmetric read-write costs, and show how the costs in the model map efficiently onto a more concrete machine model under a work-stealing scheduler. We use the new model to design reduced write, work-efficient, low span parallel algorithms for a number of fundamental problems such as reduce, list contraction, tree contraction, breadth-first search, ordered filter, and planar convex hull. For the latter two problems, our algorithms are output-sensitive in that the work and number of writes decrease with the output size. We also present a reduced write, low span minimum spanning tree algorithm that is nearly work-efficient (off by the inverse Ackermann function). Our algorithms reveal several interesting techniques for significantly reducing shared memory writes in parallel algorithms without asymptotically increasing the number of shared memory reads.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132386338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 45
Universal Shape Formation for Programmable Matter 可编程物质的通用形状形成
Zahra Derakhshandeh, R. Gmyr, A. Richa, C. Scheideler, Thim Strothmann
{"title":"Universal Shape Formation for Programmable Matter","authors":"Zahra Derakhshandeh, R. Gmyr, A. Richa, C. Scheideler, Thim Strothmann","doi":"10.1145/2935764.2935784","DOIUrl":"https://doi.org/10.1145/2935764.2935784","url":null,"abstract":"We envision programmable matter consisting of systems of computationally limited devices (which we call particles) that are able to self-organize in order to achieve a desired collective goal without the need for central control or external intervention. Central problems for these particle systems are shape formation and coating problems. In this paper, we present a universal shape formation algorithm which takes an arbitrary shape composed of a constant number of equilateral triangles of unit size and lets the particles build that shape at a scale depending on the number of particles in the system. Our algorithm runs in O(√n) asynchronous execution rounds, where $n$ is the number of particles in the system, provided we start from a well-initialized configuration of the particles. This is optimal in a sense that for any shape deviating from the initial configuration, any movement strategy would require Ω(√n) rounds in the worst case (over all asynchronous activations of the particles). Our algorithm relies only on local information (e.g., particles do not have ids, nor do they know n, or have any sort of global coordinate system), and requires only a constant-size memory per particle.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127822209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 73
On Computational Thinking, Inferential Thinking and Data Science 论计算思维、推理思维与数据科学
Michael I. Jordan
{"title":"On Computational Thinking, Inferential Thinking and Data Science","authors":"Michael I. Jordan","doi":"10.1145/2935764.2935826","DOIUrl":"https://doi.org/10.1145/2935764.2935826","url":null,"abstract":"The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in \"Big Data\" is apparent from their sharply divergent nature at an elementary level-in computer science, the growth of the number of data points is a source of \"complexity\" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of \"simplicity\" in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as \"runtime\" in core statistical theory and the lack of a role for statistical concepts such as \"risk\" in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and ways to exploit parallelism so as to trade off the speed and accuracy of inference.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124848936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Extending TM Primitives using Low Level Semantics 使用低级语义扩展TM原语
Mohamed M. Saad, R. Palmieri, Ahmed Hassan, B. Ravindran
{"title":"Extending TM Primitives using Low Level Semantics","authors":"Mohamed M. Saad, R. Palmieri, Ahmed Hassan, B. Ravindran","doi":"10.1145/2935764.2935794","DOIUrl":"https://doi.org/10.1145/2935764.2935794","url":null,"abstract":"Transactional Memory (TM) has recently emerged as an optimistic concurrency control technique that isolates concurrent executions at the level of memory reads and writes, therefore providing an easy programming interface. However, such transparency could be overly conservative from an application-level perspective. In this work, we propose an extension to the classical TM primitives (read and write) to capture program code semantics (e.g., conditional expressions) while maintaining the same level of programming abstraction. We deployed this extension on two state-of-the-art STM algorithms and integrated it into the GCC compiler and the RSTM software framework. Results showed speedups of up to 4x (average 1.6x) on different applications including micro benchmarks and STAMP.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114546231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信