Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures: Latest Papers

Near-Optimal Distributed Algorithms for Fault-Tolerant Tree Structures

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date: 2016-07-11 DOI: 10.1145/2935764.2935795

M. Ghaffari, M. Parter

Citations: 18

A Practical Solution to the Cactus Stack Problem

Pub Date: 2016-07-11 DOI: 10.1145/2935764.2935787

Chaoran Yang, J. Mellor-Crummey

Abstract: Work-stealing is a popular method for load-balancing dynamic multithreaded computations on shared-memory systems. In theory, a randomized work-stealing scheduler can achieve near-linear speedup when the computation has sufficient parallelism and requires stack space that is linear in the number of processors. In practice, however, work-stealing runtimes sacrifice interoperability with serial code to achieve these bounds. For example, both Cilk and Cilk++ prohibit a C function from calling a Cilk function. Other work-stealing runtime systems that do not have this restriction either lack a strong time bound, which might cause them to deliver little or no speedup in the worst case, or lack a strong space bound, which might lead to an excessive memory footprint. This problem was previously described as the cactus stack problem. In this paper, we present Fibril, a new multithreading library that supports a fork-join programming model using work-stealing. Fibril solves the cactus stack problem by (1) implementing on a cactus stack that conforms to the calling conventions of serial code and (2) returning unused memory pages of suspended stacks to the operating system to bound consumption of physical memory. Theoretically, Fibril achieves strong bounds on both time and memory usage without sacrificing interoperability with serial code. Empirically, Fibril achieves up to 3x the performance of Intel Cilk Plus and up to 8x the performance of Intel Threading Building Blocks for the 12 benchmarks we evaluated.

Citations: 5
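
The abstract of "A Practical Solution to the Cactus Stack Problem" relies on the notion of a cactus stack without spelling it out. As a rough illustration only, a cactus stack can be pictured as a tree of activation frames in which parallel children share their caller's frames. The sketch below is a minimal single-threaded model of that shape; the `Frame` class and its methods are hypothetical names chosen for this example and are not part of Fibril.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """One activation record; `parent` links to the caller's frame (hypothetical model)."""
    name: str
    parent: Optional["Frame"] = None

    def spawn(self, name: str) -> "Frame":
        """Create a child frame that shares this frame and all of its ancestors."""
        return Frame(name, parent=self)

    def stack_view(self) -> List[str]:
        """Return the call stack as seen from this frame, root first."""
        names = []
        frame: Optional[Frame] = self
        while frame is not None:
            names.append(frame.name)
            frame = frame.parent
        return names[::-1]


# Two parallel children of fib(3) each see their own linear stack,
# yet the frames for main and fib(3) exist only once.
root = Frame("main")
fib3 = root.spawn("fib(3)")
left = fib3.spawn("fib(2)")
right = fib3.spawn("fib(1)")
print(left.stack_view())
print(right.stack_view())
```

Each branch looks like an ordinary linear stack from the inside, which is the property that makes interoperability with serial calling conventions a meaningful goal.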

Scheduling Parallelizable Jobs Online to Minimize the Maximum Flow Time

Pub Date: 2016-07-11 DOI: 10.1145/2935764.2935782

Kunal Agrawal, Jing Li, Kefu Lu, Benjamin Moseley

Abstract: In this paper we study the problem of scheduling a set of dynamic multithreaded jobs with the objective of minimizing the maximum latency experienced by any job. We assume that jobs arrive online and the scheduler has no information about the arrival rate, arrival time or work distribution of the jobs. The scheduling goal is to minimize the maximum amount of time between the arrival of a job and its completion; this goal is referred to in the scheduling literature as maximum flow time. While theoretical online scheduling of parallel jobs has been studied extensively, most prior work has focussed on a highly stylized model of parallel jobs called the "speedup curves model." We model parallel jobs as directed acyclic graphs, which is a more realistic way to model dynamic multithreaded jobs. In this context, we prove that a simple First-In-First-Out scheduler is (1+ε)-speed O(1/ε)-competitive for any ε > 0. We then develop a more practical work-stealing scheduler and show that it has a maximum flow time of O((1/ε²) max{opt, ln(n)}) for n jobs, with (1+ε)-speed. This result is essentially tight as we also provide a lower bound of Ω(log(n)) for work stealing. In addition, for the case where jobs have weights (typically representing priorities) and the objective is minimizing the maximum weighted flow time, we show a non-clairvoyant algorithm is (1+ε)-speed O(1/ε²)-competitive for any ε > 0, which is essentially the best positive result that can be shown in the online setting for the weighted case due to strong lower bounds without resource augmentation. After establishing theoretical results, we perform an empirical study of work-stealing. Our results indicate that, on both real world and synthetic workloads, work-stealing performs almost as well as an optimal scheduler.

Citations: 20
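
To make the maximum flow time objective from the scheduling abstract above concrete, here is a small sketch that simulates a First-In-First-Out policy and reports the largest gap between a job's arrival and its completion. It is a deliberate simplification: the paper models jobs as directed acyclic graphs, whereas this example assumes each job is a single sequential task on one of `m` identical processors. The function name and the sample workload are hypothetical.

```python
import heapq
from typing import List, Tuple


def fifo_max_flow_time(jobs: List[Tuple[float, float]], m: int) -> float:
    """Simulate FIFO on m processors for sequential jobs given as (arrival, work).

    Returns the maximum flow time, i.e. max over jobs of completion - arrival.
    Simplifying assumption: each job runs on exactly one processor.
    """
    free_at = [0.0] * m          # time at which each processor becomes idle
    heapq.heapify(free_at)
    worst = 0.0
    # Serve jobs strictly in arrival order (stable sort keeps ties in input order).
    for arrival, work in sorted(jobs, key=lambda job: job[0]):
        start = max(arrival, heapq.heappop(free_at))
        finish = start + work
        heapq.heappush(free_at, finish)
        worst = max(worst, finish - arrival)
    return worst


jobs = [(0, 4), (0, 2), (1, 3), (2, 1)]
print(fifo_max_flow_time(jobs, m=2))
print(fifo_max_flow_time(jobs, m=1))
```

Adding processors shortens queueing delay, so the maximum flow time on two processors is lower than on one for this workload.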

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures

Pub Date: 2016-07-11 DOI: 10.1145/2935764

C. Scheideler, Seth Gilbert

Abstract: It is my great pleasure to welcome you to the 28th ACM Symposium on Parallelism in Algorithms and Architectures.

The goal of SPAA is to develop a deeper understanding of parallelism in all its forms, bringing together the theory and practice of parallel computing. This year's program reflects that goal, with a diverse selection of papers at the cutting edge of parallel computing. The program includes 38 regular papers and 14 brief announcements, as well as keynote talks by Michael I. Jordan and Nir Shavit.

Traditional topics in parallelism are well represented at SPAA this year. The program includes papers on parallel algorithms for classical questions (e.g., sorting and graph problems, see Sessions 9 and 14). It includes papers on scheduling parallel computations (see Session 3) and scheduling tasks in parallel systems (see Sessions 6 and 8). The program also includes papers on concurrent data structures (see Session 11), and on parallelism in distributed systems (see Session 13). These topics all have a long history at SPAA.

Over the last several years, the study of parallelism has expanded to include new models of parallel computation (e.g., Map-Reduce, see Session 1), new architectures (e.g., GPUs, see Session 9), new techniques for managing parallelism (e.g., transactional memory, see Session 4), and new types of parallel systems (e.g., programmable matter, see Session 10). These increasingly important topics are represented at SPAA this year.

The best paper award for SPAA 2016 is awarded to a paper focusing on the limitations of certain new models of parallel computation: Shuffles and Circuits (On Lower Bounds for Modern Parallel Computation) by Tim Roughgarden, Sergei Vassilvitskii and Joshua Wang.

The authors develop lower bounds on the speed of large-scale parallel computation in a model meant to capture the capabilities of Map-Reduce and Hadoop. They discover an important connection between these computations and polynomials representing boolean functions, and use this fact to show lower bounds for a variety of natural and important problems.

We would also like to recognize (in no particular order) three finalists for the best paper award:
Randomized approximate nearest neighbor search with limited adaptivity by Mingmou Liu, Xiaoyin Pan and Yitong Yin.
Robust and Probabilistic Failure-Aware Placement by Madhukar Korupolu and Rajmohan Rajaraman.
Lock-free Transactions without Rollbacks for Linked Data Structures by Deli Zhang and Damian Dechev.

These papers highlight the variety of exciting work in parallelism that is represented at SPAA 2016.

Citations: 0

Lock-free Transactions without Rollbacks for Linked Data Structures

Pub Date: 2016-07-11 DOI: 10.1145/2935764.2935780

Deli Zhang, D. Dechev

Abstract: Non-blocking data structures allow scalable and thread-safe accesses to shared data. They provide individual operations that appear to execute atomically. However, it is often desirable to execute multiple operations atomically in a transactional manner. Previous solutions, such as software transactional memory (STM) and transactional boosting, manage transaction synchronization in an external layer separated from the data structure's own thread-level concurrency control. Although this reduces programming effort, it leads to overhead associated with additional synchronization and the need to roll back aborted transactions. In this work, we present a new methodology for transforming high-performance lock-free linked data structures into high-performance lock-free transactional linked data structures without revamping the data structures' original synchronization design. Our approach leverages the semantic knowledge of the data structure to eliminate the overhead of false conflicts and rollbacks. We encapsulate all operations, operands, and transaction status in a transaction descriptor, which is shared among the nodes accessed by the same transaction. We coordinate threads to help finish the remaining operations of delayed transactions based on their transaction descriptors. When a transaction fails, we recover the correct abstract state by reversely interpreting the logical status of a node. In our experimental evaluation using transactions with randomly generated operations, our lock-free transactional lists and skiplists outperform the transactional boosted ones by 40% on average and as much as 125% for large transactions. They also outperform the alternative STM-based approaches by a factor of 3 to 10 across all scenarios. More importantly, we achieve 4 to 6 orders of magnitude fewer spurious aborts than the alternatives.

Citations: 35
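
The abstract above describes a transaction descriptor shared by the nodes a transaction touches, and recovery by "reversely interpreting the logical status of a node" instead of rolling back. The sketch below is a single-threaded illustration of that interpretation idea only. It omits all atomic operations and helping, and every name in it (`Descriptor`, `NodeInfo`, `is_key_present`) is a hypothetical choice for this example. Treating a still-active transaction like an aborted one is an assumption of the sketch, not a claim about the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    ACTIVE = "active"
    COMMITTED = "committed"
    ABORTED = "aborted"


class OpType(Enum):
    INSERT = "insert"
    DELETE = "delete"


@dataclass
class Operation:
    kind: OpType
    key: int


@dataclass
class Descriptor:
    """Shared record of a transaction: its operations and its overall status."""
    ops: List[Operation]
    status: Status = Status.ACTIVE


@dataclass
class NodeInfo:
    """Stored in a node: which transaction last touched it, and with which operation."""
    desc: Descriptor
    op_index: int


def is_key_present(info: NodeInfo) -> bool:
    """Interpret a node's logical state from its descriptor (no physical rollback)."""
    op = info.desc.ops[info.op_index]
    if info.desc.status is Status.COMMITTED:
        # The operation took effect: an insert leaves the key present.
        return op.kind is OpType.INSERT
    # ABORTED (and, by assumption in this sketch, ACTIVE): read the operation
    # in reverse, so a failed insert means absent and a failed delete means present.
    return op.kind is OpType.DELETE


desc = Descriptor([Operation(OpType.INSERT, 7), Operation(OpType.DELETE, 3)])
node7 = NodeInfo(desc, 0)
node3 = NodeInfo(desc, 1)

desc.status = Status.ABORTED
print(is_key_present(node7), is_key_present(node3))
```

Because both nodes point at the same descriptor, flipping one status field changes how every affected node is read, which is why no per-node undo work is needed in this model.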

Robust and Probabilistic Failure-Aware Placement

Pub Date: 2016-07-11 DOI: 10.1145/2935764.2935802

M. Korupolu, R. Rajaraman

Abstract: Motivated by the growing complexity and heterogeneity of modern data centers, and the prevalence of commodity component failures, this paper studies the failure-aware placement problem of placing tasks of a parallel job on machines in the data center with the goal of increasing availability. We consider two models of failures: adversarial and probabilistic. In the adversarial model, each node has a weight (higher weight implying higher reliability) and the adversary can remove any subset of nodes of total weight at most a given bound W and our goal is to find a placement that incurs the least disruption against such an adversary. In the probabilistic model, each node has a probability of failure and we need to find a placement that maximizes the probability that at least K out of N tasks survive at any time. For adversarial failures, we first show that (i) the problems are in Σ₂, the second level of the polynomial hierarchy, (ii) a basic variant, that we call RobustFAP, is co-NP-hard, and (iii) an all-or-nothing version of RobustFAP is Σ₂-complete. We then give a PTAS for RobustFAP, a key ingredient of which is a solution that we design for a fractional version of RobustFAP. We then study fractional RobustFAP over hierarchies, denoted HierRobustFAP, and introduce a notion of hierarchical max-min fairness and a novel Generalized Spreading algorithm which is simultaneously optimal for all W. These generalize the classical notion of max-min fairness to work with nodes of differing capacities, differing reliability weights and hierarchical structures. Using randomized rounding, we extend this to give an algorithm for integral HierRobustFAP. For the probabilistic version, we first give an algorithm that achieves an additive ε approximation in the failure probability for the single level version, called ProbFAP, while giving up a (1 + ε) multiplicative factor in the number of failures. We then extend the result to the hierarchical version, HierProbFAP, achieving an ε additive approximation in failure probability while giving up an (L + ε) multiplicative factor in the number of failures, where L is the number of levels in the hierarchy.

Citations: 13
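
The probabilistic objective in the placement abstract above, the probability that at least K out of N tasks survive, can be evaluated exactly for a given placement with a small dynamic program. The sketch below does only that evaluation; it is not the paper's approximation algorithm. It assumes that nodes fail independently and that a failed node loses all tasks placed on it; both assumptions, and the function name, are introduced here for illustration.

```python
from typing import List


def prob_at_least_k_survive(tasks_per_node: List[int],
                            fail_prob: List[float],
                            k: int) -> float:
    """Probability that at least k tasks survive under a fixed placement.

    Assumptions of this sketch: node failures are independent, and a node
    that fails takes down every task placed on it.
    """
    n = sum(tasks_per_node)
    # dist[s] = probability that exactly s tasks survive among nodes seen so far
    dist = [0.0] * (n + 1)
    dist[0] = 1.0
    for count, p in zip(tasks_per_node, fail_prob):
        new = [0.0] * (n + 1)
        for s, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[s] += mass * p                  # node fails, its tasks are lost
            new[s + count] += mass * (1.0 - p)  # node survives, its tasks count
        dist = new
    return sum(dist[max(k, 0):])


# Four tasks, two nodes with failure probability 0.1 each, need K = 2 survivors.
packed = prob_at_least_k_survive([4, 0], [0.1, 0.1], k=2)
spread = prob_at_least_k_survive([2, 2], [0.1, 0.1], k=2)
print(packed, spread)
```

In this toy instance, spreading the tasks across both nodes yields a higher survival probability than packing them onto one, which matches the intuition behind failure-aware placement.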

Brief Announcement: Transactional Data Structure Libraries

Pub Date: 2016-07-11 DOI: 10.1145/2935764.2935805

A. Spiegelman, Guy Golan-Gueta, I. Keidar

Citations: 0

Brief Announcement: MIC++: Accelerating Maximal Information Coefficient Calculation with GPUs and FPGAs

Pub Date: 2016-07-11 DOI: 10.1145/2935764.2935804

Chao Wang, Xi Li, Aili Wang, Xuehai Zhou

Citations: 2

Brief Announcement: Energy Optimization of Memory Intensive Parallel Workloads

Pub Date: 2016-07-11 DOI: 10.1145/2935764.2935811

Chhaya Trehan, H. Vandierendonck, G. Karakonstantis, Dimitrios S. Nikolopoulos

Abstract: Energy consumption is an important concern in modern multicore processors. The energy consumed during the execution of an application can be minimized by tuning the hardware state utilizing knobs such as frequency, voltage, etc. The existing theoretical work on energy minimization using Global DVFS (Dynamic Voltage and Frequency Scaling), despite being thorough, ignores the energy consumed by the CPU on memory accesses and the dynamic energy consumed by the idle cores. This article presents an analytical energy-performance model for parallel workloads that accounts for the energy consumed by the CPU chip on memory accesses in addition to the energy consumed on CPU instructions. In addition, the model we present also accounts for the dynamic energy consumed by the idle cores. We present an analytical framework around our energy-performance model to predict the operating frequencies for global DVFS that minimize the overall CPU energy consumption. We show how the optimal frequencies in our model differ from the optimal frequencies in a model that does not account for memory accesses.

Citations: 0
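
The energy abstract above states that the energy-optimal frequency changes once memory accesses are accounted for, without giving the model itself. The sketch below is a toy model invented for this illustration, not the authors' model: run time is compute time that scales as 1/f plus a frequency-independent memory time, and chip power is a static term plus a cubic dynamic term. All parameter names, units, and values are hypothetical.

```python
from typing import Iterable


def energy(f: float, cpu_work: float, mem_time: float,
           p_static: float, c_dyn: float) -> float:
    """Toy energy model (an assumption of this sketch, not from the paper).

    run time = cpu_work / f + mem_time   (memory time does not scale with f)
    power    = p_static + c_dyn * f**3   (static plus cubic dynamic power)
    """
    run_time = cpu_work / f + mem_time
    power = p_static + c_dyn * f ** 3
    return power * run_time


def best_frequency(freqs: Iterable[float], **params: float) -> float:
    """Pick the candidate frequency with the lowest modelled energy."""
    return min(freqs, key=lambda f: energy(f, **params))


# Candidate frequencies 1.0, 1.1, ..., 3.0 in arbitrary units.
freqs = [round(1.0 + 0.1 * i, 1) for i in range(21)]
common = dict(cpu_work=1.0, p_static=1.0, c_dyn=0.1)

f_no_mem = best_frequency(freqs, mem_time=0.0, **common)
f_with_mem = best_frequency(freqs, mem_time=1.0, **common)
print(f_no_mem, f_with_mem)
```

In this toy setting the modelled optimum moves to a lower frequency once memory time is included, because running faster no longer shortens the memory-bound part of the run while dynamic power keeps rising.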

Parallel Algorithms for Summing Floating-Point Numbers

Pub Date: 2016-05-18 DOI: 10.1145/2935764.2935779

M. Goodrich, A. Eldawy

Citations: 2