ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming最新文献_第9页

TigerQuoll: parallel event-based JavaScript TigerQuoll:并行的基于事件的JavaScript

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442541

Daniele Bonetta, Walter Binder, C. Pautasso

引用次数: 8

Betweenness centrality: algorithms and implementations 中间中心性:算法和实现

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442521

Dimitrios Prountzos, K. Pingali

{"title":"Betweenness centrality: algorithms and implementations","authors":"Dimitrios Prountzos, K. Pingali","doi":"10.1145/2442516.2442521","DOIUrl":"https://doi.org/10.1145/2442516.2442521","url":null,"abstract":"Betweenness centrality is an important metric in the study of social networks, and several algorithms for computing this metric exist in the literature. This paper makes three contributions. First, we show that the problem of computing betweenness centrality can be formulated abstractly in terms of a small set of operators that update the graph. Second, we show that existing parallel algorithms for computing betweenness centrality can be viewed as implementations of different schedules for these operators, permitting all these algorithms to be formulated in a single framework. Third, we derive a new asynchronous parallel algorithm for betweenness centrality that (i) works seamlessly for both weighted and unweighted graphs, (ii) can be applied to large graphs, and (iii) is able to extract large amounts of parallelism. We implemented this algorithm and compared it against a number of publicly available implementations of previous algorithms on two different multicore architectures. Our results show that the new algorithm is the best performing one in most cases, particularly for large graphs and large thread counts, and is always competitive against other algorithms.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116873355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 43

Data-only flattening for nested data parallelism 用于嵌套数据并行的数据平坦化

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442525

Lars Bergstrom, M. Fluet, Mike Rainey, John H. Reppy, Stephen Rosen, Adam Shaw

{"title":"Data-only flattening for nested data parallelism","authors":"Lars Bergstrom, M. Fluet, Mike Rainey, John H. Reppy, Stephen Rosen, Adam Shaw","doi":"10.1145/2442516.2442525","DOIUrl":"https://doi.org/10.1145/2442516.2442525","url":null,"abstract":"Data parallelism has proven to be an effective technique for high-level programming of a certain class of parallel applications, but it is not well suited to irregular parallel computations. Blelloch and others proposed nested data parallelism (NDP) as a language mechanism for programming irregular parallel applications in a declarative data-parallel style. The key to this approach is a compiler transformation that flattens the NDP computation and data structures into a form that can be executed efficiently on a wide-vector SIMD architecture. Unfortunately, this technique is ill suited to execution on today's multicore machines. We present a new technique, called data-only flattening, for the compilation of NDP, which is suitable for multicore architectures. Data-only flattening transforms nested data structures in order to expose programs to various optimizations while leaving control structures intact. We present a formal semantics of data-only flattening in a core language with a rewriting system. We demonstrate the effectiveness of this technique in the Parallel ML implementation and we report encouraging experimental results across various benchmark applications.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124971511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 29

Parallel programming with big operators 使用大运算符的并行编程

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442551

Changhee Park, G. Steele, Jean-Baptiste Tristan

{"title":"Parallel programming with big operators","authors":"Changhee Park, G. Steele, Jean-Baptiste Tristan","doi":"10.1145/2442516.2442551","DOIUrl":"https://doi.org/10.1145/2442516.2442551","url":null,"abstract":"In the sciences, it is common to use the so-called \"big operator\" notation to express the iteration of a binary operator (the reducer) over a collection of values. Such a notation typically assumes that the reducer is associative and abstracts the iteration process. Consequently, from a programming point-of-view, we can organize the reducer operations to minimize the depth of the overall reduction, allowing a potentially parallel evaluation of a big operator expression. We believe that the big operator notation is indeed an effective construct to express parallel computations in the Generate/Map/Reduce programming model, and our goal is to introduce it in programming languages to support parallel programming. The effective definition of such a big operator expression requires a simple way to generate elements, and a simple way to declare algebraic properties of the reducer (such as its identity, or its commutativity). In this poster, we want to present an extension of Scala with support for big operator expressions. We show how big operator expressions are defined and how the API is organized to support the simple definition of reducers with their algebraic properties.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123156415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Online-ABFT: an online algorithm based fault tolerance scheme for soft error detection in iterative methods 在线abft:一种基于在线算法的迭代法软错误检测容错方案

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442533

Zizhong Chen

{"title":"Online-ABFT: an online algorithm based fault tolerance scheme for soft error detection in iterative methods","authors":"Zizhong Chen","doi":"10.1145/2442516.2442533","DOIUrl":"https://doi.org/10.1145/2442516.2442533","url":null,"abstract":"Soft errors are one-time events that corrupt the state of a computing system but not its overall functionality. Large supercomputers are especially susceptible to soft errors because of their large number of components. Soft errors can generally be detected offline through the comparison of the final computation results of two duplicated computations, but this approach often introduces significant overhead. This paper presents Online-ABFT, a simple but efficient online soft error detection technique that can detect soft errors in the widely used Krylov subspace iterative methods in the middle of the program execution so that the computation efficiency can be improved through the termination of the corrupted computation in a timely manner soon after a soft error occurs. Based on a simple verification of orthogonality and residual, Online-ABFT is easy to implement and highly efficient. Experimental results demonstrate that, when this online error detection approach is used together with checkpointing, it improves the time to obtain correct results by up to several orders of magnitude over the traditional offline approach.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116139782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 170

Programming with hardware lock elision 用硬件编程锁省略

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442552

Y. Afek, A. Levy, Adam Morrison

引用次数: 26

Automatic problem size sensitive task partitioning on heterogeneous parallel systems 异构并行系统中问题大小敏感任务的自动划分

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442545

Ivan Grasso, Klaus Kofler, Biagio Cosenza, T. Fahringer

引用次数: 12

Scalable deterministic replay in a parallel full-system emulator 并行全系统仿真器中的可扩展确定性重放

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442537

Yufei Chen, Haibo Chen

{"title":"Scalable deterministic replay in a parallel full-system emulator","authors":"Yufei Chen, Haibo Chen","doi":"10.1145/2442516.2442537","DOIUrl":"https://doi.org/10.1145/2442516.2442537","url":null,"abstract":"Full-system emulation has been an extremely useful tool in developing and debugging systems software like operating systems and hypervisors. However, current full-system emulators lack the support for deterministic replay, which limits the reproducibility of concurrency bugs that is indispensable for analyzing and debugging the essentially multi-threaded systems software.\u0000 This paper analyzes the challenges in supporting deterministic replay in parallel full-system emulators and makes a comprehensive study on the sources of non-determinism. Unlike application-level replay systems, our system, called ReEmu, needs to log sources of non-determinism in both the guest software stack and the dynamic binary translator for faithful replay. To provide scalable and efficient record and replay on multicore machines, ReEmu makes several notable refinements to the CREW protocol that replays shared memory systems. First, being aware of the performance bottlenecks in frequent lock operations in the CREW protocol, ReEmu refines the CREW protocol with a seqlock-like design, to avoid serious contention and possible starvation in instrumentation code tracking dependence of racy accesses on a shared memory object. Second, to minimize the required log files, ReEmu only logs minimal local information regarding accesses to a shared memory location, but instead relies on an offline log processing tool to derive precise shared memory dependence for faithful replay. Third, ReEmu adopts an automatic lock clustering mechanism that clusters a set of uncontended memory objects to a bulk to reduce the frequencies of lock operations, which noticeably boost performance.\u0000 Our prototype ReEmu is based on our open-source COREMU system and supports scalable and efficient record and replay of full-system environments (both x64 and ARM). Performance evaluation shows that ReEmu has very good performance scalability on an Intel multicore machine. It incurs only 68.9% performance overhead on average (ranging from 51.8% to 94.7%) over vanilla COREMU to record five PARSEC benchmarks running on a 16-core emulated system.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132420539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 29

Adoption protocols for fanout-optimal fault-tolerant termination detection 采用扇出最优容错终止检测协议

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442519

J. Lifflander, P. Miller, L. Kalé

引用次数: 11

Distributed merge trees 分布式合并树

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442526

D. Morozov, G. Weber

{"title":"Distributed merge trees","authors":"D. Morozov, G. Weber","doi":"10.1145/2442516.2442526","DOIUrl":"https://doi.org/10.1145/2442516.2442526","url":null,"abstract":"Improved simulations and sensors are producing datasets whose increasing complexity exhausts our ability to visualize and comprehend them directly. To cope with this problem, we can detect and extract significant features in the data and use them as the basis for subsequent analysis. Topological methods are valuable in this context because they provide robust and general feature definitions.\u0000 As the growth of serial computational power has stalled, data analysis is becoming increasingly dependent on massively parallel machines. To satisfy the computational demand created by complex datasets, algorithms need to effectively utilize these computer architectures. The main strength of topological methods, their emphasis on global information, turns into an obstacle during parallelization.\u0000 We present two approaches to alleviate this problem. We develop a distributed representation of the merge tree that avoids computing the global tree on a single processor and lets us parallelize subsequent queries. To account for the increasing number of cores per processor, we develop a new data structure that lets us take advantage of multiple shared-memory cores to parallelize the work on a single node. Finally, we present experiments that illustrate the strengths of our approach as well as help identify future challenges.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"250 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121067705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 71