ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming最新文献

筛选
英文 中文
The tasks with effects model for safe concurrency 具有安全并发性的任务效果模型
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442540
Stephen Heumann, Vikram S. Adve, Shengjie Wang
{"title":"The tasks with effects model for safe concurrency","authors":"Stephen Heumann, Vikram S. Adve, Shengjie Wang","doi":"10.1145/2442516.2442540","DOIUrl":"https://doi.org/10.1145/2442516.2442540","url":null,"abstract":"Today's widely-used concurrent programming models either provide weak safety guarantees, making it easy to write code with subtle errors, or are limited in the class of programs that they can express. We propose a new concurrent programming model based on tasks with effects that offers strong safety guarantees while still providing the flexibility needed to support the many ways that concurrency is used in complex applications. The core unit of work in our model is a dynamically-created task. The model's key feature is that each task has programmer-specified effects, and a run-time scheduler is used to ensure that two tasks are run concurrently only if they have non-interfering effects. Through the combination of statically verifying the declared effects of tasks and using an effect-aware run-time scheduler, our model is able to guarantee strong safety properties, including data race freedom and atomicity. It is also possible to use our model to write programs and computations that can be statically proven to behave deterministically. We describe the tasks with effects programming model and provide a formal dynamic semantics for it. We also describe our implementation of this model in an extended version of Java and evaluate its use in several programs exhibiting various patterns of concurrency.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116098859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 28
Work-stealing with configurable scheduling strategies 利用可配置的调度策略窃取工作
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442562
Martin Wimmer, Daniel Cederman, J. Träff, P. Tsigas
{"title":"Work-stealing with configurable scheduling strategies","authors":"Martin Wimmer, Daniel Cederman, J. Träff, P. Tsigas","doi":"10.1145/2442516.2442562","DOIUrl":"https://doi.org/10.1145/2442516.2442562","url":null,"abstract":"Work-stealing systems are typically oblivious to the nature of the tasks they are scheduling. They do not know or take into account how long a task will take to execute or how many subtasks it will spawn. Moreover, task execution order is typically determined by an underlying task storage data structure, and cannot be changed. There are thus possibilities for optimizing task parallel executions by providing information on specific tasks and their preferred execution order to the scheduling system.\u0000 We investigate generalizations of work-stealing and introduce a framework enabling applications to dynamically provide hints on the nature of specific tasks using scheduling strategies. Strategies can be used to independently control both local task execution and steal order. Strategies allow optimizations on specific tasks, in contrast to more conventional scheduling policies that are typically global in scope. Strategies are composable and allow different, specific scheduling choices for different parts of an application simultaneously. We have implemented a work-stealing system based on our strategy framework. A series of benchmarks demonstrates beneficial effects that can be achieved with scheduling strategies.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127011961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 25
Relational algorithms for multi-bulk-synchronous processors 多批量同步处理器的关系算法
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442555
G. Diamos, Haicheng Wu, Jin Wang, A. Lele, S. Yalamanchili
{"title":"Relational algorithms for multi-bulk-synchronous processors","authors":"G. Diamos, Haicheng Wu, Jin Wang, A. Lele, S. Yalamanchili","doi":"10.1145/2442516.2442555","DOIUrl":"https://doi.org/10.1145/2442516.2442555","url":null,"abstract":"Relational databases remain an important application infrastructure for organizing and analyzing massive volumes of data. At the same time, processor architectures are increasingly gravitating towards Multi-Bulk-Synchronous processor (Multi-BSP) architectures employing throughput-optimized memory systems, lightweight multi-threading, and Single-Instruction Multiple-Data (SIMD) core organizations. This paper explores the mapping of primitive relational algebra operations onto such architectures to improve the throughput of data warehousing applications built on relational databases.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126167513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
Compiler aided manual speculation for high performance concurrent data structures
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442522
Lingxiang Xiang, M. Scott
{"title":"Compiler aided manual speculation for high performance concurrent data structures","authors":"Lingxiang Xiang, M. Scott","doi":"10.1145/2442516.2442522","DOIUrl":"https://doi.org/10.1145/2442516.2442522","url":null,"abstract":"Speculation is a well-known means of increasing parallelism among concurrent methods that are usually but not always independent. Traditional nonblocking data structures employ a particularly restrictive form of speculation. Software transactional memory (STM) systems employ a much more general---though typically blocking---form, and there is a wealth of options in between.\u0000 Using several different concurrent data structures as examples, we show that manual addition of speculation to traditional lock-based code can lead to significant performance improvements. Successful speculation requires careful consideration of profitability, and of how and when to validate consistency. Unfortunately, it also requires substantial modifications to code structure and a deep understanding of the memory model. These latter requirements make it difficult to use in its purely manual form, even for expert programmers. To simplify the process, we present a compiler tool, CSpec, that automatically generates speculative code from baseline lock-based code with user annotations. Compiler-aided manual speculation keeps the original code structure for better readability and maintenance, while providing the flexibility to chose speculation and validation strategies. Experiments on UltraSPARC and x86 platforms demonstrate that with a small number annotations added to lock-based code, CSpec can generate speculative code that matches the performance of best-effort hand-written versions.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125801733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
WuKong: effective diagnosis of bugs at large system scales 悟空:在大系统尺度上有效的bug诊断
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442563
Bowen Zhou, Milind Kulkarni, S. Bagchi
{"title":"WuKong: effective diagnosis of bugs at large system scales","authors":"Bowen Zhou, Milind Kulkarni, S. Bagchi","doi":"10.1145/2442516.2442563","DOIUrl":"https://doi.org/10.1145/2442516.2442563","url":null,"abstract":"A key challenge in developing large scale applications (both in system size and in input size) is finding bugs that are latent at the small scales of testing, only manifesting when a program is deployed at large scales. Traditional statistical techniques fail because no error-free run is available at deployment scales for training purposes. Prior work used scaling models to detect anomalous behavior at large scales without being trained on correct behavior at that scale. However, that work cannot localize bugs automatically. In this paper, we extend that work in three ways: (i) we develop an automatic diagnosis technique, based on feature reconstruction; (ii) we design a heuristic to effectively prune the feature space; and (iii) we validate our design through one fault-injection study, finding that our system can effectively localize bugs in a majority of cases.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128322244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Runtime elision of transactional barriers for captured memory 运行时对捕获内存的事务性屏障的省略
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442556
F. Carvalho, João P. Cachopo
{"title":"Runtime elision of transactional barriers for captured memory","authors":"F. Carvalho, João P. Cachopo","doi":"10.1145/2442516.2442556","DOIUrl":"https://doi.org/10.1145/2442516.2442556","url":null,"abstract":"In this paper, we propose a new technique that can identify transaction-local memory (i.e. captured memory), in managed environments, while having a low runtime overhead. We implemented our proposal in a well known STM framework (Deuce) and we tested it in STMBench7 with two different STMs: TL2 and LSA. In both STMs the performance improved significantly (4 times and 2.6 times, respectively). Moreover, running the STAMP benchmarks with our approach shows improvements of 7 times in the best case for the Vacation application.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122285064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Parallel schedule synthesis for attribute grammars 属性语法的并行调度合成
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442535
Leo A. Meyerovich, Matthew E. Torok, Eric Hamilton Atkinson, R. Bodík
{"title":"Parallel schedule synthesis for attribute grammars","authors":"Leo A. Meyerovich, Matthew E. Torok, Eric Hamilton Atkinson, R. Bodík","doi":"10.1145/2442516.2442535","DOIUrl":"https://doi.org/10.1145/2442516.2442535","url":null,"abstract":"We examine how to synthesize a parallel schedule of structured traversals over trees. In our system, programs are declaratively specified as attribute grammars. Our synthesizer automatically, correctly, and quickly schedules the attribute grammar as a composition of parallel tree traversals. Our downstream compiler optimizes for GPUs and multicore CPUs.\u0000 We provide support for designing efficient schedules. First, we introduce a declarative language of schedules where programmers may constrain any part of the schedule and the synthesizer will complete and autotune the rest. Furthermore, the synthesizer answers debugging queries about how schedules may be completed.\u0000 We evaluate our approach with two case studies. First, we created the first parallel schedule for a large fragment of CSS and report a 3X multicore speedup. Second, we created an interactive GPU-accelerated animation of over 100,000 nodes.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134258135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
NUMA-aware reader-writer locks numa感知的读写锁
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442532
I. Calciu, D. Dice, Yossi Lev, Victor Luchangco, Virendra J. Marathe, N. Shavit
{"title":"NUMA-aware reader-writer locks","authors":"I. Calciu, D. Dice, Yossi Lev, Victor Luchangco, Virendra J. Marathe, N. Shavit","doi":"10.1145/2442516.2442532","DOIUrl":"https://doi.org/10.1145/2442516.2442532","url":null,"abstract":"Non-Uniform Memory Access (NUMA) architectures are gaining importance in mainstream computing systems due to the rapid growth of multi-core multi-chip machines. Extracting the best possible performance from these new machines will require us to revisit the design of the concurrent algorithms and synchronization primitives which form the building blocks of many of today's applications. This paper revisits one such critical synchronization primitive -- the reader-writer lock.\u0000 We present what is, to the best of our knowledge, the first family of reader-writer lock algorithms tailored to NUMA architectures. We present several variations which trade fairness between readers and writers for higher concurrency among readers and better back-to-back batching of writers from the same NUMA node. Our algorithms leverage the lock cohorting technique to manage synchronization between writers in a NUMA-friendly fashion, binary flags to coordinate readers and writers, and simple distributed reader counter implementations to enable NUMA-friendly concurrency among readers. The end result is a collection of surprisingly simple NUMA-aware algorithms that outperform the state-of-the-art reader-writer locks by up to a factor of 10 in our microbenchmark experiments. To evaluate our algorithms in a realistic setting we also present performance results of the kccachetest benchmark of the Kyoto-Cabinet distribution, an open-source database which makes heavy use of pthread reader-writer locks. Our locks boost the performance of kccachetest by up to 40% over the best prior alternatives.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"258 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120861369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 91
Ownership passing: efficient distributed memory programming on multi-core systems 所有权传递:多核系统上高效的分布式内存编程
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442534
A. Friedley, T. Hoefler, G. Bronevetsky, A. Lumsdaine, Ching-Chen Ma
{"title":"Ownership passing: efficient distributed memory programming on multi-core systems","authors":"A. Friedley, T. Hoefler, G. Bronevetsky, A. Lumsdaine, Ching-Chen Ma","doi":"10.1145/2442516.2442534","DOIUrl":"https://doi.org/10.1145/2442516.2442534","url":null,"abstract":"The number of cores in multi- and many-core high-performance processors is steadily increasing. MPI, the de-facto standard for programming high-performance computing systems offers a distributed memory programming model. MPI's semantics force a copy from one process' send buffer to another process' receive buffer. This makes it difficult to achieve the same performance on modern hardware than shared memory programs which are arguably harder to maintain and debug. We propose generalizing MPI's communication model to include ownership passing, which make it possible to fully leverage the shared memory hardware of multi- and many-core CPUs to stream communicated data concurrently with the receiver's computations on it. The benefits and simplicity of message passing are retained by extending MPI with calls to send (pass) ownership of memory regions, instead of their contents, between processes. Ownership passing is achieved with a hybrid MPI implementation that runs MPI processes as threads and is mostly transparent to the user. We propose an API and a static analysis technique to transform legacy MPI codes automatically and transparently to the programmer, demonstrating that this scheme is easy to use in practice. Using the ownership passing technique, we see up to 51% communication speedups over a standard message passing implementation on state-of-the art multicore systems. Our analysis and interface will lay the groundwork for future development of MPI-aware optimizing compilers and multi-core specific optimizations, which will be key for success in current and next-generation computing platforms.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123930817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
Array dataflow analysis for polyhedral X10 programs 阵列数据流分析多面体X10程序
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming Pub Date : 2013-02-23 DOI: 10.1145/2442516.2442520
Tomofumi Yuki, P. Feautrier, S. Rajopadhye, V. Saraswat
{"title":"Array dataflow analysis for polyhedral X10 programs","authors":"Tomofumi Yuki, P. Feautrier, S. Rajopadhye, V. Saraswat","doi":"10.1145/2442516.2442520","DOIUrl":"https://doi.org/10.1145/2442516.2442520","url":null,"abstract":"This paper addresses the static analysis of an important class of X10 programs, namely those with finish/async parallelism, and affine loops and array reference structure as in the polyhedral model. For such programs our analysis can certify whenever a program is deterministic or flags races.\u0000 Our key contributions are (i) adaptation of array dataflow analysis from the polyhedral model to programs with finish/async parallelism, and (ii) use of the array dataflow analysis result to certify determinacy. We distinguish our work from previous approaches by combining the precise statement instance-wise and array element-wise analysis capability of the polyhedral model with finish/async programs that are more expressive than doall parallelism commonly considered in the polyhedral literature. We show that our approach is exact (no false negative/positives) and more precise than previous approaches, but is limited to programs that fit the polyhedral model.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123228430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 34
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信