Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming: latest papers

Revealing parallel scans and reductions in sequential loops through function reconstruction

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178523

Peng Jiang, G. Agrawal

Citations: 0

Bridging the gap between deep learning and sparse matrix format selection

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178495

Yue Zhao, Jiajia Li, C. Liao, Xipeng Shen

Citations: 89

Efficient parallel determinacy race detection for two-dimensional dags

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178515

Yifan Xu, I. Lee, Kunal Agrawal

Abstract: A program is said to have a determinacy race if logically parallel parts of the program access the same memory location and at least one of the accesses is a write. These races are generally bugs, since they lead to non-deterministic program behavior: different schedules of the program can lead to different results. Most prior work on detecting these races focuses on a subclass of programs with fork-join parallelism. This paper presents a race-detection algorithm, 2D-Order, for detecting races in a more general class of programs, namely programs whose dependence structure can be represented as planar dags embedded in 2D grids. Such dependence structures arise from programs that use pipelined parallelism or dynamic programming recurrences. Given a computation with T1 work and T∞ span, 2D-Order executes it while also detecting races in O(T1/P + T∞) time on P processors, which is asymptotically optimal. We also implemented PRacer, a race-detection algorithm based on 2D-Order for Cilk-P, which is a language for expressing pipeline parallelism. Empirical results demonstrate that PRacer incurs reasonable overhead and exhibits scalability similar to the baseline (executions without race detection) when running on multiple cores.
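The definition of a determinacy race in this abstract can be illustrated with a small brute-force check. This is only a sketch of the definition, not the 2D-Order algorithm: it assumes a full 2D grid dag in which every node (i, j) has edges to (i+1, j) and (i, j+1), so that one node precedes another exactly when both of its coordinates are less than or equal to the other's. The function names and the access-log format are hypothetical.

```python
from itertools import combinations

def precedes(a, b):
    """True if grid node a reaches node b, assuming a full grid dag
    with an edge from (i, j) to (i+1, j) and to (i, j+1)."""
    return a != b and a[0] <= b[0] and a[1] <= b[1]

def logically_parallel(a, b):
    """Two distinct nodes are logically parallel if neither reaches the other."""
    return a != b and not precedes(a, b) and not precedes(b, a)

def find_races(accesses):
    """accesses: list of (node, location, is_write) tuples.
    Returns (node1, node2, location) for every pair of logically parallel
    accesses to the same location where at least one is a write.
    Brute force over all pairs: quadratic, for illustration only."""
    races = []
    for (n1, loc1, w1), (n2, loc2, w2) in combinations(accesses, 2):
        if loc1 == loc2 and (w1 or w2) and logically_parallel(n1, n2):
            races.append((n1, n2, loc1))
    return races

# Hypothetical access log on a 2x2 grid.
log = [
    ((0, 1), "x", True),   # write x
    ((1, 0), "x", False),  # read x, parallel with (0, 1): race
    ((1, 1), "x", True),   # ordered after both nodes above: no race
    ((0, 0), "y", True),
    ((1, 1), "y", False),  # ordered after (0, 0): no race
]
races = find_races(log)
print(races)
```

The pairwise comparison costs time quadratic in the number of accesses, which is the kind of overhead an on-the-fly detector is designed to avoid.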

Citations: 15

A predictable synchronisation algorithm

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178533

S. Reif, Wolfgang Schröder-Preikschat

Citations: 0

swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178513

Xinliang Wang, Weifeng Liu, Wei Xue, Li Wu

Abstract: Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world applications. Currently, much research on parallel SpTRSV focuses on level-set construction for reducing the number of inter-level synchronizations. However, the out-of-control data reuse and the high cost of global memory or shared cache access in inter-level synchronization have been largely neglected in existing work. In this paper, we propose a novel data layout called Sparse Level Tile to bring all data reuse under control, and design a Producer-Consumer pairing method so that any inter-level synchronization happens only through very fast register communication. We implement our data layout and algorithms on an SW26010 many-core processor, which is the main building block of the current world's fastest supercomputer, Sunway TaihuLight. The experimental results of testing all 2057 square matrices from the Florida Matrix Collection show that our method achieves an average speedup of 6.9 and a best speedup of 38.5 over the parallel level-set method. Our method also outperforms the latest methods on a KNC many-core processor on 1856 matrices and the latest methods on a K80 GPU on 1672 matrices.
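As background for the level-set construction this abstract refers to, here is a minimal sketch of a level-set sparse lower-triangular solve. It is not the Sparse Level Tile layout or the Producer-Consumer pairing method from the paper; it assumes a lower-triangular matrix with a nonzero diagonal, stored as one dictionary per row, and the function names are hypothetical.

```python
def level_sets(rows):
    """rows[i] maps column index -> value for row i of a lower-triangular
    matrix. Row i depends on every row j < i with a nonzero in column j.
    Returns a list of levels; rows in the same level are independent."""
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [j for j in row if j < i]
        # A row sits one level above its deepest dependency (level 0 if none).
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        sets[lvl].append(i)
    return sets

def sptrsv(rows, b):
    """Solve L x = b level by level. The inner loop over one level could run
    in parallel; a synchronization is needed only between levels."""
    x = [0.0] * len(rows)
    for rows_in_level in level_sets(rows):
        for i in rows_in_level:
            s = sum(v * x[j] for j, v in rows[i].items() if j < i)
            x[i] = (b[i] - s) / rows[i][i]
    return x

# Small hypothetical 4x4 lower-triangular system.
L = [
    {0: 2.0},
    {1: 4.0},
    {0: 1.0, 2: 1.0},
    {1: 2.0, 2: 1.0, 3: 2.0},
]
b = [2.0, 4.0, 3.0, 8.0]
sets = level_sets(L)
x = sptrsv(L, b)
print(sets, x)
```

In this example rows 0 and 1 share a level, so the four rows need three levels and therefore two inter-level synchronizations.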

Citations: 66

SecureMR: secure mapreduce using homomorphic encryption and program partitioning

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178520

Yao Dong, Ana L. Milanova, Julian T Dolby

Citations: 1

Performance modeling for GPUs using abstract kernel emulation

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178524

Changwan Hong, Aravind Sukumaran-Rajam, Jinsung Kim, P. Rawat, S. Krishnamoorthy, L. Pouchet, F. Rastello, P. Sadayappan

Citations: 1

Register optimizations for stencils on GPUs

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178500

P. Rawat, F. Rastello, Aravind Sukumaran-Rajam, L. Pouchet, A. Rountev, P. Sadayappan

Citations: 46

vSensor: leveraging fixed-workload snippets of programs for performance variance detection

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178497

Xiongchao Tang, Jidong Zhai, Xuehai Qian, Bingsheng He, Wei Xue, Wenguang Chen

Abstract: Performance variance becomes increasingly challenging on current large-scale HPC systems. Even using a fixed number of computing nodes, the execution time of several runs can vary significantly. Many parallel programs executing on supercomputers suffer from such variance. Performance variance not only causes unpredictable performance requirement violations, but also makes it unintuitive to understand the program behavior. Despite prior efforts, efficient on-line detection of performance variance remains an open problem. In this paper, we propose vSensor, a novel approach for light-weight and on-line performance variance detection. The key insight is that, instead of solely relying on an external detector, the source code of a program itself could reveal the runtime performance characteristics. Specifically, many parallel programs contain code snippets that are executed repeatedly with an invariant quantity of work. Based on this observation, we use compiler techniques to automatically identify these fixed-workload snippets and use them as performance variance sensors (v-sensors) that enable effective detection. We evaluate vSensor with a variety of parallel programs on the Tianhe-2 system. Results show that vSensor can effectively detect performance variance on HPC systems. The performance overhead is smaller than 4% with up to 16,384 processes. In particular, with vSensor, we found a bad node with slow memory that slowed a program's performance by 21%. As a showcase, we also detected a severe network performance problem that caused a 3.37X slowdown for an HPC kernel program on the Tianhe-2 system.
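The idea of using a fixed-workload snippet as a sensor can be sketched as follows: if the same snippet does the same amount of work on every execution, its durations should be nearly constant, so an execution that is much slower than the typical one signals variance. The median baseline, the 1.2 threshold, and the function name below are assumptions made for illustration; the abstract does not specify how vSensor normalizes or thresholds its measurements.

```python
from statistics import median

def flag_variance(durations, threshold=1.2):
    """Return the indices of executions of a fixed-workload snippet whose
    duration exceeds `threshold` times the median duration.
    Both the median baseline and the threshold are illustrative assumptions."""
    base = median(durations)
    return [i for i, d in enumerate(durations) if d > threshold * base]

# Hypothetical durations (seconds) of one snippet across six executions.
samples = [1.00, 1.02, 0.99, 1.05, 1.01, 3.37]
slow = flag_variance(samples)
print(slow)
```

In a real program the durations would come from timers placed around the snippet at run time; passing them in as a list keeps this sketch deterministic.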

Citations: 6

Harnessing epoch-based reclamation for efficient range queries

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178489

Maya Arbel-Raviv, Trevor Brown

Citations: 33