Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming: Latest Articles

DisCVar: discovering critical variables using algorithmic differentiation for transient faults

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178502

Harshitha Menon, K. Mohror

Abstract: Aggressive technology scaling trends have made the hardware of high performance computing (HPC) systems more susceptible to faults. Some of these faults can lead to silent data corruption (SDC), a serious problem because it alters HPC simulation results. In this paper, we present a full-coverage, systematic methodology called DisCVar to identify critical variables in HPC applications for protection against SDC. DisCVar uses automatic differentiation (AD) to determine the sensitivity of the simulation output to errors in program variables. We empirically validate our approach to identifying vulnerable variables by comparing its results against a full-coverage code-level fault injection campaign. We find that DisCVar identifies the variables that are critical to application SDC resilience with a high degree of accuracy compared to the results of the fault injection campaign. Additionally, DisCVar requires only two executions of the target program to generate results, whereas in our experiments we needed to perform millions of executions to get the same information from a fault injection campaign.
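The idea of using derivatives as a proxy for fault sensitivity can be illustrated with a small sketch. This is not the DisCVar implementation: `Dual`, `toy_simulation`, `sensitivities`, and `rank_critical` are hypothetical names, the "simulation" is a three-variable toy, and the sketch uses forward-mode differentiation with one pass per input, so it does not reproduce the two-execution cost reported in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one seeded input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def toy_simulation(a, b, c):
    """Hypothetical stand-in for an application kernel."""
    return a * b + 0.001 * c


def sensitivities(func, inputs):
    """Return |d(output)/d(input)| for each named input (one pass per input)."""
    result = {}
    for name in inputs:
        seeded = {k: Dual(v, 1.0 if k == name else 0.0)
                  for k, v in inputs.items()}
        result[name] = abs(func(**seeded).dot)
    return result


def rank_critical(func, inputs):
    """Order input names from most to least sensitive."""
    sens = sensitivities(func, inputs)
    return sorted(sens, key=sens.get, reverse=True)
```

Calling `rank_critical(toy_simulation, {"a": 2.0, "b": 3.0, "c": 5.0})` orders the inputs by how strongly a small perturbation in each would move the output. In this toy, `c` ranks last because its contribution is scaled by 0.001.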

Citations: 10

Reducing transaction aborts by looking to the future

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178518

Nachshon Cohen, E. Petrank, J. Larus

Citations: 1

Shared-memory parallelization of MTTKRP for dense tensors

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178522

Koby Hayashi, Grey Ballard, Yujie Jiang, Michael J. Tobia

Citations: 20

Lazygraph: lazy data coherency for replicas in distributed graph-parallel computation

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178508

Lei Wang, Liangji Zhuang, Junhang Chen, Huimin Cui, Fang Lv, Y. Liu, Xiaobing Feng

Abstract: Replicas of a vertex play an important role in existing distributed graph processing systems: they allow a single vertex to be processed in parallel by multiple machines and let each machine access remote neighbors locally, without any remote access. However, replicas of vertices introduce a data coherency problem. Existing distributed graph systems treat the replicas of a vertex v as an atomic and indivisible vertex, and use an eager data coherency approach to guarantee replica atomicity. In the eager data coherency approach, any change to vertex data must be immediately communicated to all replicas of v, which leads to frequent global synchronizations and communications. In this paper, we propose a lazy data coherency approach, called LazyAsync, which treats replicas of a vertex as independent vertices and maintains data coherency by computation rather than by communication as in the existing eager approach. Our approach automatically selects data coherency points from the graph algorithm and ensures that all replicas share the same global view only at those points, which means the replicas may maintain different local views between any two adjacent data coherency points. Based on PowerGraph, we develop a distributed graph processing system, LazyGraph, to implement the LazyAsync approach and exploit graph-aware optimizations. On a 48-node EC2-like cluster, LazyGraph outperforms PowerGraph on four widely used graph algorithms across a variety of real-world graphs, with speedups ranging from 1.25x to 10.69x.
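The contrast between eager and lazy coherency can be sketched in a few lines. This is an illustration only, not LazyGraph's design: the class `ReplicatedVertex` and its methods are hypothetical, and merging with `min` is an assumption that suits a shortest-path-style algorithm, where the vertex value only ever decreases.

```python
class ReplicatedVertex:
    """One logical vertex with an independent local view per replica."""

    def __init__(self, num_replicas, initial):
        self.local = [initial] * num_replicas

    def local_update(self, replica, candidate):
        # Lazy step: only this replica's view changes; nothing is sent
        # to the other replicas, so their views may now differ.
        self.local[replica] = min(self.local[replica], candidate)

    def coherency_point(self):
        # Reconcile all replicas so they share one global view again.
        merged = min(self.local)
        self.local = [merged] * len(self.local)
        return merged
```

Between two calls to `coherency_point`, each replica can apply any number of `local_update` calls without synchronizing. An eager scheme would instead propagate every update to all replicas immediately.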

Citations: 16

Practical concurrent traversals in search trees

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178503

Dana Drachsler-Cohen, Martin T. Vechev, Eran Yahav

Citations: 3

Automated code acceleration targeting heterogeneous openCL devices

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178534

Heinrich Riebler, G. Vaz, Tobias Kenter, Christian Plessl

Citations: 1

Making pull-based graph processing performant

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178506

Samuel Grossman, Heiner Litz, C. Kozyrakis

Abstract: Graph processing engines following either the push-based or pull-based pattern conceptually consist of a two-level nested loop structure. Parallelizing and vectorizing these loops is critical for high overall performance and memory bandwidth utilization. Outer loop parallelization is simple for both engine types but suffers from high load imbalance. This work focuses on inner loop parallelization for pull engines, which when performed naively leads to a significant increase in conflicting memory writes that must be synchronized. Our first contribution is a scheduler-aware interface for parallel loops that allows us to optimize for the common case in which each thread executes several consecutive iterations. This eliminates most write traffic and avoids all synchronization, leading to speedups of up to 50X. Our second contribution is the Vector-Sparse format, which addresses the obstacles to vectorization that stem from the commonly-used Compressed-Sparse data structure. Our new format eliminates unaligned memory accesses and bounds checks within vector operations, two common problems when processing low-degree vertices. Vectorization with Vector-Sparse leads to speedups of up to 2.5X. Our contributions are embodied in Grazelle, a hybrid graph processing framework. On a server equipped with four Intel Xeon E7-4850 v3 processors, Grazelle respectively outperforms Ligra, Polymer, GraphMat, and X-Stream by up to 15.2X, 4.6X, 4.7X, and 66.8X.
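The benefit of letting each thread handle several consecutive inner-loop iterations can be sketched as follows. This is a sequential emulation under stated assumptions, not Grazelle's interface: `pull_chunk` and `pull_sum` are hypothetical names, edges are assumed to be `(dst, src)` pairs sorted by destination, and the per-vertex operation is assumed to be a plain sum. Each chunk accumulates into a local variable and emits one partial result per run of equal destinations, so shared writes drop from one per edge to one per run.

```python
def pull_chunk(edges, values, start, end):
    """Process edges[start:end]; return one (dst, partial_sum) per dst run."""
    partials = []
    i = start
    while i < end:
        dst = edges[i][0]
        acc = 0.0  # chunk-local accumulator: no shared write per edge
        while i < end and edges[i][0] == dst:
            acc += values[edges[i][1]]
            i += 1
        partials.append((dst, acc))
    return partials


def pull_sum(edges, values, num_vertices, chunk_size):
    """Sum source values into each destination, chunk by chunk.

    Returns the output array and the number of writes to it. In a parallel
    version each chunk would run on its own thread, and only destinations
    that straddle a chunk boundary would need their partials merged.
    """
    out = [0.0] * num_vertices
    writes = 0
    for start in range(0, len(edges), chunk_size):
        end = min(start + chunk_size, len(edges))
        for dst, acc in pull_chunk(edges, values, start, end):
            out[dst] += acc
            writes += 1
    return out, writes
```

For a graph with eight edges and `chunk_size=3`, this sketch writes to the output array once per destination run inside each chunk instead of once per edge. Only destinations split across two chunks are written more than once.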

Citations: 61

A microbenchmark to study GPU performance models

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178536

V. Volkov

Citations: 7

VerifiedFT: a verified, high-performance precise dynamic race detector

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178514

James R. Wilcox, C. Flanagan, Stephen N. Freund

Citations: 14

Graph partitioning applied to DAG scheduling to reduce NUMA effects

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI: 10.1145/3178487.3178535

Isaac Sánchez Barrera, Marc Casas, Miquel Moretó, E. Ayguadé, Jesús Labarta, M. Valero

Citations: 5