2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS): Latest Papers, Page 2

A Robust Parallel Preconditioner for Indefinite Systems Using Hierarchical Matrices and Randomized Sampling

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2017-05-01. DOI: 10.1109/IPDPS.2017.21

P. Ghysels, X. Li, C. Gorman, François-Henry Rouet

Abstract: We present the design and implementation of a parallel and fully algebraic preconditioner based on an approximate sparse factorization using low-rank matrix compression. The sparse factorization uses a multifrontal algorithm with fill-in occurring in dense frontal matrices. These frontal matrices are approximated as hierarchically semi-separable matrices, which are constructed using a randomized sampling technique. The resulting preconditioner has (close to) optimal complexity in terms of flops and memory usage for many discretized partial differential equations. We illustrate the robustness and performance of this new preconditioner for a number of unstructured grid problems. Initial results show that the rank-structured preconditioner could be a viable alternative to, for instance, algebraic multigrid and incomplete LU. Our implementation uses MPI and OpenMP and supports real and complex arithmetic and 32- and 64-bit integers. We present a detailed performance analysis.

The code is released as the STRUMPACK library under a BSD license, and a PETSc interface is available to allow easy integration into existing applications.
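The randomized sampling step mentioned above can be illustrated with the basic randomized range-finder idea: multiply the matrix by a few random vectors, orthonormalize the resulting samples, and project onto them. The sketch below is a minimal, dependency-free Python illustration under stated assumptions. It compresses one dense matrix into a low-rank product Q·B; it does not build a hierarchically semi-separable structure and does not use any STRUMPACK interface. The helper names (matmul, orthonormal_columns, low_rank_approx), the Gaussian test matrix and the num_samples parameter are hypothetical, illustrative choices.

```python
import math
import random


def matmul(A, B):
    """Multiply dense matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def orthonormal_columns(Y, rel_tol=1e-10):
    """Modified Gram-Schmidt on the columns of Y.

    Columns that become negligible after projection (relative to their
    original norm) are numerically dependent and are dropped.
    """
    basis = []
    for col in transpose(Y):
        v = list(col)
        original_norm = math.sqrt(sum(x * x for x in v))
        for q in basis:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > rel_tol * original_norm:
            basis.append([x / norm for x in v])
    return transpose(basis)  # m x k matrix with orthonormal columns


def low_rank_approx(A, num_samples, seed=0):
    """Randomized sampling: Y = A @ Omega, Q = orth(Y), B = Q^T @ A, so A ~= Q @ B."""
    rng = random.Random(seed)
    n = len(A[0])
    # Hypothetical choice: a dense Gaussian test matrix Omega of size n x num_samples.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(num_samples)] for _ in range(n)]
    Q = orthonormal_columns(matmul(A, omega))
    B = matmul(transpose(Q), A)
    return Q, B
```

Taking a few more samples than the expected rank (oversampling) makes it very likely that the samples span the whole range of the matrix; for an exactly low-rank input, the extra, numerically dependent sample columns are then discarded by the Gram-Schmidt tolerance. A real implementation would rely on optimized BLAS/LAPACK kernels rather than Python lists.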

Citations: 32

Tight Load Balancing Via Randomized Local Search

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2017-05-01. DOI: 10.1109/IPDPS.2017.52

P. Berenbrink, Peter Kling, Christopher Liaw, Abbas Mehrabian

Citations: 6

SWhybrid: A Hybrid-Parallel Framework for Large-Scale Protein Sequence Database Search

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2017-05-01. DOI: 10.1109/IPDPS.2017.42

Haidong Lan, Weiguo Liu, Yongchao Liu, B. Schmidt

Abstract: Computer architectures continue to develop rapidly towards massively parallel and heterogeneous systems. Thus, easily extensible yet highly efficient parallelization approaches for a variety of platforms are urgently needed. In this paper, we present SWhybrid, a hybrid computing framework for large-scale biological sequence database search on heterogeneous computing environments with multi-core or many-core processing units (PUs), based on the Smith-Waterman (SW) algorithm. To incorporate a diverse set of PUs, such as combinations of CPUs, GPUs and Xeon Phis, we abstract them as SIMD vector execution units with different numbers of lanes. We propose a machine model, associated with a unified programming interface implemented in C++, to abstract away underlying architectural differences. Performance evaluation reveals that SWhybrid (i) outperforms all other tested state-of-the-art tools on both homogeneous and heterogeneous computing platforms, (ii) achieves an efficiency of over 80% on all tested CPUs and GPUs and over 70% on Xeon Phis, and (iii) achieves utilization rates of over 80% on all tested heterogeneous platforms. Our results demonstrate that there is enough commonality between vector-like instructions across CPUs and GPUs that one can develop higher-level abstractions and still specialize with close-to-peak performance.

SWhybrid is open-source software and freely available at https://github.com/turbo0628/swhybrid.
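For readers unfamiliar with the underlying algorithm, the Smith-Waterman local-alignment score can be sketched as a dynamic program that keeps only one previous row, which is enough when a database search needs scores rather than alignments. This is a minimal sequential Python sketch, not SWhybrid's code or interface. The flat match/mismatch scores and the linear gap penalty are simplifying assumptions (protein search tools typically use a substitution matrix such as BLOSUM62 and affine gaps), and the function names are hypothetical.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between query and target.

    Assumes a flat match/mismatch score and a linear gap penalty.
    """
    prev = [0] * (len(target) + 1)  # row i-1 of the DP matrix H
    best = 0
    for q in query:
        curr = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap          # query residue aligned to a gap
            left = curr[j - 1] + gap    # target residue aligned to a gap
            # Clamping at 0 is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query, database, **scoring):
    """Score one query against every database sequence, best hits first.

    Each database sequence is scored independently; this is the coarse-grained
    parallelism a framework could spread over SIMD lanes or devices, but here
    it is a plain sequential loop.
    """
    scores = [(smith_waterman_score(query, seq, **scoring), name)
              for name, seq in database.items()]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    print(search_database("ACGT", {"a": "XXACGTYY", "b": "TTTT", "c": "ACT"}))
```

With the default scores, "ACGT" against "ACT" scores 5 (AC, a gap opposite G, then T), and an exact embedded match such as "XXACGTYY" scores 8.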

Citations: 15

Accelerating Spark Datasets by Inlining Deserialization

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2017-05-01. DOI: 10.1109/IPDPS.2017.111

Jan Wroblewski, K. Ishizaki, H. Inoue, Moriyoshi Ohara

Citations: 1

Exploring DataVortex Systems for Irregular Applications

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2017-05-01. DOI: 10.1109/IPDPS.2017.121

R. Gioiosa, Antonino Tumeo, Jian Yin, T. Warfel, D. Haglin, S. Betelú

Abstract: Emerging applications for data analytics and knowledge discovery typically have irregular or unpredictable communication patterns that do not scale well on parallel systems designed for traditional bulk-synchronous HPC applications. New network architectures that focus on minimizing (short) message latencies, rather than maximizing (large) transfer bandwidths, are emerging as possible alternatives to better support applications with irregular communication patterns. We explore a system based upon one such novel network architecture, the Data Vortex interconnection network, and examine how this system performs by running benchmark code written for the Data Vortex network, as well as a reference MPI-over-InfiniBand implementation, on the same cluster. Simple communication primitives (ping-pong and barrier synchronization), a few common communication kernels (distributed 1D Fast Fourier Transform, breadth-first search, Giga-Updates Per Second) and three prototype applications (SNAP, a proxy application for simulating neutron transport; a finite-difference simulation for computing incompressible fluid flow; and an implementation of the heat equation) were all implemented for both network models.

The results were compared and analyzed to determine what characteristics make an application a good candidate for porting to a Data Vortex system, and to what extent applications could potentially benefit from this new architecture.
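To see why a kernel like Giga-Updates Per Second stresses message latency rather than bandwidth, it helps to look at the shape of a random-update workload: every update XORs a pseudo-random value into a pseudo-randomly chosen table slot, so on a distributed table almost every update becomes a one-word message to an unpredictable owner. The Python sketch below is illustrative only and assumes several things not in the abstract: Python's random module stands in for the benchmark's own generator, the table is block-distributed over a hypothetical number of nodes, update k is issued by node k % num_nodes, and no real communication takes place. The function names are hypothetical.

```python
import random


def random_updates(num_updates, seed=0):
    """Return pseudo-random 64-bit update values (stand-in generator)."""
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(num_updates)]


def apply_updates(table, updates, num_nodes):
    """XOR each update into a random slot and count cross-node updates.

    The table is block-distributed over num_nodes hypothetical nodes and
    update k is issued by node k % num_nodes; an update whose slot lives on
    another node would have to travel as a small remote message.
    """
    size = len(table)
    assert size & (size - 1) == 0, "table size must be a power of two"
    assert size % num_nodes == 0, "table must split evenly across nodes"
    block = size // num_nodes
    remote = 0
    for k, value in enumerate(updates):
        slot = value & (size - 1)       # low bits pick the slot
        if slot // block != k % num_nodes:
            remote += 1
        table[slot] ^= value            # XOR makes each update self-inverse
    return remote
```

Because XOR is its own inverse, applying the same update stream twice restores the original table, which gives a simple correctness check. With the table spread over 8 nodes, roughly 7 out of 8 updates land on a remote node.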

Citations: 8

Dynamic Adaptation in Wireless Networks Under Comprehensive Interference via Carrier Sense

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2017-05-01. DOI: 10.1109/IPDPS.2017.78

Dongxiao Yu, Yuexuan Wang, Tigran Tonoyan, M. Halldórsson

Citations: 10

Argo NodeOS: Toward Unified Resource Management for Exascale

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2017-05-01. DOI: 10.1109/IPDPS.2017.25

Swann Perarnau, Judicael A. Zounmevo, Matthieu Dreher, B. V. Essen, R. Gioiosa, K. Iskra, M. Gokhale, Kazutomo Yoshii, P. Beckman

Abstract: Exascale systems are expected to feature hundreds of thousands of compute nodes with hundreds of hardware threads and complex memory hierarchies with a mix of on-package and persistent memory modules. In this context, the Argo project is developing a new operating system for exascale machines. Targeting production workloads using workflows or coupled codes, we improve the Linux kernel on several fronts. We extend the memory management of Linux to be able to subdivide NUMA memory nodes, allowing better resource partitioning among processes running on the same node. We also add support for memory-mapped access to node-local, PCIe-attached NVRAM devices and introduce a new scheduling class targeted at parallel runtimes supporting user-level load balancing. These features are unified into compute containers, a containerization approach focused on providing modern HPC applications with dynamic control over a wide range of kernel interfaces. To keep our approach compatible with industrial containerization products, we also identify contention points for the adoption of containers in HPC settings. Each NodeOS feature is evaluated using a set of parallel benchmarks, miniapps, and coupled applications consisting of simulation and data analysis components, running on a modern NUMA platform. We observe out-of-the-box performance improvements easily matching, and often exceeding, those observed with expert-optimized configurations on standard OS kernels.

Our lightweight approach to resource management retains the many benefits of a full OS kernel that application programmers have learned to depend on, while providing a set of extensions that can be freely mixed and matched to best benefit particular application components.

Citations: 18

Aces4: A Platform for Computational Chemistry Calculations with Extremely Large Block-Sparse Arrays

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2017-05-01. DOI: 10.1109/IPDPS.2017.108

B. Sanders, J. Byrd, Nakul Jindal, V. Lotrich, Dmitry I. Liakh, A. Perera, R. Bartlett

Citations: 1

When Neurons Fail

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2017-05-01. DOI: 10.1109/IPDPS.2017.66

El Mahdi El Mhamdi, R. Guerraoui

Abstract: Neural networks have traditionally been considered robust in the sense that their precision degrades gracefully with the failure of neurons and can be compensated for by additional learning phases. Nevertheless, critical applications for which neural networks are now appealing solutions cannot afford any additional learning at run-time. In this paper, we view a multilayer neural network as a distributed system in which neurons can fail independently, and we evaluate its robustness in the absence of any (recovery) learning phase. We give tight bounds on the number of neurons that can fail without harming the result of a computation. To determine our bounds, we leverage the fact that neural activation functions are Lipschitz-continuous. Our bound is given in the form of a quantity we call the Forward Error Propagation. Computing this quantity only requires looking at the topology of the network, whereas experimentally assessing the robustness of a network requires the costly experiment of looking at all the possible inputs and testing all the possible configurations of the network corresponding to different failure situations, facing a discouraging combinatorial explosion. We distinguish the case of neurons that can fail and stop their activity (crashed neurons) from the case of neurons that can fail by transmitting arbitrary values (Byzantine neurons). In the crash case, our bound involves the number of neurons per layer, the Lipschitz constant of the neural activation function, the number of failing neurons, the synaptic weights and the depth of the layer where the failure occurred. In the case of Byzantine failures, our bound additionally involves the synaptic transmission capacity. Interestingly, as we show in the paper, our bound can easily be extended to the case where synapses can fail.

We present three applications of our results. The first is a quantification of the effect of memory cost reduction on the accuracy of a neural network. The second is a quantification of the amount of information any neuron needs from its preceding layer, thereby enabling a boosting scheme that prevents neurons from waiting for unnecessary signals. Our third application is a quantification of the trade-off between neural network robustness and learning cost.
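The crash-case reasoning can be illustrated with a simple, generic worst-case bound built from the same ingredients the abstract lists: the Lipschitz constant K of the activation, the number of failed neurons f, the largest absolute weight w_max, the layer sizes, and the depth of the failing layer. This is NOT the paper's Forward Error Propagation quantity; it is a hypothetical, looser bound assuming a fully connected network with sigmoid activations on every layer (bounded by C = 1, with K = 1/4), where all crashes occur in a single layer and a crashed neuron outputs 0. A crash moves an output by at most C, the first downstream layer sees at most f perturbed inputs (factor K·f·w_max), and each further layer l multiplies the error by at most K·N_{l-1}·w_max. The Python sketch checks this bound against a brute-force simulation; all names are hypothetical.

```python
import math
import random

# Assumption: sigmoid outputs lie in (0, 1) and sigmoid is 1/4-Lipschitz.
SIGMOID_BOUND = 1.0
SIGMOID_LIPSCHITZ = 0.25


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, x, crashed=frozenset()):
    """Evaluate a fully connected sigmoid network.

    weights[k] is the weight matrix (list of rows) of layer k + 1; crashed
    holds (layer, neuron) pairs whose output is forced to 0.
    """
    y = list(x)
    for layer, W in enumerate(weights, start=1):
        y = [sigmoid(sum(w * v for w, v in zip(row, y))) for row in W]
        y = [0.0 if (layer, i) in crashed else v for i, v in enumerate(y)]
    return y


def crash_error_bound(weights, fail_layer, num_failed):
    """Generic infinity-norm bound on the output change when num_failed
    neurons of fail_layer crash (illustrative, not the paper's formula)."""
    w_max = max(abs(w) for W in weights for row in W for w in row)
    bound = SIGMOID_BOUND  # each crashed output moves by at most C
    if fail_layer == len(weights):
        return bound  # the crashed neurons are output neurons
    # The next layer sees at most num_failed perturbed inputs.
    bound *= SIGMOID_LIPSCHITZ * num_failed * w_max
    for k in range(fail_layer + 1, len(weights)):
        fan_in = len(weights[k][0])  # neurons in layer k feeding layer k + 1
        bound *= SIGMOID_LIPSCHITZ * fan_in * w_max
    return bound
```

Note how the bound depends only on the network's weights and shape, not on the input, which mirrors the abstract's point that a topology-based quantity avoids enumerating inputs and failure configurations.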

Citations: 35

Corrected Gossip Algorithms for Fast Reliable Broadcast on Unreliable Systems

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2017-05-01. DOI: 10.1109/IPDPS.2017.36

T. Hoefler, A. Barak, A. Shiloh, Z. Drezner

Citations: 15