2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS): Latest Publications

A Robust Parallel Preconditioner for Indefinite Systems Using Hierarchical Matrices and Randomized Sampling
P. Ghysels, X. Li, C. Gorman, François-Henry Rouet
Pub Date: 2017-05-01 DOI: 10.1109/IPDPS.2017.21
Abstract: We present the design and implementation of a parallel and fully algebraic preconditioner based on an approximate sparse factorization using low-rank matrix compression. The sparse factorization uses a multifrontal algorithm with fill-in occurring in dense frontal matrices. These frontal matrices are approximated as hierarchically semi-separable matrices, which are constructed using a randomized sampling technique. The resulting preconditioner has (close to) optimal complexity in terms of flops and memory usage for many discretized partial differential equations. We illustrate the robustness and performance of this new preconditioner for a number of unstructured grid problems. Initial results show that the rank-structured preconditioner could be a viable alternative to algebraic multigrid and incomplete LU, for instance. Our implementation uses MPI and OpenMP and supports real and complex arithmetic and 32- and 64-bit integers. We present a detailed performance analysis. The code is released as the STRUMPACK library with a BSD license, and a PETSc interface is available to allow for easy integration in existing applications.
Citations: 32
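The abstract says the frontal matrices are compressed with a randomized sampling technique. As a rough illustration of the sampling idea only (not of HSS construction and not of STRUMPACK's API), the sketch below applies a basic randomized range finder to a plain dense matrix: multiply by a Gaussian random matrix, orthonormalize the samples, and project. All function names are hypothetical, and pure Python lists stand in for a real linear-algebra library.

```python
import math
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def orthonormal_columns(Y, tol=1e-10):
    """Modified Gram-Schmidt on the columns of Y; numerically dependent columns are dropped."""
    cols = transpose(Y)
    scale = max(math.sqrt(sum(x * x for x in c)) for c in cols) or 1.0
    basis = []
    for c in cols:
        v = c[:]
        for q in basis:
            dot = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > tol * scale:
            basis.append([x / norm for x in v])
    return transpose(basis)  # m x r, with r <= number of samples


def randomized_low_rank(A, k, seed=0):
    """Return (Q, B) with A ~= Q B, built from k random samples of A's column space."""
    rng = random.Random(seed)
    n = len(A[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]  # n x k Gaussian test matrix
    Y = matmul(A, omega)          # m x k: random linear combinations of A's columns
    Q = orthonormal_columns(Y)    # orthonormal basis for the sampled range
    B = matmul(transpose(Q), A)   # r x n: coordinates of A in that basis
    return Q, B


U = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.0], [0.0, 4.0], [1.5, 1.5]]  # 6 x 2
V = [[2.0, -1.0, 0.5, 3.0, 1.0], [0.0, 1.0, -2.0, 1.0, 4.0]]                  # 2 x 5
A = matmul(U, V)                    # a 6 x 5 matrix of rank exactly 2
Q, B = randomized_low_rank(A, k=4)  # oversample: 4 samples for a rank-2 matrix
approx = matmul(Q, B)
err = max(abs(x - y) for ra, rb in zip(A, approx) for x, y in zip(ra, rb))
print(len(B), err)                  # detected rank and max entrywise error
```

With k = 4 samples for a rank-2 matrix, the basis should come out with 2 columns and the reconstruction error should be at rounding level; oversampling (k larger than the target rank) is the usual safeguard when the rank is not known exactly.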
Tight Load Balancing Via Randomized Local Search
P. Berenbrink, Peter Kling, Christopher Liaw, Abbas Mehrabian
Pub Date: 2017-05-01 DOI: 10.1109/IPDPS.2017.52
Abstract: We consider the following balls-into-bins process with n bins and m balls: each ball is equipped with a mutually independent exponential clock of rate 1. Whenever a ball's clock rings, the ball samples a random bin and moves there if the number of balls in the sampled bin is smaller than in its current bin. This simple process models a typical load balancing problem where users (balls) seek a selfish improvement of their assignment to resources (bins). From a game-theoretic perspective, this is a randomized approach to the well-known KP model [1], while it is known as Randomized Local Search (RLS) in the load balancing literature [2], [3]. Up to now, the best bound on the expected time to reach perfect balance was O((ln n)² + ln(n)·n²/m), due to [3]. We improve this to an asymptotically tight O(ln(n) + n²/m). Our analysis is based on the crucial observation that performing destructive moves (reversals of RLS moves) cannot decrease the balancing time. This allows us to simplify problem instances and to ignore "inconvenient moves" in the analysis.
Citations: 6
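The process in this abstract is precise enough to simulate directly. The sketch below is a minimal event-driven simulation, assuming a uniformly random initial placement and assuming the move rule counts the ball itself in its current bin (so a ball moves only if that strictly lowers its own load); the abstract's wording leaves that convention open. The function name is hypothetical. A single run only illustrates the dynamics; it does not check the O(ln(n) + n²/m) bound.

```python
import random


def rls_balancing_time(n, m, seed=0):
    """Simulate Randomized Local Search (RLS) for m balls in n bins.

    Returns (time until perfect balance, final loads). Each ball has an independent
    rate-1 exponential clock, so the next ring overall comes after an Exp(m) wait
    and belongs to a uniformly random ball.
    """
    rng = random.Random(seed)
    bin_of = [rng.randrange(n) for _ in range(m)]  # assumed random initial placement
    load = [0] * n
    for b in bin_of:
        load[b] += 1
    t = 0.0
    while max(load) - min(load) > 1:               # perfect balance: loads differ by <= 1
        t += rng.expovariate(m)                    # wait for the next clock ring
        ball = rng.randrange(m)                    # the ball whose clock rang
        src, dst = bin_of[ball], rng.randrange(n)  # its bin and a uniformly sampled bin
        # Assumed move rule: move only if the ball then sits in a strictly less
        # loaded bin than the one it leaves (it counts toward its current bin).
        if load[dst] + 1 < load[src]:
            load[src] -= 1
            load[dst] += 1
            bin_of[ball] = dst
    return t, load


t, load = rls_balancing_time(n=16, m=64, seed=1)
print(round(t, 3), load)
```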
SWhybrid: A Hybrid-Parallel Framework for Large-Scale Protein Sequence Database Search
Haidong Lan, Weiguo Liu, Yongchao Liu, B. Schmidt
Pub Date: 2017-05-01 DOI: 10.1109/IPDPS.2017.42
Abstract: Computer architectures continue to develop rapidly towards massively parallel and heterogeneous systems. Thus, easily extensible yet highly efficient parallelization approaches for a variety of platforms are urgently needed. In this paper, we present SWhybrid, a hybrid computing framework for large-scale biological sequence database search on heterogeneous computing environments with multi-core or many-core processing units (PUs), based on the Smith-Waterman (SW) algorithm. To incorporate a diverse set of PUs, such as combinations of CPUs, GPUs and Xeon Phis, we abstract them as SIMD vector execution units with different numbers of lanes. We propose a machine model, associated with a unified programming interface implemented in C++, to abstract away underlying architectural differences. Performance evaluation reveals that SWhybrid (i) outperforms all other tested state-of-the-art tools on both homogeneous and heterogeneous computing platforms, (ii) achieves an efficiency of over 80% on all tested CPUs and GPUs and over 70% on Xeon Phis, and (iii) achieves utilization rates of over 80% on all tested heterogeneous platforms. Our results demonstrate that there is enough commonality between vector-like instructions across CPUs and GPUs that one can develop higher-level abstractions and still specialize with close-to-peak performance. SWhybrid is open-source software and freely available at https://github.com/turbo0628/swhybrid.
Citations: 15
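SWhybrid builds on the Smith-Waterman local-alignment recurrence. Below is a scalar reference version of that recurrence (the computation that vector lanes would evaluate), written as a sketch with illustrative scores: match +2, mismatch -1, linear gap -1. Protein database search normally uses a substitution matrix and affine gap penalties, and SWhybrid's actual kernels and machine-model interface are not shown. The hypothetical helper `search` scores every database sequence independently; such inter-sequence independence is one common way to fill SIMD lanes, but the abstract does not state which scheme SWhybrid uses.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of a and b (linear gap penalty, O(len(b)) memory)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a score never drops below 0 (start a fresh alignment).
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(query, database):
    """Score the query against every database sequence; each pair is independent work."""
    return sorted(((smith_waterman(query, s), s) for s in database), reverse=True)


print(search("HEAGAWGHEE", ["PAWHEAE", "HEAGAWGHEE", "WWWW"]))
```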
Accelerating Spark Datasets by Inlining Deserialization
Jan Wroblewski, K. Ishizaki, H. Inoue, Moriyoshi Ohara
Pub Date: 2017-05-01 DOI: 10.1109/IPDPS.2017.111
Abstract: Apache Spark is a framework for distributed computing that supports the map-reduce programming model. The SQL module of Spark contains Datasets, i.e., distributed collections of records stored in a serialized low-level format in a manually managed chunk of memory. However, the functions users provide to the map-reduce computations expect Java objects. Datasets perform an additional deserialization step beforehand to support the user-provided function, which increases the overhead. We tackled this problem by replacing map functions with their counterparts that accepted the serialized data. This allowed us to skip the unnecessary part of deserialization and achieve faster data processing speeds.
Citations: 1
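The abstract's idea is to let the map function read serialized records directly instead of first rebuilding objects. The Python sketch below is only an analogy under stated assumptions: it uses a hypothetical fixed-width record layout packed with the standard `struct` module, not Spark's internal row format, and the names `Item`, `map_deserialized`, and `map_price_inlined` are invented for illustration. The rewritten map touches only the 8 bytes it needs per record instead of decoding every field.

```python
import struct
from dataclasses import dataclass

# Hypothetical fixed-width record layout: item_id (int64), price (float64), name (16 bytes).
RECORD = struct.Struct("<qd16s")
PRICE = struct.Struct("<d")
PRICE_OFFSET = 8  # bytes before `price` inside one record


@dataclass
class Item:
    item_id: int
    price: float
    name: str


def serialize(items):
    buf = bytearray(RECORD.size * len(items))
    for k, it in enumerate(items):
        # "16s" zero-pads shorter names to 16 bytes.
        RECORD.pack_into(buf, k * RECORD.size, it.item_id, it.price, it.name.encode())
    return bytes(buf)


def map_deserialized(buf, f):
    """Baseline: rebuild a full Item per record, then call the user function."""
    out = []
    for off in range(0, len(buf), RECORD.size):
        rid, price, raw = RECORD.unpack_from(buf, off)
        out.append(f(Item(rid, price, raw.rstrip(b"\0").decode())))
    return out


def map_price_inlined(buf, g):
    """Rewritten map: read only the `price` field straight from the serialized bytes."""
    return [g(PRICE.unpack_from(buf, off + PRICE_OFFSET)[0])
            for off in range(0, len(buf), RECORD.size)]


buf = serialize([Item(1, 9.5, "pen"), Item(2, 20.0, "book")])
print(map_deserialized(buf, lambda it: it.price * 2))  # user function sees objects
print(map_price_inlined(buf, lambda p: p * 2))         # same values, no Item objects built
```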
Exploring DataVortex Systems for Irregular Applications
R. Gioiosa, Antonino Tumeo, Jian Yin, T. Warfel, D. Haglin, S. Betelú
Pub Date: 2017-05-01 DOI: 10.1109/IPDPS.2017.121
Abstract: Emerging applications for data analytics and knowledge discovery typically have irregular or unpredictable communication patterns that do not scale well on parallel systems designed for traditional bulk-synchronous HPC applications. New network architectures that focus on minimizing (short) message latencies, rather than maximizing (large) transfer bandwidths, are emerging as possible alternatives to better support those applications with irregular communication patterns. We explore a system based upon one such novel network architecture, the Data Vortex interconnection network, and examine how this system performs by running benchmark code written for the Data Vortex network, as well as a reference MPI-over-InfiniBand implementation, on the same cluster. Simple communication primitives (ping-pong and barrier synchronization), a few common communication kernels (distributed 1D Fast Fourier Transform, breadth-first search, Giga-Updates Per Second) and three prototype applications (a proxy application for simulating neutron transport, "SNAP"; a finite difference simulation for computing incompressible fluid flow; and an implementation of the heat equation) were all implemented for both network models. The results were compared and analyzed to determine what characteristics make an application a good candidate for porting to a Data Vortex system, and to what extent applications could potentially benefit from this new architecture.
Citations: 8
Dynamic Adaptation in Wireless Networks Under Comprehensive Interference via Carrier Sense
Dongxiao Yu, Yuexuan Wang, Tigran Tonoyan, M. Halldórsson
Pub Date: 2017-05-01 DOI: 10.1109/IPDPS.2017.78
Abstract: Dynamic behavior is an essential part of wireless networking, due to mobility, environmental changes or failures. We analyze a natural exponential backoff procedure to manage contention in a fading channel, in the presence of both node churn and link changes. We show that it attains fast convergence, stabilizing contention from any state in logarithmic time. We use it to obtain an optimal algorithm for Local Broadcast that even improves known results for the static case. The results illustrate the utility of carrier sensing, a stock feature of wireless nodes.
Citations: 10
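The abstract does not spell out the backoff rule, so the following is a hypothetical toy model rather than the authors' procedure: a single-hop collision channel (no fading or SINR), no churn or link changes, and a multiplicative rule in which a node halves its transmission probability when carrier sensing reports a busy channel and doubles it (up to a cap `p_max`) when the channel is idle. It is meant only to show how carrier sensing can drive the total contention, sum(p), down from a heavily loaded start.

```python
import random


def carrier_sense_backoff(n, rounds, p_max=0.5, seed=0):
    """Toy single-hop simulation of multiplicative backoff driven by carrier sensing.

    Each node transmits with probability p[v] per slot. A node that senses the channel
    busy (some other node transmitted) halves p[v]; otherwise it doubles p[v], capped
    at p_max. Returns the final probabilities and the total contention sum(p) per slot.
    """
    rng = random.Random(seed)
    p = [p_max] * n  # start from heavy contention
    contention = []
    for _ in range(rounds):
        senders = {v for v in range(n) if rng.random() < p[v]}
        for v in range(n):
            others = len(senders) - (1 if v in senders else 0)  # what v's carrier sense hears
            p[v] = p[v] / 2 if others > 0 else min(2 * p[v], p_max)
        contention.append(sum(p))
    return p, contention


p, contention = carrier_sense_backoff(n=200, rounds=60)
print([round(c, 2) for c in contention[:10]])
```

Printing the first contention values typically shows the sum falling quickly from n·p_max; how fast it settles, and to what level, depends on the rule and parameters assumed here.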
Argo NodeOS: Toward Unified Resource Management for Exascale
Swann Perarnau, Judicael A. Zounmevo, Matthieu Dreher, B. V. Essen, R. Gioiosa, K. Iskra, M. Gokhale, Kazutomo Yoshii, P. Beckman
Pub Date: 2017-05-01 DOI: 10.1109/IPDPS.2017.25
Abstract: Exascale systems are expected to feature hundreds of thousands of compute nodes with hundreds of hardware threads and complex memory hierarchies with a mix of on-package and persistent memory modules. In this context, the Argo project is developing a new operating system for exascale machines. Targeting production workloads using workflows or coupled codes, we improve the Linux kernel on several fronts. We extend the memory management of Linux to be able to subdivide NUMA memory nodes, allowing better resource partitioning among processes running on the same node. We also add support for memory-mapped access to node-local, PCIe-attached NVRAM devices and introduce a new scheduling class targeted at parallel runtimes supporting user-level load balancing. These features are unified into compute containers, a containerization approach focused on providing modern HPC applications with dynamic control over a wide range of kernel interfaces. To keep our approach compatible with industrial containerization products, we also identify contention points for the adoption of containers in HPC settings. Each NodeOS feature is evaluated using a set of parallel benchmarks, miniapps, and coupled applications consisting of simulation and data analysis components, running on a modern NUMA platform. We observe out-of-the-box performance improvements easily matching, and often exceeding, those observed with expert-optimized configurations on standard OS kernels. Our lightweight approach to resource management retains the many benefits of a full OS kernel that application programmers have learned to depend on, while providing a set of extensions that can be freely mixed and matched to best benefit particular application components.
Citations: 18
Aces4: A Platform for Computational Chemistry Calculations with Extremely Large Block-Sparse Arrays
B. Sanders, J. Byrd, Nakul Jindal, V. Lotrich, Dmitry I. Liakh, A. Perera, R. Bartlett
Pub Date: 2017-05-01 DOI: 10.1109/IPDPS.2017.108
Abstract: Aces4 is a parallel programming platform comprising a DSL for computational chemistry and its runtime system. It offers a convenient way to express parallelism together with extensive support for extremely large, possibly sparse, distributed arrays. It aids scientists in the creation of performant, scalable, massively parallel programs that can effectively take advantage of leadership-class computing systems to address important scientific questions. Aces4 has enabled the development and implementation of new methods in electronic structure theory which are breaking new ground in their ability to perform highly accurate calculations on ever larger molecular systems. In this paper, the design of Aces4, which is based on the Super Instruction Architecture approach, is described. Experimental scaling results are given for Molecular Cluster Perturbation Theory, a new method enabled by Aces4, and for CCSD, a widely used computational chemistry method.
Citations: 1
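Aces4 targets extremely large, possibly sparse, distributed arrays. A single-process sketch of why block sparsity pays off: the matrix is stored as a dictionary of dense blocks, absent blocks are implicit zeros, and only pairs of stored blocks are ever multiplied. This is an illustration with hypothetical names, not Aces4's DSL, its Super Instruction Architecture runtime, or its distribution scheme.

```python
from collections import defaultdict


def block_sparse_matmul(A, B, bs):
    """C = A @ B for block-sparse matrices stored as {(block_row, block_col): bs x bs block}.

    Missing keys are zero blocks and cost nothing: only pairs of stored blocks with
    matching inner index k are multiplied.
    """
    b_by_row = defaultdict(list)  # index B's blocks by their block row k
    for (k, j), blk in B.items():
        b_by_row[k].append((j, blk))
    C = {}
    for (i, k), a_blk in A.items():
        for j, b_blk in b_by_row.get(k, []):
            c_blk = C.setdefault((i, j), [[0.0] * bs for _ in range(bs)])
            for r in range(bs):
                for c in range(bs):
                    c_blk[r][c] += sum(a_blk[r][t] * b_blk[t][c] for t in range(bs))
    return C


def to_dense(blocks, nblocks, bs):
    """Expand a block dictionary into a full (nblocks*bs) x (nblocks*bs) matrix."""
    M = [[0.0] * (nblocks * bs) for _ in range(nblocks * bs)]
    for (i, j), blk in blocks.items():
        for r in range(bs):
            for c in range(bs):
                M[i * bs + r][j * bs + c] = blk[r][c]
    return M


A = {(0, 0): [[1.0, 2.0], [3.0, 4.0]], (1, 1): [[0.0, 1.0], [1.0, 0.0]]}
B = {(0, 1): [[1.0, 0.0], [0.0, 1.0]], (1, 1): [[2.0, 0.0], [0.0, 2.0]]}
C = block_sparse_matmul(A, B, bs=2)
print(C)  # only blocks (0, 1) and (1, 1) are stored
```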
When Neurons Fail
El Mahdi El Mhamdi, R. Guerraoui
Pub Date: 2017-05-01 DOI: 10.1109/IPDPS.2017.66
Abstract: Neural networks have been traditionally considered robust in the sense that their precision degrades gracefully with the failure of neurons and can be compensated by additional learning phases. Nevertheless, critical applications for which neural networks are now appealing solutions cannot afford any additional learning at run-time. In this paper, we view a multilayer neural network as a distributed system of which neurons can fail independently, and we evaluate its robustness in the absence of any (recovery) learning phase. We give tight bounds on the number of neurons that can fail without harming the result of a computation. To determine our bounds, we leverage the fact that neural activation functions are Lipschitz-continuous. Our bound is given in the form of a quantity we call the Forward Error Propagation. Computing this quantity only requires looking at the topology of the network, whereas experimentally assessing the robustness of a network requires the costly experiment of looking at all the possible inputs and testing all the possible configurations of the network corresponding to different failure situations, facing a discouraging combinatorial explosion. We distinguish the case of neurons that can fail and stop their activity (crashed neurons) from the case of neurons that can fail by transmitting arbitrary values (Byzantine neurons). In the crash case, our bound involves the number of neurons per layer, the Lipschitz constant of the neural activation function, the number of failing neurons, the synaptic weights and the depth of the layer where the failure occurred. In the case of Byzantine failures, our bound involves, in addition, the synaptic transmission capacity. Interestingly, as we show in the paper, our bound can easily be extended to the case where synapses can fail. We present three applications of our results. The first is a quantification of the effect of memory cost reduction on the accuracy of a neural network. The second is a quantification of the amount of information any neuron needs from its preceding layer, thereby enabling a boosting scheme that prevents neurons from waiting for unnecessary signals. Our third application is a quantification of the trade-off between neural networks' robustness and learning cost.
Citations: 35
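The abstract bounds the damage from failed neurons using the Lipschitz continuity of activation functions. The sketch below shows a simpler, generic bound in that spirit, not the paper's Forward Error Propagation formula: for crash failures in one hidden layer of a sigmoid network, the output error (Euclidean norm) is at most sqrt(number of crashed neurons) times the product of K·‖W‖_F over the later layers, with K = 1/4 the sigmoid's Lipschitz constant and the Frobenius norm used as an upper bound on the spectral norm. Like the paper's quantity, it needs no inputs, only the network's weights; the layer sizes, seed, and crashed set are made up for illustration.

```python
import math
import random

SIGMOID_LIPSCHITZ = 0.25  # maximum slope of the logistic sigmoid


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, x, crashed=frozenset(), crash_layer=0):
    """Evaluate the network; neurons listed in `crashed` of layer `crash_layer` output 0."""
    h = x
    for depth, W in enumerate(weights):
        h = [sigmoid(sum(w * v for w, v in zip(row, h))) for row in W]
        if depth == crash_layer:
            h = [0.0 if i in crashed else v for i, v in enumerate(h)]
    return h


def frobenius(W):
    return math.sqrt(sum(w * w for row in W for w in row))


def crash_bound(weights, num_crashed, crash_layer=0):
    """Input-independent bound on ||clean output - faulty output||_2.

    A crashed sigmoid neuron changes its output by less than 1, so the perturbation
    leaving layer `crash_layer` has norm < sqrt(num_crashed); every later layer can
    amplify it by at most K * ||W||_2 <= K * ||W||_F.
    """
    bound = math.sqrt(num_crashed)
    for W in weights[crash_layer + 1:]:
        bound *= SIGMOID_LIPSCHITZ * frobenius(W)
    return bound


rng = random.Random(1)
sizes = [4, 8, 6, 3]  # input, two hidden layers, output
weights = [[[rng.uniform(-1, 1) for _ in range(sizes[k])] for _ in range(sizes[k + 1])]
           for k in range(len(sizes) - 1)]
crashed = {0, 5}      # hypothetical crashed neurons in the first hidden layer
bound = crash_bound(weights, len(crashed))
worst = 0.0
for _ in range(200):
    x = [rng.uniform(-1, 1) for _ in range(sizes[0])]
    worst = max(worst, math.dist(forward(weights, x), forward(weights, x, crashed)))
print(f"worst observed error {worst:.4f} <= bound {bound:.4f}")
```

The empirical worst case over random inputs should stay below the bound; the bound is usually loose because the Frobenius norm overestimates the spectral norm and the sigmoid's slope is rarely at its maximum.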
Corrected Gossip Algorithms for Fast Reliable Broadcast on Unreliable Systems
T. Hoefler, A. Barak, A. Shiloh, Z. Drezner
Pub Date: 2017-05-01 DOI: 10.1109/IPDPS.2017.36
Abstract: Large-scale parallel programming environments and algorithms require efficient group communication on computing systems with failing nodes. Existing reliable broadcast algorithms either cannot guarantee that all nodes are reached or are very expensive in terms of the number of messages and latency. This paper proposes Corrected-Gossip, a method that combines Monte Carlo style gossiping with a deterministic correction phase, to construct a Las Vegas style reliable broadcast that guarantees reaching all the nodes at low cost. We analyze the performance of this method both analytically and by simulations and show how it reduces the latency and network load compared to existing algorithms. Our method improves the latency by 20% and the network load by 53% compared to the fastest known algorithm on 4,096 nodes. We believe that the principle of corrected gossip opens an avenue for many other reliable group communication operations.
Citations: 15
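Corrected gossip pairs a probabilistic gossip phase with a deterministic correction phase that guarantees delivery. The sketch below is a simplified round-based model with hypothetical names: gossip runs for a fixed number of rounds, messages sent to failed nodes are lost, and the correction assumes every node knows its successor on a ring of live nodes, a simplification the paper's correction schemes may not make. Each live node sends exactly one correction message, and the number of correction rounds is set by the longest run of nodes the gossip missed.

```python
import random


def corrected_gossip(n, root, gossip_rounds, failed=frozenset(), seed=0):
    """Gossip for a fixed number of rounds, then run a ring-based correction phase.

    Returns (informed nodes, total messages, correction rounds).
    """
    assert root not in failed
    rng = random.Random(seed)
    informed = {root}
    messages = 0
    # Monte Carlo phase: every informed node sends to one random peer per round;
    # messages addressed to failed nodes are lost.
    for _ in range(gossip_rounds):
        reached = set()
        for v in informed:
            peer = rng.randrange(n)
            messages += 1
            if peer not in failed:
                reached.add(peer)
        informed |= reached
    # Deterministic correction (simplified): each informed node forwards once to its
    # successor on the ring of live nodes; nodes first reached here forward in the
    # next round, so every live node is eventually reached.
    live = [v for v in range(n) if v not in failed]
    succ = {v: live[(i + 1) % len(live)] for i, v in enumerate(live)}
    frontier, rounds = set(informed), 0
    while frontier:
        rounds += 1
        nxt = set()
        for v in frontier:
            messages += 1
            if succ[v] not in informed:
                informed.add(succ[v])
                nxt.add(succ[v])
        frontier = nxt
    return informed, messages, rounds


informed, msgs, rounds = corrected_gossip(64, 0, 5, failed=frozenset({3, 17, 42}), seed=7)
print(len(informed), msgs, rounds)
```

With zero gossip rounds the correction degenerates into a pass around the whole ring; adding gossip rounds shortens the uninformed gaps and therefore the correction latency, which is the trade-off the method exploits.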