{"title":"Towards Fast Scalable Solvers for Charge Equilibration in Molecular Dynamics Applications","authors":"Kurt A. O'Hearn, H. Aktulga","doi":"10.1109/SCALA.2016.6","DOIUrl":"https://doi.org/10.1109/SCALA.2016.6","url":null,"abstract":"Including atom polarizability in molecular dynamics (MD) simulations is important for high-fidelity simulations. Solvers for charge models that are used to dynamically determine atom polarizations constitute significant bottlenecks in terms of time-to-solution and the overall scalability of polarizable and reactive force fields. The objective of this work is to improve the performance of the charge equilibration (QEq) method on shared memory architectures. A number of parallel incomplete LU-based preconditioning techniques are explored to enhance the performance of the Krylov subspace methods used in the QEq model. Detailed analysis of how these techniques effect convergence rate and the overall solver performance is presented. ILU-based schemes which produce good quality factors with relatively low number of nonzeros have been observed to yield significant speedups over the diagonal inverse baseline preconditioner. These results are significant as they can enable efficient simulations of moderate-sized systems on a single node with several cores, and also because they can constitute the future building blocks for distributed memory parallel solvers.","PeriodicalId":410521,"journal":{"name":"2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116071936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing PLASMA Eigensolver on Large Shared Memory Systems","authors":"Cheng Liao","doi":"10.1109/SCALA.2016.14","DOIUrl":"https://doi.org/10.1109/SCALA.2016.14","url":null,"abstract":"Performance of the PLASMA dense symmetric Eigensolver is optimized for large shared memory computer systems using multiple Householder domains for dense to band reduction and a communication reducing kernel for bulge chasing. The mr3-smp code by Petschow and Bientinesi is used for the tridiagonal eigensolution and the eigenvector back-transformations employ a 1D parallel decomposition. The input matrix, Householder vectors and scalars, are distributed among the CPU sockets with interleaved memory pages but the banded matrix, the eigenvectors, and temporary memory buffers are allocated and processed locally. Other considerations and optimization techniques also are presented. Numerical examples show the PLASMA eigensolver can out-perform ELPA and EIGENEXA significantly, for solving all the eigenpairs, if the problem size is sufficiently large, and the 2-stage eigensolution is generally better than its 1-stage counterpart on the latest x86_64 EP-4S CPUs with AVX2.","PeriodicalId":410521,"journal":{"name":"2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114633059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Left-Preconditioned Communication-Avoiding Conjugate Gradient Methods for Multiphase CFD Simulations on the K Computer","authors":"Akie Mayumi, Y. Idomura, Takuya Ina, S. Yamada, Toshiyuki Imamura","doi":"10.1109/SCALA.2016.7","DOIUrl":"https://doi.org/10.1109/SCALA.2016.7","url":null,"abstract":"The left-preconditioned communication avoiding conjugate gradient (LP-CA-CG) method is applied to the pressure Poisson equation in the multiphase CFD code JUPITER. The arithmetic intensity of the LP-CA-CG method is analyzed, and is dramatically improved by loop splitting for inner product operations and for three term recurrence operations. Two LPCA-CG solvers with block Jacobi preconditioning and with underlap preconditioning are developed. The former is developed based on a hybrid CA approach, in which CA is applied only to global collective communications for inner product operations. The latter is a full CA approach, in which CA is applied also to local point-to-point communications in sparse matrix-vector (SpMV) operations and preconditioning. CA-SpMV requires additional computation for overlapping regions. CA-preconditiong is enabled by underlap preconditioning, which approximates preconditioning for overlapping regions by point Jacobi preconditioning. It is shown that on the K computer, the former is faster, because the performance of local point-to-point communications scales well, and the convergence property becomes worse with underlap preconditioning. The LP-CA-CG solver shows good strong scaling up to 30,000 nodes, where the LP-CA-CG solver achieved higher performance than the original CG solver by reducing the cost of global collective communications by 69 percent.","PeriodicalId":410521,"journal":{"name":"2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116507294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Scaling Variability and Energy Analysis for a Resilient ULFM-based PDE Solver","authors":"Karla Morris, F. Rizzi, Brendan Cook, Paul Mycek, O. Maître, O. Knio, K. Sargsyan, K. Dahlgren, B. Debusschere","doi":"10.1109/SCALA.2016.10","DOIUrl":"https://doi.org/10.1109/SCALA.2016.10","url":null,"abstract":"We present a resilient task-based domain-decomposition preconditioner for partial differential equations (PDEs) built on top of User Level Fault Mitigation Message Passing Interface (ULFM-MPI). The algorithm reformulates the PDE as a sampling problem, followed by a robust regression-based solution update that is resilient to silent data corruptions (SDCs). We adopt a server-client model where all state information is held by the servers, while clients only serve as computational units. The task-based nature of the algorithm and the capabilities of ULFM complement each other to support missing tasks, making the application resilient to clients failing.We present weak and strong scaling results on Edison, National Energy Research Scientific Computing Center (NERSC), for a nominal and a fault-injected case, showing that even in the presence of faults, scalability tested up to 50k cores is within 90%. We then quantify the variability of weak and strong scaling due to the presence of faults. Finally, we discuss the performance of our application with respect to subdomain size, server/client configuration, and the interplay between energy and resilience.","PeriodicalId":410521,"journal":{"name":"2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132361546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Monte Carlo Hybrid Methods for Linear Algebra","authors":"Diego Davila, V. Alexandrov, Oscar A. Esquivel-Flores","doi":"10.1109/SCALA.2016.15","DOIUrl":"https://doi.org/10.1109/SCALA.2016.15","url":null,"abstract":"This paper presents an enhanced hybrid (e.g. stochastic/deterministic) method for Linear Algebra based on bulding an efficient stochastic s and then solving the corresponding System of Linear Algebraic Equations (SLAE) by applying an iterative method. This is a Monte Carlo preconditioner based on Markov Chain Monte Carlo (MCMC) methods to compute a rough approximate matrix inverse first. The above Monte Carlo preconditioner is further used to solve systems of linear algebraic equations thus delivering hybrid stochastic/deterministic algorithms. The advantage of the proposed approach is that the sparse Monte Carlo matrix inversion has a computational complexity linear of the size of the matrix, it is inherently parallel and thus can be obtained very efficiently for large matrices and can be used also as an efficient preconditioner while solving systems of linear algebraic equations. Several improvements, as well as the mixed MPI/OpenMP implementation, are carried out that enhance the scalability of the method and the efficient use of computational resources. A set of different test matrices from several matrix market collections were used to show the consistency of these improvements.","PeriodicalId":410521,"journal":{"name":"2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129163909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Batched Generation of Incomplete Sparse Approximate Inverses on GPUs","authors":"H. Anzt, Edmond Chow, T. Huckle, J. Dongarra","doi":"10.1109/SCALA.2016.11","DOIUrl":"https://doi.org/10.1109/SCALA.2016.11","url":null,"abstract":"Incomplete Sparse Approximate Inverses (ISAI) have recently been shown to be an attractive alternative to exact sparse triangular solves in the context of incomplete factorization preconditioning. In this paper we propose a batched GPU-kernel for the efficient generation of ISAI matrices. Utilizing only thread-local memory allows for computing the ISAI matrix with very small memory footprint. We demonstrate that this strategy is faster than the existing strategy for generating ISAI matrices, and use a large number of test matrices to assess the algorithm's efficiency in an iterative solver setting.","PeriodicalId":410521,"journal":{"name":"2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128231627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Randomized Sketching for Large-Scale Sparse Ridge Regression Problems","authors":"Chander Iyer, C. Carothers, P. Drineas","doi":"10.1109/SCALA.2016.13","DOIUrl":"https://doi.org/10.1109/SCALA.2016.13","url":null,"abstract":"We present a fast randomized ridge regression solver for sparse overdetermined matrices in distributed-memory platforms. Our solver is based on the Blendenpik algorithm, but employs sparse random projection schemes to construct a sketch of the input matrix. These sparse random projection sketching schemes, and in particular the use of the Randomized Sparsity-Preserving Transform, enable our algorithm to scale the distributed memory vanilla implementation of Blendenpik and provide up to × 13 speedup over a state-of-the-art parallel Cholesky-like sparse-direct solver.","PeriodicalId":410521,"journal":{"name":"2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)","volume":"707 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115126031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Gyrokinetic Particle Simulation of Fusion Plasmas on Tianhe-2 Supercomputer","authors":"Endong Wang, Shaohua Wu, Qing Zhang, Jun Liu, Wenlu Zhang, Zhihong Lin, Yutong Lu, Yunfei Du, Xiaoqian Zhu","doi":"10.1109/SCALA.2016.8","DOIUrl":"https://doi.org/10.1109/SCALA.2016.8","url":null,"abstract":"We present novel optimizations of the fusion plasmas simulation code, GTC on Tianhe-2 supercomputer. The simulation exhibits excellent weak scalability up to 3072 31S1P Xeon Phi co-processors. An unprecedented up to 5.8× performance improvement is achieved for the GTC on Tianhe-2. An efficient particle exchanging algorithm is developed that simplifies the original iterative scheme to a direct implementation, which leads to a 7.9× performance improvement in terms of MPI communications on 1024 nodes of Tianhe-2. A customized particle sorting algorithm is presented that delivers a 2.0× performance improvement on the co-processor for the kernel relating to the particle computing. A smart offload algorithm that minimizes the data exchange between host and co-processor is introduced. Other optimizations like the loop fusion and vectorization are also presented.","PeriodicalId":410521,"journal":{"name":"2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121223971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Dynamic Load Balance using Space-Filling Curves for Large-Scale SPH Simulations on GPU-rich Supercomputers","authors":"Satori Tsuzuki, T. Aoki","doi":"10.1109/SCALA.2016.5","DOIUrl":"https://doi.org/10.1109/SCALA.2016.5","url":null,"abstract":"Billion of particles are required to describe fluid dynamics by using smoothed particle hydrodynamics (SPH), which computes short-range interactions among particles. In this study, we develop a novel code of large-scale SPH simulations on a multi-GPU platform by using the domain decomposition technique. The computational load of each decomposed domain is dynamically balanced by applying domain re-decomposition, which maintains the same number of particles in each decomposed domain. The performance scalability of the SPH simulation is examined on the GPUs of a TSUBAME 2.5 supercomputer by using two different techniques of dynamic load balance: the slice-grid method and the hierarchical domain decomposition method using the space-filling curve. The weak and strong scalabilities of a test case using 111 million particles are measured with 512 GPUs. In comparison with the slice-grid method, the performance keeps improving in proportion to the number of GPUs in the case of the space-filling curve. The Hilbert curve and the Peano curve show better performance scalabilities than the Morton curve in proportion to the increase in the number of GPUs.","PeriodicalId":410521,"journal":{"name":"2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122750530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Massively Parallel Distributed N-body Application Implemented with HPX","authors":"Zahra Khatami, Hartmut Kaiser, Patricia A. Grubel, Adrian Serio, J. Ramanujam","doi":"10.1109/SCALA.2016.12","DOIUrl":"https://doi.org/10.1109/SCALA.2016.12","url":null,"abstract":"One of the major challenges in parallelization is the difficulty of improving application scalability with conventional techniques. HPX provides efficient scalable parallelism by significantly reducing node starvation and effective latencies while controlling the overheads. In this paper, we present a new highly scalable parallel distributed N-Body application using a future-based algorithm, which is implemented with HPX. The main difference between this algorithm and prior art is that a future-based request buffer is used between different nodes and along each spatial direction to send/receive data to/from the remote nodes, which helps removing synchronization barriers. HPX provides an asynchronous programming model which results in improving the parallel performance. The results of using HPX for parallelizing Octree construction on one node and the force computation on the distributed nodes show the scalability improvement on an average by about 45% compared to an equivalent OpenMP implementation and 28% compared to a hybrid implementation (MPI+OpenMP) [1] respectively for one billion particles running on up to 128 nodes with 20 cores per each.","PeriodicalId":410521,"journal":{"name":"2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126684461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}