ACM Transactions on Mathematical Software (TOMS): Latest Articles, Page 10

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2019-08-08 DOI: 10.1145/3328731

A. P. Diéguez, M. Amor, R. Doallo

Title: Tree Partitioning Reduction

Abstract: Solving tridiagonal linear-equation systems is a fundamental computing kernel in a wide range of scientific and engineering applications, and its computation can be modeled with parallel algorithms. These parallel solvers are typically designed to compute problems whose data fit in a common shared-memory space where all the cores taking part in the computation have access. However, when the problem size is large, data cannot be entirely stored in the common shared-memory space, and a high number of high-latency communications are performed. One alternative is to partition the problem among different memory spaces. At this point, conventional parallel algorithms do not facilitate the partition of computation in independent tiles, since each reduction depends on equations that may be in different tiles. This article proposes an algorithm based on a tree reduction, called the Tree Partitioning Reduction (TPR) method, which partitions the problem into independent slices that can be partially computed in parallel within different common shared-memory spaces. The TPR method can be implemented for any parallel and distributed programming paradigm. Furthermore, in this work, TPR is efficiently implemented for CUDA GPUs to solve large size problems, providing highly competitive performance results with respect to existing packages, being, on average, 22.03× faster than CUSPARSE.
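To make concrete why a conventional parallel reduction couples equations that may sit in different tiles, here is a minimal pure-Python sketch of parallel cyclic reduction (PCR), one of the conventional algorithms for tridiagonal systems. It is not the TPR method from the article. The function name `pcr_solve` and the argument layout (sub-diagonal `a`, diagonal `b`, super-diagonal `c`, right-hand side `d`) are assumptions made for this illustration, and the sketch assumes a diagonally dominant matrix so that no pivot is zero.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system with parallel cyclic reduction (illustrative).

    a[i] multiplies x[i-1], b[i] multiplies x[i], c[i] multiplies x[i+1].
    a[0] and c[-1] are ignored. Assumes nonzero pivots (e.g. diagonal dominance).
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0
    c[n - 1] = 0.0
    stride = 1
    while stride < n:
        # Every equation i is updated independently of the others in this step,
        # but it reads equations i - stride and i + stride, which may live
        # in a different tile once the stride grows.
        na, nb, nc, nd = [0.0] * n, b[:], [0.0] * n, d[:]
        for i in range(n):
            lo, hi = i - stride, i + stride
            if lo >= 0:
                alpha = -a[i] / b[lo]      # eliminates x[i - stride]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                gamma = -c[i] / b[hi]      # eliminates x[i + stride]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # All off-diagonal couplings are now zero, so each equation is b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


# Small check: the system below has the solution x = [1, 2, 3, 4].
x = pcr_solve([0.0, 1.0, 1.0, 1.0], [4.0] * 4, [1.0, 1.0, 1.0, 0.0],
              [6.0, 12.0, 18.0, 19.0])
```

The doubling stride is what makes the memory access pattern hard to tile: by the last step, an equation reads partners roughly half the system away.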

Citations: 2

Algorithm 998

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2019-08-08 DOI: 10.1145/3323925

C. Agulhari, Alexandre Felipe, R. Oliveira, P. Peres

Citations: 48

Algorithm 997

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2019-08-08 DOI: 10.1145/3310410

R. Speck

Abstract: In this article, we present the Python framework pySDC for solving collocation problems with spectral deferred correction (SDC) methods and their time-parallel variant PFASST, the parallel full approximation scheme in space and time. pySDC features many implementations of SDC and PFASST, from simple implicit timestepping to high-order implicit-explicit or multi-implicit splitting and multilevel SDCs. The software package comes with many different, preimplemented examples and has seven tutorials to help new users with their first steps. Time parallelism is implemented either in an emulated way for debugging and prototyping or using MPI for benchmarking. The code is fully documented and tested using continuous integration, including most results of previous publications. Here, we describe the structure of the code by taking two different perspectives: those of the user and those of the developer. The first sheds light on the front-end, the examples, and the tutorials, and the second is used to describe the underlying implementation and the data structures. We show three different examples to highlight various aspects of the implementation, the capabilities, and the usage of pySDC. In addition, couplings to the FEniCS framework and PETSc, the latter including spatial parallelism with MPI, are described.
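As a rough illustration of what a spectral deferred correction sweep does, the following standalone sketch applies explicit-Euler SDC sweeps to a scalar autonomous ODE on one time step. It does not use or imitate the pySDC API. The names `sdc_step`, `NODES`, and `S` are hypothetical, and the choice of three Lobatto nodes (0, 1/2, 1) with a hand-computed node-to-node quadrature matrix is an assumption made to keep the example short.

```python
import math

# Three Gauss-Lobatto nodes on the unit interval (assumed for this sketch).
NODES = [0.0, 0.5, 1.0]

# S[m][j] = integral of the j-th Lagrange basis polynomial from NODES[m] to NODES[m+1].
S = [
    [5 / 24, 8 / 24, -1 / 24],
    [-1 / 24, 8 / 24, 5 / 24],
]


def sdc_step(f, u0, dt, sweeps=10):
    """Advance u' = f(u) by one step of size dt using explicit SDC sweeps."""
    u = [u0] * len(NODES)  # initial guess: spread the initial value to all nodes
    for _ in range(sweeps):
        f_old = [f(v) for v in u]
        new = [u0]
        for m in range(len(NODES) - 1):
            h = dt * (NODES[m + 1] - NODES[m])
            # High-order quadrature of the previous iterate's right-hand side.
            quad = dt * sum(S[m][j] * f_old[j] for j in range(len(NODES)))
            # Low-order (explicit Euler) correction plus the quadrature term.
            new.append(new[m] + h * (f(new[m]) - f_old[m]) + quad)
        u = new
    return u[-1]


# Example: u' = -u, u(0) = 1, one step of size 0.1; compare with exp(-0.1).
approx = sdc_step(lambda u: -u, 1.0, 0.1)
error = abs(approx - math.exp(-0.1))
```

When the sweeps converge, the iterate satisfies the collocation equations on the chosen nodes, which is the sense in which SDC "solves collocation problems" iteratively with a cheap low-order integrator.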

Citations: 5

GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2019-08-04 DOI: 10.1145/3466795

Carl Yang, A. Buluç, John Douglas Owens

Abstract: High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel hardware, and (3) graph problems having low arithmetic intensity. To address some of these challenges, GraphBLAS is an innovative, on-going effort by the graph analytics community to propose building blocks based on sparse linear algebra, which allow graph algorithms to be expressed in a performant, succinct, composable, and portable manner. In this paper, we examine the performance challenges of a linear-algebra-based approach to building graph frameworks and describe new design principles for overcoming these bottlenecks. Among the new design principles is exploiting input sparsity, which allows users to write graph algorithms without specifying push and pull direction. Exploiting output sparsity allows users to tell the backend which values of the output in a single vectorized computation they do not want computed. Load-balancing is an important feature for balancing work amongst parallel workers. We describe the important load-balancing features for handling graphs with different characteristics. The design principles described in this paper have been implemented in “GraphBLAST”, the first high-performance linear algebra-based graph framework on NVIDIA GPUs that is open-source. The results show that on a single GPU, GraphBLAST has on average at least an order of magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL, comparable performance to the fastest GPU hardwired primitives and shared-memory graph frameworks Ligra and Gunrock, and better performance than any other GPU graph framework, while offering a simpler and more concise programming model.
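The linear-algebra view of graph traversal mentioned in the abstract can be sketched in plain Python: breadth-first search becomes a repeated sparse matrix-vector product over a Boolean (OR, AND) semiring, where the frontier is the sparse input vector and the set of unvisited vertices acts as an output mask. This is only a conceptual sketch; it does not use the GraphBLAST or GraphBLAS API, and the name `bfs_levels` and the adjacency-list input format are assumptions for illustration.

```python
def bfs_levels(adj, source):
    """Return the BFS depth of every vertex (-1 if unreachable).

    adj[u] lists the out-neighbours of u, i.e. the nonzeros of row u
    of the adjacency matrix.
    """
    level = [-1] * len(adj)
    frontier = {source}  # sparse input vector: only its nonzeros are stored
    depth = 0
    while frontier:
        for v in frontier:
            level[v] = depth
        # One masked product "next = A^T * frontier" over the (OR, AND) semiring.
        next_frontier = set()
        for u in frontier:              # input sparsity: touch only frontier rows
            for v in adj[u]:
                if level[v] == -1:      # output mask: skip already visited vertices
                    next_frontier.add(v)
        frontier = next_frontier
        depth += 1
    return level


# Example: a small directed graph 0 -> {1, 2}, 1 -> 3, 2 -> 3.
levels = bfs_levels([[1, 2], [3], [3], []], 0)
```

Iterating over frontier entries corresponds to the push direction. A pull formulation would instead loop over unvisited vertices and check their in-neighbours, which is the choice the abstract says a backend can make on the user's behalf.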

Citations: 62

Adjoint Code Design Patterns

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2019-07-30 DOI: 10.1145/3326162

U. Naumann

Abstract: Adjoint methods have become fundamental ingredients of the scientific computing toolbox over the past decades. Large-scale parameter sensitivity analysis, uncertainty quantification, and nonlinear optimization would otherwise turn out computationally infeasible. The symbolic derivation of adjoint mathematical models for relevant problems in science and engineering and their implementation in consistency with the implementation of the underlying primal model frequently proves highly challenging. Hence, an increased interest in algorithmic adjoints can be observed. The algorithmic derivation of adjoint numerical simulation programs shifts some of the problems faced from functional and numerical analysis to computer science. It becomes a highly complex software engineering task requiring expertise in software analysis, transformation, and optimization. Despite rather mature software tool support for algorithmic differentiation, substantial user intervention is typically required when targeting nontrivial numerical programs. A large number of patterns shared by numerous application codes results in repeated duplication of development effort. The adjoint code design patterns introduced in this article aim to reduce this problem through improved formalization from the software engineering perspective. Fully functional reference implementations are provided through github.
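For readers unfamiliar with algorithmic adjoints, the sketch below shows the basic shape of hand-written adjoint (reverse-mode) code for a tiny primal function: a forward sweep that records intermediates, followed by a reverse sweep that propagates adjoints from the output back to the inputs. The functions `primal` and `primal_adjoint` are hypothetical examples; they are not taken from the article or its reference implementations and do not represent any of its design patterns.

```python
import math


def primal(x1, x2):
    """Primal model: y = sin(x1 * x2) + x1^2."""
    v = x1 * x2
    return math.sin(v) + x1 * x1


def primal_adjoint(x1, x2, y_bar=1.0):
    """Return y and the adjoints (x1_bar, x2_bar) = y_bar * dy/d(x1, x2)."""
    # Forward sweep: evaluate the primal and keep the intermediate v.
    v = x1 * x2
    y = math.sin(v) + x1 * x1
    # Reverse sweep: visit the statements in reverse order.
    v_bar = math.cos(v) * y_bar           # from y = sin(v) + x1 * x1
    x1_bar = 2.0 * x1 * y_bar             # direct dependence of y on x1
    x1_bar += x2 * v_bar                  # from v = x1 * x2
    x2_bar = x1 * v_bar                   # from v = x1 * x2
    return y, x1_bar, x2_bar


# One adjoint evaluation yields the full gradient, whatever the number of inputs.
y, g1, g2 = primal_adjoint(0.7, 1.3)
```

The cost of the reverse sweep is a small constant multiple of the primal cost, which is why adjoints make large-scale sensitivity analysis feasible. The price is that intermediates from the forward sweep must be stored or recomputed.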

Citations: 2

Enclosing Chebyshev Expansions in Linear Time

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2019-07-30 DOI: 10.1145/3319395

B. Hashemi

Abstract: We consider the problem of computing rigorous enclosures for polynomials represented in the Chebyshev basis. Our aim is to compare and develop algorithms with a linear complexity in terms of the polynomial degree. A first category of methods relies on a direct interval evaluation of the given Chebyshev expansion in which Chebyshev polynomials are bounded, e.g., with a divide-and-conquer strategy. Our main category of methods that are based on the Clenshaw recurrence includes interval Clenshaw with defect correction (ICDC), and the spectral transformation of Clenshaw recurrence rewritten as a discrete dynamical system. An extension of the barycentric representation to interval arithmetic is also considered that has a log-linear complexity as it takes advantage of a verified discrete cosine transform. We compare different methods and provide illustrative numerical experiments. In particular, our eigenvalue-based methods are interesting for bounding the range of high-degree interval polynomials. Some of the methods rigorously compute narrow enclosures for high-degree Chebyshev expansions at thousands of points in a few seconds on an average computer. We also illustrate how to employ our methods as an automatic a posteriori forward error analysis tool to monitor the accuracy of the Chebfun feval command.
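The Clenshaw recurrence that the main category of methods builds on can be sketched in ordinary floating-point arithmetic as follows. This is the classical, non-rigorous recurrence, not the interval or defect-corrected variants from the article; the function name `clenshaw` is an assumption, and the coefficient list is assumed to be non-empty with `coeffs[k]` multiplying the Chebyshev polynomial T_k.

```python
import math


def clenshaw(coeffs, x):
    """Evaluate p(x) = sum_k coeffs[k] * T_k(x) in time linear in the degree."""
    b1 = 0.0  # plays the role of b_{k+1}
    b2 = 0.0  # plays the role of b_{k+2}
    for c in reversed(coeffs[1:]):
        # b_k = c_k + 2 x b_{k+1} - b_{k+2}
        b1, b2 = c + 2.0 * x * b1 - b2, b1
    # p(x) = c_0 + x b_1 - b_2
    return coeffs[0] + x * b1 - b2


# Example: 1*T_0 + 2*T_1 + 3*T_2 = 6x^2 + 2x - 2, evaluated at x = 0.5.
value = clenshaw([1.0, 2.0, 3.0], 0.5)

# Cross-check against the definition T_k(x) = cos(k * arccos(x)) on [-1, 1].
reference = sum(c * math.cos(k * math.acos(0.5))
                for k, c in enumerate([1.0, 2.0, 3.0]))
```

A naive interval version would run the same recurrence with interval operands, but the repeated `2 * x * b1 - b2` update tends to inflate the interval widths, which is the kind of overestimation that more careful enclosure methods aim to control.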

Citations: 2

High-performance Implementation of Elliptic Curve Cryptography Using Vector Instructions

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2019-07-30 DOI: 10.1145/3309759

Armando Faz-Hernández, Julio López, R. Dahab

Citations: 26

Algorithm 995

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2019-07-18 DOI: 10.1145/3301321

Juliette Pardue, Andrey N. Chernikov

Abstract: A bottom-up approach to parallel anisotropic mesh generation is presented by building a mesh generator starting from the basic operations of vertex insertion and Delaunay triangles. Applications focusing on high-lift design or dynamic stall, or numerical methods and modeling test cases, still focus on two-dimensional domains. This automated parallel mesh generation approach can generate high-fidelity unstructured meshes with anisotropic boundary layers for use in the computational fluid dynamics field. The anisotropy requirement adds a level of complexity to a parallel meshing algorithm by making computation depend on the local alignment of elements, which in turn is dictated by geometric boundaries and the density functions (one-dimensional spacing functions generated from an exponential distribution). This approach yields computational savings in mesh generation and flow solution through well-shaped anisotropic triangles instead of isotropic triangles. The validity of the meshes is shown through solution characteristic comparisons to verified reference solutions. A 79% parallel weak scaling efficiency on 1,024 distributed memory nodes, and a 72% parallel efficiency over the fastest sequential isotropic mesh generator on 512 distributed memory nodes, is shown through numerical experiments.
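Vertex insertion into a Delaunay triangulation rests on one geometric test: whether a new point lies inside the circumcircle of an existing triangle. The sketch below shows the standard determinant form of that in-circle predicate in plain floating point. It is a generic illustration, not code from Algorithm 995; the name `in_circle` is an assumption, the triangle is assumed to be given in counterclockwise order, and a production mesh generator would typically use a robust (exact or adaptive-precision) version of this test.

```python
def in_circle(a, b, c, d):
    """Return True if point d lies strictly inside the circumcircle of (a, b, c).

    Points are (x, y) tuples; a, b, c must be in counterclockwise order.
    Plain floating point: results near the circle boundary are not reliable.
    """
    # Translate so that d is the origin.
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    # 3x3 determinant with rows (px, py, px^2 + py^2), expanded along the last column.
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0.0


# Example with the counterclockwise triangle (0,0), (1,0), (0,1).
inside = in_circle((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.25, 0.25))
outside = in_circle((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0))
```

In an incremental insertion scheme, the triangles whose circumcircles contain the new vertex are the ones that must be removed and retriangulated.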

Citations: 0

Algorithm 994

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2019-06-05 DOI: 10.1145/3302389

F. Hernando, Francisco D. Igual, G. Quintana-Ortí

Citations: 4

CGPOPS

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2019-05-28 DOI: 10.1145/3390463

Yunus M. Agamawi, Anil V. Rao

Citations: 18