{"title":"Faithfully Rounded Floating-point Computations","authors":"M. Lange, S. Rump","doi":"10.1145/3290955","DOIUrl":"https://doi.org/10.1145/3290955","url":null,"abstract":"We present a pair arithmetic for the four basic operations and square root. It can be regarded as a simplified, more-efficient double-double arithmetic. The central assumption on the underlying arithmetic is the first standard model for error analysis for operations on a discrete set of real numbers. Neither do we require a floating-point grid nor a rounding to nearest property. Based on that, we define a relative rounding error unit u and prove rigorous error bounds for the computed result of an arbitrary arithmetic expression depending on u, the size of the expression, and possibly a condition measure. In the second part of this note, we extend the error analysis by examining requirements to ensure faithfully rounded outputs and apply our results to IEEE 754 standard conform floating-point systems. For a class of mathematical expressions, using an IEEE 754 standard conform arithmetic with base β, the result is proved to be faithfully rounded for up to 1 / √βu - 2 operations. Our findings cover a number of previously published algorithms to compute faithfully rounded results, among them Horner’s scheme, products, sums, dot products, or Euclidean norm. 
Beyond that, several other problems can be analyzed, such as polynomial interpolation, orientation problems, Householder transformations, or the smallest singular value of Hilbert matrices of large size.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"37 1","pages":"1 - 20"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85543795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"fenicsR13","authors":"Lambert Theisen, M. Torrilhon","doi":"10.1145/3442378","DOIUrl":"https://doi.org/10.1145/3442378","url":null,"abstract":"We present a mixed finite element solver for the linearized regularized 13-moment equations of non-equilibrium gas dynamics. The Python implementation builds upon the software tools provided by the FEniCS computing platform. We describe a new tensorial approach utilizing the extension capabilities of FEniCS’ Unified Form Language to define required differential operators for tensors above second degree. The presented solver serves as an example for implementing tensorial variational formulations in FEniCS, for which the documentation and literature seem to be very sparse. Using the software abstraction levels provided by the Unified Form Language allows an almost one-to-one correspondence between the underlying mathematics and the resulting source code. Test cases support the correctness of the proposed method using validation with exact solutions. To justify the usage of extended gas flow models, we discuss typical application cases involving rarefaction effects. We provide the documented and validated solver publicly.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"18 1","pages":"1 - 29"},"PeriodicalIF":0.0,"publicationDate":"2020-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83593945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1019: A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation","authors":"Mirko Myllykoski","doi":"10.1145/3495005","DOIUrl":"https://doi.org/10.1145/3495005","url":null,"abstract":"The QR algorithm is one of the three phases in the process of computing the eigenvalues and the eigenvectors of a dense nonsymmetric matrix. This paper describes a task-based QR algorithm for reducing an upper Hessenberg matrix to real Schur form. The task-based algorithm also supports generalized eigenvalue problems (QZ algorithm) but this paper concentrates on the standard case. The task-based algorithm adopts previous algorithmic improvements, such as tightly-coupled multi-shifts and Aggressive Early Deflation (AED), and also incorporates several new ideas that significantly improve the performance. This includes, but is not limited to, the elimination of several synchronization points, the dynamic merging of previously separate computational steps, the shortening and the prioritization of the critical path, and experimental GPU support. The task-based implementation is demonstrated to be multiple times faster than multi-threaded LAPACK and ScaLAPACK in both single-node and multi-node configurations on two different machines based on Intel and AMD CPUs. 
The implementation is built on top of the StarPU runtime system and is part of the open-source StarNEig library.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"12 1","pages":"1 - 36"},"PeriodicalIF":0.0,"publicationDate":"2020-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80621438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing","authors":"H. Anzt, T. Cojean, Goran Flegar, Fritz Göbel, Thomas Grützmacher, Pratik Nayak, T. Ribizel, Yu-Hsiang Tsai, E. S. Quintana‐Ortí","doi":"10.1145/3480935","DOIUrl":"https://doi.org/10.1145/3480935","url":null,"abstract":"In this article, we present Ginkgo, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, Ginkgo’s design principle abstracts all functionality as “linear operators,” motivating the notation of a “linear operator algebra library.” Ginkgo’s current focus is oriented toward providing sparse linear algebra functionality for high performance graphics processing unit (GPU) architectures, but given the library design, this focus can be easily extended to accommodate other algorithms and hardware architectures. We introduce this sophisticated software architecture that separates core algorithms from architecture-specific backends and provide details on extensibility and sustainability measures. We also demonstrate Ginkgo’s usability by providing examples on how to use its functionality inside the MFEM and deal.ii finite element ecosystems. Finally, we offer a practical demonstration of Ginkgo’s high performance on state-of-the-art GPU architectures.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"33 1","pages":"1 - 33"},"PeriodicalIF":0.0,"publicationDate":"2020-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88423799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Irksome: Automating Runge–Kutta Time-stepping for Finite Element Methods","authors":"P. Farrell, R. Kirby, J. Marchena-Menendez","doi":"10.1145/3466168","DOIUrl":"https://doi.org/10.1145/3466168","url":null,"abstract":"While implicit Runge–Kutta (RK) methods possess high order accuracy and important stability properties, implementation difficulties and the high expense of solving the coupled algebraic system at each time step are frequently cited as impediments. We present Irksome, a high-level library for manipulating UFL (Unified Form Language) expressions of semidiscrete variational forms to obtain UFL expressions for the coupled Runge–Kutta stage equations at each time step. Irksome works with the Firedrake package to enable the efficient solution of the resulting coupled algebraic systems. Numerical examples confirm the efficacy of the software and our solver techniques for various problems.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"1 1","pages":"1 - 26"},"PeriodicalIF":0.0,"publicationDate":"2020-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78263429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"H-Revolve","authors":"J. Herrmann, G. Pallez","doi":"10.1145/3378672","DOIUrl":"https://doi.org/10.1145/3378672","url":null,"abstract":"We study the problem of checkpointing strategies for adjoint computation on synchronous hierarchical platforms, specifically computational platforms with several levels of storage with different writing and reading costs. When reversing a large adjoint chain, choosing which data to checkpoint and where is a critical decision for the overall performance of the computation. We introduce H-Revolve, an optimal algorithm for this problem. We make it available in a public Python library along with the implementation of several state-of-the-art algorithms for the variant of the problem with two levels of storage. We provide a detailed description of how one can use this library in an adjoint computation software in the field of automatic differentiation or backpropagation. Finally, we evaluate the performance of H-Revolve and other checkpointing heuristics through an extensive campaign of simulation.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"2 1","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87378216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1010","authors":"A. G. Orellana, C. Michele","doi":"10.1145/3386241","DOIUrl":"https://doi.org/10.1145/3386241","url":null,"abstract":"Aiming to provide a very accurate, efficient, and robust quartic equation solver for physical applications, we have proposed an algorithm that builds on the previous works of P. Strobach and S. L. Shmakov. It is based on the decomposition of the quartic polynomial into two quadratics, whose coefficients are first accurately estimated by handling carefully numerical errors and afterward refined through the use of the Newton-Raphson method. Our algorithm is very accurate in comparison with other state-of-the-art solvers that can be found in the literature, but (most importantly) it turns out to be very efficient according to our timing tests. A crucial issue for us is the robustness of the algorithm, i.e., its ability to cope with the detrimental effect of round-off errors, no matter what set of quartic coefficients is provided in a practical application. In this respect, we extensively tested our algorithm in comparison to other quartic equation solvers both by considering specific extreme cases and by carrying out a statistical analysis over a very large set of quartics. 
Our algorithm has also been heavily tested in a physical application, i.e., simulations of hard cylinders, where it proved its absolute reliability as well as its efficiency.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"9 1","pages":"1 - 28"},"PeriodicalIF":0.0,"publicationDate":"2020-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88026388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem","authors":"O. Marques, J. Demmel, P. Vasconcelos","doi":"10.1145/3361746","DOIUrl":"https://doi.org/10.1145/3361746","url":null,"abstract":"The Singular Value Decomposition (SVD) is widely used in numerical analysis and scientific computing applications, including dimensionality reduction, data compression and clustering, and computation of pseudo-inverses. In many cases, a crucial part of the SVD of a general matrix is to find the SVD of an associated bidiagonal matrix. This article discusses an algorithm to compute the SVD of a bidiagonal matrix through the eigenpairs of an associated symmetric tridiagonal matrix. The algorithm enables the computation of only a subset of singular values and corresponding vectors, with potential performance gains. The article focuses on a sequential version of the algorithm, and discusses special cases and implementation details. The implementation, called BDSVDX, has been included in the LAPACK library. We use a large set of bidiagonal matrices to assess the accuracy of the implementation, both in single and double precision, as well as to identify potential shortcomings. The results show that BDSVDX can be up to three orders of magnitude faster than existing algorithms, which are limited to the computation of a full SVD. 
We also show comparisons of an implementation that uses BDSVDX as a building block for the computation of the SVD of general matrices.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"32 1 1","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2020-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75786779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error Analysis of Some Operations Involved in the Cooley-Tukey Fast Fourier Transform","authors":"N. Brisebarre, Mioara Joldes, J. Muller, Ana-Maria Naneş, Joris Picot","doi":"10.1145/3368619","DOIUrl":"https://doi.org/10.1145/3368619","url":null,"abstract":"We are interested in obtaining error bounds for the classical Cooley-Tukey fast Fourier transform algorithm in floating-point arithmetic, for the 2-norm as well as for the infinity norm. For that purpose, we also give some results on the relative error of the complex multiplication by a root of unity, and on the largest value that can take the real or imaginary part of one term of the fast Fourier transform of a vector x, assuming that all terms of x have real and imaginary parts less than some value b.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"65 1","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2020-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91084192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1008","authors":"Jose Maria Varas Casado, R. Hewson","doi":"10.1145/3378542","DOIUrl":"https://doi.org/10.1145/3378542","url":null,"abstract":"A Matlab class for multicomplex numbers was developed with particular attention paid to the robust and accurate handling of small imaginary components. This is primarily to allow the class to be used to obtain n-order derivative information using the multicomplex step method for, among other applications, gradient-based optimization and optimum control problems. The algebra of multicomplex numbers is described, as is its accurate computational implementation, considering small term approximations and the identification of principal values. The implementation of the method in Matlab is studied, and a class definition is constructed. This new class definition enables Matlab to handle n-order multicomplex numbers and perform arithmetic functions. It was found that with this method, the step size could be arbitrarily decreased toward machine precision. Use of the method to obtain up to the seventh derivative of functions is presented, as is timing data to demonstrate the efficiency of the class implementation.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"56 1","pages":"1 - 26"},"PeriodicalIF":0.0,"publicationDate":"2020-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78986444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}