J. Román, F. Alvarruiz, C. Campos, Lisandro Dalcin, P. Jolivet, A. L. Daviña
{"title":"Improvements to SLEPc in Releases 3.14–3.18","authors":"J. Román, F. Alvarruiz, C. Campos, Lisandro Dalcin, P. Jolivet, A. L. Daviña","doi":"10.1145/3603373","DOIUrl":"https://doi.org/10.1145/3603373","url":null,"abstract":"This short article describes the main new features added to SLEPc, the Scalable Library for Eigenvalue Problem Computations, in the past two and a half years, corresponding to five release versions. The main novelty is the extension of the SVD module with new problem types, such as the generalized SVD or the hyperbolic SVD. Additionally, many improvements have been incorporated in different parts of the library, including contour integral eigensolvers, preconditioning, and GPU support.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 11"},"PeriodicalIF":2.7,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64077639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithms for Parallel Generic hp-adaptive Finite Element Software","authors":"Marc Fehling, Wolfgang Bangerth","doi":"https://dl.acm.org/doi/10.1145/3603372","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3603372","url":null,"abstract":"<p>The <i>hp</i>-adaptive finite element method (FEM) – where one independently chooses the mesh size (<i>h</i>) and polynomial degree (<i>p</i>) to be used on each cell – has long been known to have better theoretical convergence properties than either <i>h</i>- or <i>p</i>-adaptive methods alone. However, it is not widely used, owing at least in parts to the difficulty of the underlying algorithms and the lack of widely usable implementations. This is particularly true when used with continuous finite elements. </p><p>Herein, we discuss algorithms that are necessary for a comprehensive and generic implementation of <i>hp</i>-adaptive finite element methods on distributed-memory, parallel machines. In particular, we will present a multi-stage algorithm for the unique enumeration of degrees of freedom (DoFs) suitable for continuous finite element spaces, describe considerations for weighted load balancing, and discuss the transfer of variable size data between processes. We illustrate the performance of our algorithms with numerical examples, and demonstrate that they scale reasonably up to at least 16 384 Message Passing Interface (MPI) processes. </p><p>We provide a reference implementation of our algorithms as part of the open-source library <monospace>deal.II</monospace>.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"37 3","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vincent Lefèvre, Nicolas Louvet, Jean-Michel Muller, Joris Picot, Laurence Rideau
{"title":"Accurate Calculation of Euclidean Norms Using Double-word Arithmetic","authors":"Vincent Lefèvre, Nicolas Louvet, Jean-Michel Muller, Joris Picot, Laurence Rideau","doi":"https://dl.acm.org/doi/10.1145/3568672","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3568672","url":null,"abstract":"<p>We consider the computation of the Euclidean (or L2) norm of an <i>n</i>-dimensional vector in floating-point arithmetic. We review the classical solutions used to avoid spurious overflow or underflow and/or to obtain very accurate results. We modify a recently published algorithm (that uses double-word arithmetic) to allow for a very accurate solution, free of spurious overflows and underflows. To that purpose, we use a double-word square-root algorithm of which we provide a tight error analysis. The returned L2 norm will be within very slightly more than 0.5 ulp from the exact result, which means that we will almost always provide correct rounding.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"116 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138543481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maxence Reberol, Kilian Verhetsel, François Henrotte, David Bommes, Jean-François Remacle
{"title":"Robust Topological Construction of All-hexahedral Boundary Layer Meshes","authors":"Maxence Reberol, Kilian Verhetsel, François Henrotte, David Bommes, Jean-François Remacle","doi":"https://dl.acm.org/doi/10.1145/3577196","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3577196","url":null,"abstract":"<p>We present a robust technique to build a topologically optimal all-hexahedral layer on the boundary of a model with arbitrarily complex ridges and corners. The generated boundary layer mesh strictly respects the geometry of the input surface mesh, and it is optimal in the sense that the hexahedral valences of the boundary edges are as close as possible to their ideal values (local dihedral angle divided by 90°). Starting from a valid watertight surface mesh (all-quad in practice), we build a global optimization integer programming problem to minimize the mismatch between the hexahedral valences of the boundary edges and their ideal values. The formulation of the integer programming problem relies on the duality between boundary hexahedral configurations and triangulations of the disk, which we reframe in terms of integer constraints. The global problem is solved efficiently by performing combinatorial branch-and-bound searches on a series of sub-problems defined in the vicinity of complicated ridges/corners, where the local mesh topology is necessarily irregular because of the inherent constraints in hexahedral meshes. From the integer solution, we build the topology of the all-hexahedral layer, and the mesh geometry is computed by untangling/smoothing. Our approach is fully automated, topologically robust, and fast.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"20 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thomas Lux, Layne T. Watson, Tyler Chang, William Thacker
{"title":"Algorithm 1031: MQSI—Monotone Quintic Spline Interpolation","authors":"Thomas Lux, Layne T. Watson, Tyler Chang, William Thacker","doi":"https://dl.acm.org/doi/10.1145/3570157","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3570157","url":null,"abstract":"<p>MQSI is a Fortran 2003 subroutine for constructing monotone quintic spline interpolants to univariate monotone data. Using sharp theoretical monotonicity constraints, first and second derivative estimates at data provided by a quadratic facet model are refined to produce a univariate C<sup>2</sup> monotone interpolant. Algorithm and implementation details, complexity and sensitivity analyses, usage information, a brief performance study, and comparisons with other spline approaches are included.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"41 ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Certifying Zeros of Polynomial Systems Using Interval Arithmetic","authors":"Paul Breiding, Kemal Rose, Sascha Timme","doi":"https://dl.acm.org/doi/10.1145/3580277","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3580277","url":null,"abstract":"<p>We establish interval arithmetic as a practical tool for certification in numerical algebraic geometry. Our software <monospace>HomotopyContinuation.jl</monospace> now has a built-in function <monospace>certify</monospace>, which proves the correctness of an isolated nonsingular solution to a square system of polynomial equations. The implementation rests on Krawczyk’s method. We demonstrate that it dramatically outperforms earlier approaches to certification. We see this contribution as a powerful new tool in numerical algebraic geometry, which can make certification the default and not just an option.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"75 ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1034: An Accelerated Algorithm to Compute the Qn Robust Statistic, with Corrections to Constants","authors":"Thierry Fahmy","doi":"https://dl.acm.org/doi/10.1145/3576920","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3576920","url":null,"abstract":"<p>The robust scale estimator <i>Q<sub>n</sub></i> developed by Croux and Rousseeuw [3], for the computation of which they provided a deterministic algorithm, has proven to be very useful in several domains including in quality management and time series analysis. It has interesting mathematical (50% breakdown, 82% Asymptotic Relative Efficiency) and computing (<i>O(nlogn)</i> time, <i>O</i>(<i>n</i>) space) properties. While working on a faster algorithm to compute <i>Q<sub>n</sub></i>, we have discovered an error in the computation of the <i>d</i> constant, and as a consequence in the <i>d<sub>n</sub></i> constants that are used to scale the statistic for consistency with the variance of a normal sample. These errors have been reproduced in several articles including in the International Standard Organisation 13,528 [12] document. In this article, we fix the errors and present a new approach, which includes a new algorithm, allowing computations to run 1.3 to 4.5 times faster when <i>n</i> grows from 10 to 100,000.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"72 ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event-Based Automatic Differentiation of OpenMP with OpDiLib","authors":"Johannes Blühdorn, Max Sagebaum, Nicolas Gauger","doi":"https://dl.acm.org/doi/10.1145/3570159","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3570159","url":null,"abstract":"<p>We present the new software OpDiLib, a universal add-on for classical operator overloading AD tools that enables the automatic differentiation (AD) of OpenMP parallelized code. With it, we establish support for OpenMP features in a reverse mode operator overloading AD tool to an extent that was previously only reported on in source transformation tools. We achieve this with an event-based implementation ansatz that is unprecedented in AD. Combined with modern OpenMP features around OMPT, we demonstrate how it can be used to achieve differentiation without any additional modifications of the source code; neither do we impose a priori restrictions on the data access patterns, which makes OpDiLib highly applicable. For further performance optimizations, restrictions like atomic updates on adjoint variables can be lifted in a fine-grained manner. OpDiLib can also be applied in a semi-automatic fashion via a macro interface, which supports compilers that do not implement OMPT. We demonstrate the applicability of OpDiLib for a pure operator overloading approach in a hybrid parallel environment. We quantify the cost of atomic updates on adjoint variables and showcase the speedup and scaling that can be achieved with the different configurations of OpDiLib in both the forward and the reverse pass.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patrick Amestoy, Alfredo Buttari, Nicholas J. Higham, Jean-Yves L’Excellent, Theo Mary, Bastien Vieublé
{"title":"Combining Sparse Approximate Factorizations with Mixed-precision Iterative Refinement","authors":"Patrick Amestoy, Alfredo Buttari, Nicholas J. Higham, Jean-Yves L’Excellent, Theo Mary, Bastien Vieublé","doi":"https://dl.acm.org/doi/10.1145/3582493","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3582493","url":null,"abstract":"<p>The standard LU factorization-based solution process for linear systems can be enhanced in speed or accuracy by employing mixed-precision iterative refinement. Most recent work has focused on dense systems. We investigate the potential of mixed-precision iterative refinement to enhance methods for sparse systems based on approximate sparse factorizations. In doing so, we first develop a new error analysis for LU- and GMRES-based iterative refinement under a general model of LU factorization that accounts for the approximation methods typically used by modern sparse solvers, such as low-rank approximations or relaxed pivoting strategies. We then provide a detailed performance analysis of both the execution time and memory consumption of different algorithms, based on a selected set of iterative refinement variants and approximate sparse factorizations. Our performance study uses the multifrontal solver MUMPS, which can exploit block low-rank factorization and static pivoting. We evaluate the performance of the algorithms on large, sparse problems coming from a variety of real-life and industrial applications showing that mixed-precision iterative refinement combined with approximate sparse factorization can lead to considerable reductions of both the time and memory consumption.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"74 ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Newly Released Capabilities in the Distributed-Memory SuperLU Sparse Direct Solver","authors":"Xiaoye S. Li, Paul Lin, Yang Liu, Piyush Sao","doi":"https://dl.acm.org/doi/10.1145/3577197","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3577197","url":null,"abstract":"<p>We present the new features available in the recent release of <monospace>SuperLU_DIST</monospace>, Version 8.1.1. <monospace>SuperLU_DIST</monospace> is a distributed-memory parallel sparse direct solver. The new features include (1) a 3D communication-avoiding algorithm framework that trades off inter-process communication for selective memory duplication, (2) multi-GPU support for both NVIDIA GPUs and AMD GPUs, and (3) mixed-precision routines that perform single-precision LU factorization and double-precision iterative refinement. Apart from the algorithm improvements, we also modernized the software build system to use CMake and Spack package installation tools to simplify the installation procedure. Throughout the article, we describe in detail the pertinent performance-sensitive parameters associated with each new algorithmic feature, show how they are exposed to the users, and give general guidance of how to set these parameters. We illustrate that the solver’s performance both in time and memory can be greatly improved after systematic tuning of the parameters, depending on the input sparse matrix and underlying hardware.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"25 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}