{"title":"Algorithm 1017","authors":"Pavel Škrabánek, N. Martínková","doi":"10.1145/3451389","DOIUrl":"https://doi.org/10.1145/3451389","url":null,"abstract":"Fuzzy regression provides an alternative to statistical regression when the model is indefinite, the relationships between model parameters are vague, the sample size is low, or the data are hierarchically structured. Such cases allow to consider the choice of a regression model based on the fuzzy set theory. In fuzzyreg, we implement fuzzy linear regression methods that differ in the expectations of observational data types, outlier handling, and parameter estimation method. We provide a wrapper function that prepares data for fitting fuzzy linear models with the respective methods from a syntax established in R for fitting regression models. The function fuzzylm thus provides a novel functionality for R through standardized operations with fuzzy numbers. Additional functions allow for conversion of real-value variables to be fuzzy numbers, printing, summarizing, model plotting, and calculation of model predictions from new data using supporting functions that perform arithmetic operations with triangular fuzzy numbers. Goodness of fit and total error of the fit measures allow model comparisons. The package contains a dataset named bats with measurements of temperatures of hibernating bats and the mean annual surface temperature reflecting the climate at the sampling sites. The predictions from fuzzy linear models fitted to this dataset correspond well to the observed biological phenomenon. Fuzzy linear regression has great potential in predictive modeling where the data structure prevents statistical analysis and the modeled process exhibits inherent fuzziness.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"37 1","pages":"1 - 18"},"PeriodicalIF":0.0,"publicationDate":"2021-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77061136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Abdelfattah, Timothy B. Costa, J. Dongarra, M. Gates, A. Haidar, S. Hammarling, N. Higham, J. Kurzak, P. Luszczek, S. Tomov, M. Zounon
{"title":"A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines","authors":"A. Abdelfattah, Timothy B. Costa, J. Dongarra, M. Gates, A. Haidar, S. Hammarling, N. Higham, J. Kurzak, P. Luszczek, S. Tomov, M. Zounon","doi":"10.1145/3431921","DOIUrl":"https://doi.org/10.1145/3431921","url":null,"abstract":"This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a Batched BLAS routine. The matrices are grouped together in uniformly sized groups, with just one group if all the matrices are of equal size. The aim is to provide more efficient, but portable, implementations of algorithms on high-performance many-core platforms. These include multicore and many-core CPU processors, GPUs and coprocessors, and other hardware accelerators with floating-point compute facility. As well as the standard types of single and double precision, we also include half and quadruple precision in the standard. In particular, half precision is used in many very large scale applications, such as those associated with machine learning.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"582 1","pages":"1 - 23"},"PeriodicalIF":0.0,"publicationDate":"2021-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78931136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HIFIR: Hybrid Incomplete Factorization with Iterative Refinement for Preconditioning Ill-Conditioned and Singular Systems","authors":"Qiao Chen, X. Jiao","doi":"10.1145/3536165","DOIUrl":"https://doi.org/10.1145/3536165","url":null,"abstract":"We introduce a software package called Hybrid Incomplete Factorization with Iterative Refinement (HIFIR) for preconditioning sparse, unsymmetric, ill-conditioned, and potentially singular systems. HIFIR computes a hybrid incomplete factorization (HIF), which combines multilevel incomplete LU factorization with a truncated, rank-revealing QR (RRQR) factorization on the final Schur complement. This novel hybridization is based on the new theory of ϵ-accurate approximate generalized inverse (AGI). It enables near-optimal preconditioners for consistent systems and enables flexible GMRES to solve inconsistent systems when coupled with iterative refinement. In this article, we focus on some practical algorithmic and software issues of HIFIR. In particular, we introduce a new inverse-based rook pivoting (IBRP) into ILU, which improves the robustness and the overall efficiency for some ill-conditioned systems by significantly reducing the size of the final Schur complement for some systems. We also describe the software design of HIFIR in terms of its efficient data structures for supporting rook pivoting in a multilevel setting, its template-based generic programming interfaces for mixed-precision real and complex values in C++, and its user-friendly high-level interfaces in MATLAB and Python. We demonstrate the effectiveness of HIFIR for ill-conditioned or singular systems arising from several applications, including the Helmholtz equation, linear elasticity, stationary incompressible Navier–Stokes (INS) equations, and time-dependent advection-diffusion equation.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"110 1","pages":"1 - 33"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86234848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Crum, Cyrus Cheng, D. Ham, L. Mitchell, R. Kirby, J. Levine, A. Gillette
{"title":"Bringing Trimmed Serendipity Methods to Computational Practice in Firedrake","authors":"J. Crum, Cyrus Cheng, D. Ham, L. Mitchell, R. Kirby, J. Levine, A. Gillette","doi":"10.1145/3490485","DOIUrl":"https://doi.org/10.1145/3490485","url":null,"abstract":"We present an implementation of the trimmed serendipity finite element family, using the open-source finite element package Firedrake. The new elements can be used seamlessly within the software suite for problems requiring H1, H(curl), or H(div)-conforming elements on meshes of squares or cubes. To test how well trimmed serendipity elements perform in comparison to traditional tensor product elements, we perform a sequence of numerical experiments including the primal Poisson, mixed Poisson, and Maxwell cavity eigenvalue problems. Overall, we find that the trimmed serendipity elements converge, as expected, at the same rate as the respective tensor product elements, while being able to offer significant savings in the time or memory required to solve certain problems.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"48 1","pages":"1 - 19"},"PeriodicalIF":0.0,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86942098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Goran Flegar, H. Anzt, T. Cojean, E. S. Quintana‐Ortí
{"title":"Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software","authors":"Goran Flegar, H. Anzt, T. Cojean, E. S. Quintana‐Ortí","doi":"10.1145/3441850","DOIUrl":"https://doi.org/10.1145/3441850","url":null,"abstract":"The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing its data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator–like a preconditioner–in lower than working precision hopefully without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"39 1","pages":"1 - 28"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84707846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1016","authors":"Jens Hahne, S. Friedhoff, M. Bolten","doi":"10.1145/3446979","DOIUrl":"https://doi.org/10.1145/3446979","url":null,"abstract":"In this article, we introduce the Python framework PyMGRIT, which implements the multigrid-reduction-in-time (MGRIT) algorithm for solving (non-)linear systems arising from the discretization of time-dependent problems. The MGRIT algorithm is a reduction-based iterative method that allows parallel-in-time simulations, i.e., calculating multiple time steps simultaneously in a simulation, using a time-grid hierarchy. The PyMGRIT framework includes many different variants of the MGRIT algorithm, ranging from different multigrid cycle types and relaxation schemes, various coarsening strategies, including time-only and space-time coarsening, and the ability to utilize different time integrators on different levels in the multigrid hierachy. The comprehensive documentation with tutorials and many examples and the fully documented code allow an easy start into the work with the package. The functionality of the code is ensured by automated serial and parallel tests using continuous integration. PyMGRIT supports serial runs suitable for prototyping and testing of new approaches, as well as parallel runs using the Message Passing Interface (MPI). In this manuscript, we describe the implementation of the MGRIT algorithm in PyMGRIT and present the usage from both a user and a developer point of view. Three examples illustrate different aspects of the package itself, especially running tests with pure time parallelism, as well as space-time parallelism through the coupling of PyMGRIT with PETSc or Firedrake.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"26 1","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2021-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73177657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1015","authors":"S. Guthe, D. Thuerck","doi":"10.1145/3442348","DOIUrl":"https://doi.org/10.1145/3442348","url":null,"abstract":"We present a new algorithm for solving the dense linear (sum) assignment problem and an efficient, parallel implementation that is based on the successive shortest path algorithm. More specifically, we introduce the well-known epsilon scaling approach used in the Auction algorithm to approximate the dual variables of the successive shortest path algorithm prior to solving the assignment problem to limit the complexity of the path search. This improves the runtime by several orders of magnitude for hard-to-solve real-world problems, making the runtime virtually independent of how hard the assignment is to find. In addition, our approach allows for using accelerators and/or external compute resources to calculate individual rows of the cost matrix. This enables us to solve problems that are larger than what has been reported in the past, including the ability to efficiently solve problems whose cost matrix exceeds the available systems memory. To our knowledge, this is the first implementation that is able to solve problems with more than one trillion arcs in less than 100 hours on a single machine.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"50 1","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2021-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74209025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Arithmetic of Complex Fans","authors":"G. Soylu","doi":"10.1145/3434400","DOIUrl":"https://doi.org/10.1145/3434400","url":null,"abstract":"Complex fans are sets of complex numbers whose magnitudes and angles range in closed intervals. The fact that the sum of two fans is a disordered shape gives rise to the need for computational methods to find the minimal enclosing fan. Cases where the sum of two fans contains the origin of the complex plane as a boundary point are of special interest. The result of the addition is then enclosed by circles in current methods, but under certain circumstances this turns out to be an overestimate. The focus of this article is the diagnosis and treatment of such cases.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"326 1","pages":"1 - 10"},"PeriodicalIF":0.0,"publicationDate":"2021-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79718137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting Mixed-domain Mixed-precision Matrix Multiplication within the BLIS Framework","authors":"F. V. Zee, D. Parikh, R. V. D. Geijn","doi":"10.1145/3402225","DOIUrl":"https://doi.org/10.1145/3402225","url":null,"abstract":"We approach the problem of implementing mixed-datatype support within the general matrix multiplication (gemm) operation of the BLAS-like Library Instantiation Software framework, whereby each matrix operand A, B, and C may be stored as single- or double-precision real or complex values. Another factor of complexity, whereby the matrix product and accumulation are allowed to take place in a precision different from the storage precisions of either A or B, is also discussed. We first break the problem into orthogonal dimensions, considering the mixing of domains separately from mixing precisions. Support for all combinations of matrix operands stored in either the real or complex domain is mapped out by enumerating the cases and describing an implementation approach for each. Supporting all combinations of storage and computation precisions is handled by typecasting the matrices at key stages of the computation—during packing and/or accumulation, as needed. Several optional optimizations are also documented. Performance results gathered on a 56-core Marvell ThunderX2 and a 52-core Intel Xeon Platinum demonstrate that high performance is mostly preserved, with modest slowdowns incurred from unavoidable typecast instructions. The mixed-datatype implementation confirms that combinatorial intractability is avoided, with the framework relying on only two assembly microkernels to implement 128 datatype combinations.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"90 1","pages":"1 - 26"},"PeriodicalIF":0.0,"publicationDate":"2021-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79932608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QPPAL: A Two-phase Proximal Augmented Lagrangian Method for High-dimensional Convex Quadratic Programming Problems","authors":"Ling Liang, Xudong Li, Defeng Sun, K. Toh","doi":"10.1145/3476571","DOIUrl":"https://doi.org/10.1145/3476571","url":null,"abstract":"In this article, we aim to solve high-dimensional convex quadratic programming (QP) problems with a large number of quadratic terms, linear equality, and inequality constraints. To solve the targeted QP problem to a desired accuracy efficiently, we consider the restricted-Wolfe dual problem and develop a two-phase Proximal Augmented Lagrangian method (QPPAL), with Phase I to generate a reasonably good initial point to warm start Phase II to obtain an accurate solution efficiently. More specifically, in Phase I, based on the recently developed symmetric Gauss-Seidel (sGS) decomposition technique, we design a novel sGS-based semi-proximal augmented Lagrangian method for the purpose of finding a solution of low to medium accuracy. Then, in Phase II, a proximal augmented Lagrangian algorithm is proposed to obtain a more accurate solution efficiently. Extensive numerical results evaluating the performance of QPPAL against existing state-of-the-art solvers Gurobi, OSQP, and QPALM are presented to demonstrate the high efficiency and robustness of our proposed algorithm for solving various classes of large-scale convex QP problems. The MATLAB implementation of the software package QPPAL is available at https://blog.nus.edu.sg/mattohkc/softwares/qppal/.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"41 1","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2021-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79898695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}