{"title":"Algorithm XXX: Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions","authors":"C. Psarras, L. Karlsson, R. Bro, P. Bientinesi","doi":"10.1145/3519383","DOIUrl":"https://doi.org/10.1145/3519383","url":null,"abstract":"Tensor decompositions, such as CANDECOMP/PARAFAC (CP), are widely used in a variety of applications, such as chemometrics, signal processing, and machine learning. A broadly used method for computing such decompositions relies on the Alternating Least Squares (ALS) algorithm. When the number of components is small, regardless of its implementation, ALS exhibits low arithmetic intensity, which severely hinders its performance and makes GPU offloading ineffective. We observe that, in practice, experts often have to compute multiple decompositions of the same tensor, each with a small number of components (typically fewer than 20), to ultimately find the best ones to use for the application at hand. In this paper, we illustrate how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase the arithmetic intensity. Therefore, it becomes possible to make efficient use of GPUs for further speedups; at the same time the technique is compatible with many enhancements typically used in ALS, such as line search, extrapolation, and non-negativity constraints. We introduce the Concurrent ALS algorithm and library, which offers an interface to MATLAB, and a mechanism to effectively deal with the issue that decompositions complete at different times. Experimental results on artificial and real datasets demonstrate a shorter time to completion due to increased arithmetic intensity.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41347702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm xxx: Restoration of function by integrals with cubic integral smoothing spline in R","authors":"Yu. D. Korablev","doi":"10.1145/3519384","DOIUrl":"https://doi.org/10.1145/3519384","url":null,"abstract":"\u0000 In this paper, a cubic integral smoothing spline with roughness penalty for restoring a function by integrals is described. A mathematical method for building such a spline is described in detail. The method is based on cubic integral spline with a penalty function, which minimizes the sum of squares of the difference between the observed integrals of the unknown function and the integrals of the spline being constructed, plus an additional penalty for the nonlinearity (roughness) of the spline. This method has a matrix form, and this paper shows in detail how to fill in each matrix. The parameter\u0000 α\u0000 governs the desired smoothness of the restored function. Spline knots can be chosen independently of observations, and a weight can be defined for each observation for more control over the resulting spline shape. An implementation in the R language as function\u0000 int_spline\u0000 is given. The function\u0000 int_spline\u0000 is easy to use, with all arguments completely described and corresponding examples given. An example of the application of the method in rare event analysis and forecasting is given.\u0000","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46245619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm xxx: Efficient algorithms for computing a rank-revealing UTV factorization on parallel computing architectures","authors":"N. Heavner, F. D. Igual, G. Quintana-Ortí, P.G. Martinsson","doi":"10.1145/3507466","DOIUrl":"https://doi.org/10.1145/3507466","url":null,"abstract":"\u0000 The randomized singular value decomposition (RSVD) is by now a well established technique for efficiently computing an approximate singular value decomposition of a matrix. Building on the ideas that underpin the RSVD, the recently proposed algorithm “randUTV” computes a\u0000 full\u0000 factorization of a given matrix that provides low-rank approximations with near-optimal error. Because the bulk of\u0000 randUTV\u0000 is cast in terms of communication-efficient operations like matrix-matrix multiplication and unpivoted QR factorizations, it is faster than competing rank-revealing factorization methods like column-pivoted QR in most high performance computational settings. In this article, optimized\u0000 randUTV\u0000 implementations are presented for both shared-memory and distributed-memory computing environments. For shared memory,\u0000 randUTV\u0000 is redesigned in terms of an\u0000 algorithm-by-blocks\u0000 that, together with a runtime task scheduler, eliminates bottlenecks from data synchronization points to achieve acceleration over the standard\u0000 blocked algorithm\u0000 , based on a purely fork-join approach. The distributed-memory implementation is based on the ScaLAPACK library. The performances of our new codes compare favorably with competing factorizations available on both shared-memory and distributed-memory architectures.\u0000","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43296455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"pylspack: Parallel Algorithms and Data Structures for Sketching, Column Subset Selection, Regression, and Leverage Scores","authors":"Aleksandros Sobczyk, Efstratios Gallopoulos","doi":"10.1145/3555370","DOIUrl":"https://doi.org/10.1145/3555370","url":null,"abstract":"We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix, and (iii) computation of the squared row norms of the product of two matrices, with a special focus on “tall-and-skinny” matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression and leverage scores computation. These tools have been implemented in pylspack, a publicly available Python package1 whose core is written in C++ and parallelized with OpenMP and that is compatible with standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43272294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1035: A Gradient-based Implementation of the Polyhedral Active Set Algorithm","authors":"W. Hager, Hongchao Zhang","doi":"10.1145/3583559","DOIUrl":"https://doi.org/10.1145/3583559","url":null,"abstract":"The Polyhedral Active Set Algorithm (PASA) is designed to optimize a general nonlinear function over a polyhedron. Phase one of the algorithm is a nonmonotone gradient projection algorithm, while phase two is an active set algorithm that explores faces of the constraint polyhedron. A gradient-based implementation is presented, where a projected version of the conjugate gradient algorithm is employed in phase two. Asymptotically, only phase two is performed. Comparisons are given with IPOPT using polyhedral-constrained problems from CUTEst and the Maros/Meszaros quadratic programming test set.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49613741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"hIPPYlib-MUQ: A Bayesian Inference Software Framework for Integration of Data with Complex Predictive Models under Uncertainty","authors":"Ki-tae Kim, U. Villa, M. Parno, Y. Marzouk, O. Ghattas, N. Petra","doi":"10.1145/3580278","DOIUrl":"https://doi.org/10.1145/3580278","url":null,"abstract":"Bayesian inference provides a systematic framework for integration of data with mathematical models to quantify the uncertainty in the solution of the inverse problem. However, the solution of Bayesian inverse problems governed by complex forward models described by partial differential equations (PDEs) remains prohibitive with black-box Markov chain Monte Carlo (MCMC) methods. We present hIPPYlib-MUQ, an extensible and scalable software framework that contains implementations of state-of-the art algorithms aimed to overcome the challenges of high-dimensional, PDE-constrained Bayesian inverse problems. These algorithms accelerate MCMC sampling by exploiting the geometry and intrinsic low-dimensionality of parameter space via derivative information and low rank approximation. The software integrates two complementary open-source software packages, hIPPYlib and MUQ. hIPPYlib solves PDE-constrained inverse problems using automatically-generated adjoint-based derivatives, but it lacks full Bayesian capabilities. MUQ provides a spectrum of powerful Bayesian inversion models and algorithms, but expects forward models to come equipped with gradients and Hessians to permit large-scale solution. By combining these two complementary libraries, we created a robust, scalable, and efficient software framework that realizes the benefits of each and allows us to tackle complex large-scale Bayesian inverse problems across a broad spectrum of scientific and engineering disciplines. To illustrate the capabilities of hIPPYlib-MUQ, we present a comparison of a number of MCMC methods available in the integrated software on several high-dimensional Bayesian inverse problems. These include problems characterized by both linear and nonlinear PDEs, various noise models, and different parameter dimensions. The results demonstrate that large (∼ 50×) speedups over conventional black box and gradient-based MCMC algorithms can be obtained by exploiting Hessian information (from the log-posterior), underscoring the power of the integrated hIPPYlib-MUQ framework.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48074933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing with B-series","authors":"D. Ketcheson, Hendrik Ranocha","doi":"10.1145/3573384","DOIUrl":"https://doi.org/10.1145/3573384","url":null,"abstract":"We present BSeries.jl, a Julia package for the computation and manipulation of B-series, which are a versatile theoretical tool for understanding and designing discretizations of differential equations. We give a short introduction to the theory of B-series and associated concepts and provide examples of their use, including method composition and backward error analysis. The associated software is highly performant and makes it possible to work with B-series of high order.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2021-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48570599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational Graphs for Matrix Functions","authors":"E. Jarlebring, M. Fasi, Emil Ringh","doi":"10.1145/3568991","DOIUrl":"https://doi.org/10.1145/3568991","url":null,"abstract":"Many numerical methods for evaluating matrix functions can be naturally viewed as computational graphs. Rephrasing these methods as directed acyclic graphs (DAGs) is a particularly effective approach to study existing techniques, improve them, and eventually derive new ones. The accuracy of these matrix techniques can be characterized by the accuracy of their scalar counterparts, thus designing algorithms for matrix functions can be regarded as a scalar-valued optimization problem. The derivatives needed during the optimization can be calculated automatically by exploiting the structure of the DAG in a fashion analogous to backpropagation. This article describes GraphMatFun.jl, a Julia package that offers the means to generate and manipulate computational graphs, optimize their coefficients, and generate Julia, MATLAB, and C code to evaluate them efficiently at a matrix argument. The software also provides tools to estimate the accuracy of a graph-based algorithm and thus obtain numerically reliable methods. For the exponential, for example, using a particular form (degree-optimal) of polynomials produces implementations that in many cases are cheaper, in terms of computational cost, than the Padé-based techniques typically used in mathematical software. The optimized graphs and the corresponding generated code are available online.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2021-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44753027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Geometric Multigrid Method for Space-Time Finite Element Discretizations of the Navier–Stokes Equations and its Application to 3D Flow Simulation","authors":"Mathias Anselmann, M. Bause","doi":"10.1145/3582492","DOIUrl":"https://doi.org/10.1145/3582492","url":null,"abstract":"We present a parallelized geometric multigrid (GMG) method, based on the cell-based Vanka smoother, for higher order space-time finite element methods (STFEM) to the incompressible Navier–Stokes equations. The STFEM is implemented as a time marching scheme. The GMG solver is applied as a preconditioner for generalized minimal residual iterations. Its performance properties are demonstrated for 2D and 3D benchmarks of flow around a cylinder. The key ingredients of the GMG approach are the construction of the local Vanka smoother over all degrees of freedom in time of the respective subinterval and its efficient application. For this, data structures that store pre-computed cell inverses of the Jacobian for all hierarchical levels and require only a reasonable amount of memory overhead are generated. The GMG method is built for the deal.II finite element library. The concepts are flexible and can be transferred to similar software platforms.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2021-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49403692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}