{"title":"Distributed ℋ2-Matrices for Boundary Element Methods","authors":"S. Börm","doi":"10.1145/3582494","DOIUrl":"https://doi.org/10.1145/3582494","url":null,"abstract":"Standard discretization techniques for boundary integral equations, e.g., the Galerkin boundary element method, lead to large densely populated matrices that require fast and efficient compression techniques like the fast multipole method or hierarchical matrices. If the underlying mesh is very large, running the corresponding algorithms on a distributed computer is attractive, e.g., since distributed computers frequently are cost-effective and offer a high accumulated memory bandwidth. Compared to the closely related particle methods, for which distributed algorithms are well-established, the Galerkin discretization poses a challenge, since the supports of the basis functions influence the block structure of the matrix and therefore the flow of data in the corresponding algorithms. This article introduces distributed ℋ2-matrices, a class of hierarchical matrices that is closely related to fast multipole methods and particularly well-suited for distributed computing. While earlier efforts required the global tree structure of the ℋ2-matrix to be stored in every node of the distributed system, the new approach needs only local multilevel information that can be obtained via a simple distributed algorithm, allowing us to scale to significantly larger systems. Experiments show that this approach can handle very large meshes with more than 130 million triangles efficiently.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":" ","pages":"1 - 21"},"PeriodicalIF":2.7,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43867034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 10xx: SuiteSparse:GraphBLAS: parallel graph algorithms in the language of sparse linear algebra","authors":"Timothy A. Davis","doi":"https://dl.acm.org/doi/10.1145/3577195","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3577195","url":null,"abstract":"<p>SuiteSparse:GraphBLAS is a full parallel implementation of the GraphBLAS standard, which defines a set of sparse matrix operations on an extended algebra of semirings using an almost unlimited variety of operators and types. When applied to sparse adjacency matrices, these algebraic operations are equivalent to computations on graphs. A description of the parallel implementation of SuiteSparse:GraphBLAS is given, including its novel parallel algorithms for sparse matrix multiply, addition, element-wise multiply, submatrix extraction and assignment, and the GraphBLAS mask/accumulator operation. Its performance is illustrated by solving the graph problems in the GAP Benchmark and by comparing it with other sparse matrix libraries.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"1 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138543465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1037: SuiteSparse:GraphBLAS: Parallel Graph Algorithms in the Language of Sparse Linear Algebra","authors":"T. Davis","doi":"10.1145/3577195","DOIUrl":"https://doi.org/10.1145/3577195","url":null,"abstract":"SuiteSparse:GraphBLAS is a full parallel implementation of the GraphBLAS standard, which defines a set of sparse matrix operations on an extended algebra of semirings using an almost unlimited variety of operators and types. When applied to sparse adjacency matrices, these algebraic operations are equivalent to computations on graphs. A description of the parallel implementation of SuiteSparse:GraphBLAS is given, including its novel parallel algorithms for sparse matrix multiply, addition, element-wise multiply, submatrix extraction and assignment, and the GraphBLAS mask/accumulator operation. Its performance is illustrated by solving the graph problems in the GAP Benchmark and by comparing it with other sparse matrix libraries.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 30"},"PeriodicalIF":2.7,"publicationDate":"2023-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44957647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Reberol, Kilian Verhetsel, F. Henrotte, D. Bommes, J. Remacle
{"title":"Robust Topological Construction of All-hexahedral Boundary Layer Meshes","authors":"M. Reberol, Kilian Verhetsel, F. Henrotte, D. Bommes, J. Remacle","doi":"10.1145/3577196","DOIUrl":"https://doi.org/10.1145/3577196","url":null,"abstract":"We present a robust technique to build a topologically optimal all-hexahedral layer on the boundary of a model with arbitrarily complex ridges and corners. The generated boundary layer mesh strictly respects the geometry of the input surface mesh, and it is optimal in the sense that the hexahedral valences of the boundary edges are as close as possible to their ideal values (local dihedral angle divided by 90°). Starting from a valid watertight surface mesh (all-quad in practice), we build a global optimization integer programming problem to minimize the mismatch between the hexahedral valences of the boundary edges and their ideal values. The formulation of the integer programming problem relies on the duality between boundary hexahedral configurations and triangulations of the disk, which we reframe in terms of integer constraints. The global problem is solved efficiently by performing combinatorial branch-and-bound searches on a series of sub-problems defined in the vicinity of complicated ridges/corners, where the local mesh topology is necessarily irregular because of the inherent constraints in hexahedral meshes. From the integer solution, we build the topology of the all-hexahedral layer, and the mesh geometry is computed by untangling/smoothing. Our approach is fully automated, topologically robust, and fast.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 32"},"PeriodicalIF":2.7,"publicationDate":"2022-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48482584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FastSpline: Automatic Generation of Interpolants for Lattice Samplings","authors":"J. Horacsek, U. Alim","doi":"10.1145/3577194","DOIUrl":"https://doi.org/10.1145/3577194","url":null,"abstract":"Interpolation is a foundational concept in scientific computing and is at the heart of many scientific visualization techniques. There is usually a tradeoff between the approximation capabilities of an interpolation scheme and its evaluation efficiency. For many applications, it is important for a user to navigate their data in real time. In practice, evaluation efficiency outweighs any incremental improvements in reconstruction fidelity. We first analyze, from a general standpoint, the use of compact piece-wise polynomial basis functions to efficiently interpolate data that is sampled on a lattice. We then detail our automatic code-generation framework on both CPU and GPU architectures. Specifically, we propose a general framework that can produce a fast evaluation scheme by analyzing the algebro-geometric structure of the convolution sum for a given lattice and basis function combination. We demonstrate the utility and generality of our framework by providing fast implementations of various box splines on the Body Centered and Face Centered Cubic lattices, as well as some non-separable box splines on the Cartesian lattice. We also provide fast implementations for certain Voronoi-splines that have not yet appeared in the literature. Finally, we demonstrate that this framework may also be used for non-Cartesian lattices in 4D.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 35"},"PeriodicalIF":2.7,"publicationDate":"2022-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47226907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jie Wang, Victor Magron, J. B. Lasserre, Ngoc Hoang Anh Mai
{"title":"CS-TSSOS: Correlative and Term Sparsity for Large-Scale Polynomial Optimization","authors":"Jie Wang, Victor Magron, J. B. Lasserre, Ngoc Hoang Anh Mai","doi":"https://dl.acm.org/doi/10.1145/3569709","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3569709","url":null,"abstract":"<p>This work proposes a new moment-SOS hierarchy, called <i>CS-TSSOS</i>, for solving large-scale sparse polynomial optimization problems. Its novelty is to exploit simultaneously <i>correlative sparsity</i> and <i>term sparsity</i> by combining advantages of two existing frameworks for sparse polynomial optimization. The former is due to Waki et al. [40] while the latter was initially proposed by Wang et al. [42] and later exploited in the TSSOS hierarchy [46, 47]. In doing so we obtain CS-TSSOS—a two-level hierarchy of semidefinite programming relaxations with (i) the crucial property to involve blocks of SDP matrices and (ii) the guarantee of convergence to the global optimum under certain conditions. We demonstrate its efficiency and scalability on several large-scale instances of the celebrated Max-Cut problem and the important industrial optimal power flow problem, involving up to six thousand variables and tens of thousands of constraints.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"289 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138537826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Newly Released Capabilities in the Distributed-Memory SuperLU Sparse Direct Solver","authors":"X. Li, Paul B. S. Lin, Yang Liu, Piyush Sao","doi":"10.1145/3577197","DOIUrl":"https://doi.org/10.1145/3577197","url":null,"abstract":"We present the new features available in the recent release of SuperLU_DIST, Version 8.1.1. SuperLU_DIST is a distributed-memory parallel sparse direct solver. The new features include (1) a 3D communication-avoiding algorithm framework that trades off inter-process communication for selective memory duplication, (2) multi-GPU support for both NVIDIA GPUs and AMD GPUs, and (3) mixed-precision routines that perform single-precision LU factorization and double-precision iterative refinement. Apart from the algorithm improvements, we also modernized the software build system to use CMake and Spack package installation tools to simplify the installation procedure. Throughout the article, we describe in detail the pertinent performance-sensitive parameters associated with each new algorithmic feature, show how they are exposed to the users, and give general guidance of how to set these parameters. We illustrate that the solver’s performance both in time and memory can be greatly improved after systematic tuning of the parameters, depending on the input sparse matrix and underlying hardware.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 20"},"PeriodicalIF":2.7,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45727625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"pylspack: Parallel Algorithms and Data Structures for Sketching, Column Subset Selection, Regression, and Leverage Scores","authors":"Aleksandros Sobczyk, Efstratios Gallopoulos","doi":"https://dl.acm.org/doi/10.1145/3555370","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3555370","url":null,"abstract":"<p>We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix, and (iii) computation of the squared row norms of the product of two matrices, with a special focus on “tall-and-skinny” matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression and leverage scores computation. These tools have been implemented in <monospace>pylspack</monospace>, a publicly available Python package<sup>1</sup> whose core is written in C++ and parallelized with OpenMP and that is compatible with standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"9 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138537800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ngoc Hoang Anh Mai, J. B. Lasserre, Victor Magron, Jie Wang
{"title":"Exploiting Constant Trace Property in Large-scale Polynomial Optimization","authors":"Ngoc Hoang Anh Mai, J. B. Lasserre, Victor Magron, Jie Wang","doi":"https://dl.acm.org/doi/10.1145/3555309","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3555309","url":null,"abstract":"<p>We prove that every semidefinite moment relaxation of a polynomial optimization problem (POP) with a ball constraint can be reformulated as a semidefinite program involving a matrix with constant trace property (CTP). As a result, such moment relaxations can be solved efficiently by first-order methods that exploit CTP, e.g., the conditional gradient-based augmented Lagrangian method. We also extend this CTP-exploiting framework to large-scale POPs with different sparsity structures. The efficiency and scalability of our framework are illustrated on some moment relaxations for various randomly generated POPs, especially second-order moment relaxations for quadratically constrained quadratic programs.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"48 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138537802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Normal Form Algorithm for Tensor Rank Decomposition","authors":"Simon Telen, Nick Vannieuwenhoven","doi":"https://dl.acm.org/doi/10.1145/3555369","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3555369","url":null,"abstract":"<p>We propose a new numerical algorithm for computing the tensor rank decomposition or canonical polyadic decomposition of higher-order tensors subject to a rank and genericity constraint. Reformulating this computational problem as a system of polynomial equations allows us to leverage recent numerical linear algebra tools from computational algebraic geometry. We characterize the complexity of our algorithm in terms of an algebraic property of this polynomial system—the multigraded regularity. We prove effective bounds for many tensor formats and ranks, which are of independent interest for overconstrained polynomial system solving. Moreover, we conjecture a general formula for the multigraded regularity, yielding a (parameterized) polynomial time complexity for the tensor rank decomposition problem in the considered setting. Our numerical experiments show that our algorithm can outperform state-of-the-art numerical algorithms by an order of magnitude in terms of accuracy, computation time, and memory consumption.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"37 ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}