{"title":"Algorithm 1009","authors":"S. Hawkins","doi":"10.1145/3381537","DOIUrl":"https://doi.org/10.1145/3381537","url":null,"abstract":"MieSolver provides an efficient solver for the problem of wave propagation through a heterogeneous configuration of nonidentical circular cylinders. MieSolver allows great flexibility in the physical properties of each cylinder, and the cylinders may have opaque or penetrable cores, as well as an arbitrary number of penetrable layers. The wave propagation is governed by the two-dimensional Helmholtz equation and models electromagnetic, acoustic, and elastic waves. The solver is based on the Mie series solution for scattering by a single circular cylinder and hence is numerically stable and highly accurate. We demonstrate the accuracy of our software with extensive numerical experiments over a wide range of frequencies (about five orders of magnitude) and up to 60 cylinders.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"9 1","pages":"1 - 28"},"PeriodicalIF":0.0,"publicationDate":"2020-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75165936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Abhyankar, G. Betrie, D. Maldonado, L. McInnes, Barry F. Smith, Hong Zhang
{"title":"PETSc DMNetwork","authors":"S. Abhyankar, G. Betrie, D. Maldonado, L. McInnes, Barry F. Smith, Hong Zhang","doi":"10.1145/3344587","DOIUrl":"https://doi.org/10.1145/3344587","url":null,"abstract":"We present DMNetwork, a high-level package included in the PETSc library for the simulation of multiphysics phenomena over large-scale networked systems. The library aims at applications that have networked structures such as those in electrical, gas, and water distribution systems. DMNetwork provides data and topology management, parallelization for multiphysics systems over a network, and hierarchical and composable solvers to exploit the problem structure. DMNetwork eases the simulation development cycle by providing the necessary infrastructure through simple abstractions to define and query the network components. This article presents the design of DMNetwork, illustrates its user interface, and demonstrates its ability to solve multiphysics systems, such as an electric circuit, a network of power grid and water subnetworks, and transient hydraulic systems over large networks with more than 2 billion variables on extreme-scale computers using up to 30,000 processors.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"36 1","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2020-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81224257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exactly Computing the Tail of the Poisson-Binomial Distribution","authors":"N. Peres, Andrew Lee, U. Keich","doi":"10.1145/3460774","DOIUrl":"https://doi.org/10.1145/3460774","url":null,"abstract":"We present ShiftConvolvePoibin, a fast exact method to compute the tail of a Poisson-binomial distribution (PBD). Our method employs an exponential shift to retain its accuracy when computing a tail probability, and in practice we find that it is immune to the significant relative errors that other methods, exact or approximate, can suffer from when computing very small tail probabilities of the PBD. The accompanying R package is also competitive with the fastest implementations for computing the entire PBD.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"179 1","pages":"1 - 19"},"PeriodicalIF":0.0,"publicationDate":"2020-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80105643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Davis, W. Hager, Scott P. Kolodziej, S. Yeralan
{"title":"Algorithm 1003","authors":"T. Davis, W. Hager, Scott P. Kolodziej, S. Yeralan","doi":"10.1145/3337792","DOIUrl":"https://doi.org/10.1145/3337792","url":null,"abstract":"Partitioning graphs is a common and useful operation in many areas, from parallel computing to VLSI design to sparse matrix algorithms. In this article, we introduce Mongoose, a multilevel hybrid graph partitioning algorithm and library. Building on previous work in multilevel partitioning frameworks and combinatoric approaches, we introduce novel stall-reducing and stall-free coarsening strategies, as well as an efficient hybrid algorithm leveraging (1) traditional combinatoric methods and (2) continuous quadratic programming formulations. We demonstrate how this new hybrid algorithm outperforms either strategy in isolation, and we also compare Mongoose to METIS and demonstrate its effectiveness on large and social networking (power law) graphs.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"45 1","pages":"1 - 18"},"PeriodicalIF":0.0,"publicationDate":"2020-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77674132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tao Cui, W. Leng, Huaqing Liu, Linbo Zhang, Weiying Zheng
{"title":"High-order Numerical Quadratures in a Tetrahedron with an Implicitly Defined Curved Interface","authors":"Tao Cui, W. Leng, Huaqing Liu, Linbo Zhang, Weiying Zheng","doi":"10.1145/3372144","DOIUrl":"https://doi.org/10.1145/3372144","url":null,"abstract":"Given a shape regular tetrahedron and a curved surface that is defined implicitly by a nonlinear level set function and divides the tetrahedron into two sub-domains, a general-purpose, robust, and high-order numerical algorithm is proposed in this article for computing both volume integrals in the sub-domains and surface integrals on their common boundary. The algorithm uses a direct approach that decomposes 3D volume integrals or 2D surface integrals into multiple 1D integrals and computes the 1D integrals with Gaussian quadratures. It only requires finding roots of univariate nonlinear functions in given intervals and evaluating the integrand, the level set function, and the gradient of the level set function at given points. It can achieve arbitrarily high accuracy by increasing the orders of Gaussian quadratures, and it does not need extra a priori knowledge about the integrand and the level set function. The code for the algorithm is freely available in the open-source finite element toolbox Parallel Hierarchical Grid (PHG) and can serve as a basic building block for implementing 3D high-order numerical algorithms involving implicit interfaces or boundaries.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"32 1","pages":"1 - 18"},"PeriodicalIF":0.0,"publicationDate":"2020-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86541787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Jónasson, S. Sigurdsson, H. F. Yngvason, Pétur Orri Ragnarsson, Páll Melsted
{"title":"Algorithm 1005","authors":"K. Jónasson, S. Sigurdsson, H. F. Yngvason, Pétur Orri Ragnarsson, Páll Melsted","doi":"10.1145/3382191","DOIUrl":"https://doi.org/10.1145/3382191","url":null,"abstract":"A set of Fortran subroutines for reverse mode algorithmic (or automatic) differentiation of the basic linear algebra subprograms (BLAS) is presented. This is preceded by a description of the mathematical tools used to obtain the formulae of these derivatives, with emphasis on special matrices supported by the BLAS: triangular, symmetric, and band. All single and double precision BLAS derivatives have been implemented, together with the Cholesky factorization from Linear Algebra Package (LAPACK). The subroutines are written in Fortran 2003 with a Fortran 77 interface to allow use from C and C++, as well as dynamic languages such as R, Python, Matlab, and Octave. The subroutines are all implemented by calling BLAS, thereby attaining fast runtime. Timing results show derivative runtimes that are about twice those of the corresponding BLAS, in line with theory. The emphasis is on reverse mode because it is more important for the main application that we have in mind, numerical optimization. Two examples are presented, one dealing with the least squares modeling of groundwater, and the other dealing with the maximum likelihood estimation of the parameters of a vector autoregressive time series. The article contains comprehensive tables of formulae for the BLAS derivatives as well as for several non-BLAS matrix operations commonly used in optimization.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"33 1","pages":"1 - 20"},"PeriodicalIF":0.0,"publicationDate":"2020-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78416019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carmen Arévalo, Erik Jonsson-Glans, Josefine Olander, M. S. Soto, Gustaf Söderlind
{"title":"A Software Platform for Adaptive High Order Multistep Methods","authors":"Carmen Arévalo, Erik Jonsson-Glans, Josefine Olander, M. S. Soto, Gustaf Söderlind","doi":"10.1145/3372159","DOIUrl":"https://doi.org/10.1145/3372159","url":null,"abstract":"We present a software package, Modes, offering h-adaptive and p-adaptive linear multistep methods for first order initial value problems in ordinary differential equations. The implementation is based on a new parametric, grid-independent representation of multistep methods [Arévalo and Söderlind 2017]. Parameters are supplied for over 60 methods. For nonstiff problems, all maximal order methods (p=k for explicit and p=k+1 for implicit methods) are supported. For stiff computation, implicit methods of order p=k are included. A collection of step-size controllers based on digital filters is provided, generating smooth step-size sequences offering improved computational stability. Controllers may be selected to match method and problem classes. A new system for automatic order control is also provided for designated families of multistep methods, offering simultaneous h- and p-adaptivity. Implemented as a Matlab toolbox, the software covers high order computations with linear multistep methods within a unified, generic framework. Computational experiments show that the new software is competitive and offers qualitative improvements. Modes is available for downloading and is primarily intended as a platform for developing a new generation of state-of-the-art multistep solvers, as well as for true ceteris paribus evaluation of algorithmic components. This also enables method comparisons within a single implementation environment.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"81 1","pages":"1 - 17"},"PeriodicalIF":0.0,"publicationDate":"2020-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76709535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1004","authors":"Jeremy Reizenstein, Benjamin Graham","doi":"10.1145/3371237","DOIUrl":"https://doi.org/10.1145/3371237","url":null,"abstract":"Iterated-integral signatures and log signatures are sequences calculated from a path that characterizes its shape. They originate from the work of K. T. Chen and have become important through Terry Lyons’s theory of differential equations driven by rough paths, which is an important developing area of stochastic analysis. They have applications in statistics and machine learning, where there can be a need to calculate finite parts of them quickly for many paths. We introduce the signature and the most basic information (displacement and signed areas) that it contains. We present algorithms for efficiently calculating these signatures. For log signatures this requires consideration of the structure of free Lie algebras. We benchmark the performance of the algorithms. The methods are implemented in C++ and released as a Python extension package, which also supports differentiation. In combination with a machine learning library (Tensorflow, PyTorch, or Theano), this allows end-to-end learning of neural networks involving signatures.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"26 1","pages":"1 - 21"},"PeriodicalIF":0.0,"publicationDate":"2020-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74320556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strassen’s Algorithm Reloaded on GPUs","authors":"Jianyu Huang, Chenhan D. Yu, R. Geijn","doi":"10.1145/3372419","DOIUrl":"https://doi.org/10.1145/3372419","url":null,"abstract":"Conventional Graphics Processing Unit (GPU) implementations of Strassen’s algorithm (Strassen) rely on the existing high-performance matrix multiplication (gemm), trading space for time. As a result, such approaches can only achieve practical speedup for relatively large, “squarish” matrices due to the extra memory overhead, and their usages are limited due to the considerable workspace. We present novel Strassen primitives for GPUs that can be composed to generate a family of Strassen algorithms. Our algorithms utilize both the memory and thread hierarchies on GPUs, reusing shared memory and register files inherited from gemm, fusing additional operations, and avoiding extra workspace. We further exploit intra- and inter-kernel parallelism by batching, streaming, and employing atomic operations. We develop a performance model for NVIDIA Volta GPUs to select the appropriate blocking parameters and predict the performance for gemm and Strassen. Overall, our 1-level Strassen can achieve up to 1.11× speedup with a crossover point as small as 1,536 compared to cublasSgemm on a NVIDIA Tesla V100 GPU. With additional workspace, our 2-level Strassen can achieve 1.19× speedup with a crossover point at 7,680.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"54 1","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2020-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74143251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1006","authors":"Rémy Abergel, L. Moisan","doi":"10.1145/3365983","DOIUrl":"https://doi.org/10.1145/3365983","url":null,"abstract":"We present a computational procedure to evaluate the integral ∫yx sp-1 e-μs ds for 0 ≤ x < y ≤ +∞,μ = ±1, p> 0, which generalizes the lower (x=0) and upper (y=+∞) incomplete gamma functions. To allow for large values of x, y, and p while avoiding under/overflow issues in the standard double precision floating point arithmetic, we use an explicit normalization that is much more efficient than the classical ratio with the complete gamma function. The generalized incomplete gamma function is estimated with continued fractions, with integrations by parts, or, when x ≈ y, with the Romberg numerical integration algorithm. We show that the accuracy reached by our algorithm improves a recent state-of-the-art method by two orders of magnitude, and it is essentially optimal considering the limitations imposed by floating point arithmetic. Moreover, the admissible parameter range of our algorithm (0 ≤ p,x,y ≤ 1015) is much larger than competing algorithms, and its robustness is assessed through massive usage in an image processing application.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"18 1","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2020-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90145877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}