{"title":"hyper.deal: An Efficient, Matrix-free Finite-element Library for High-dimensional Partial Differential Equations","authors":"Peter Munch, K. Kormann, M. Kronbichler","doi":"10.1145/3469720","DOIUrl":"https://doi.org/10.1145/3469720","url":null,"abstract":"This work presents the efficient, matrix-free finite-element library hyper.deal for solving partial differential equations in two up to six dimensions with high-order discontinuous Galerkin methods. It builds upon the low-dimensional finite-element library deal.II to create complex low-dimensional meshes and to operate on them individually. These meshes are combined via a tensor product on the fly, and the library provides new special-purpose highly optimized matrix-free functions exploiting domain decomposition as well as shared memory via MPI-3.0 features. Both node-level performance analyses and strong/weak-scaling studies on up to 147,456 CPU cores confirm the efficiency of the implementation. Results obtained with the library hyper.deal are reported for high-dimensional advection problems and for the solution of the Vlasov–Poisson equation in up to six-dimensional phase space.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"82 1","pages":"1 - 34"},"PeriodicalIF":0.0,"publicationDate":"2020-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88465508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Medusa","authors":"J. Slak, G. Kosec","doi":"10.1145/3450966","DOIUrl":"https://doi.org/10.1145/3450966","url":null,"abstract":"Medusa, a novel library for implementation of non-particle strong form mesh-free methods, such as GFDM or RBF-FD, is described. We identify and present common parts and patterns among many such methods reported in the literature, such as node positioning, stencil selection, and stencil weight computation. Many different algorithms exist for each part and the possible combinations offer a plethora of possibilities for improvements of solution procedures that are far from fully understood. As a consequence there are still many unanswered questions in the mesh-free community resulting in vivid ongoing research in the field. Medusa implements the core mesh-free elements as independent blocks, which offers users great flexibility in experimenting with the method they are developing, as well as easily comparing it with other existing methods. The article describes the chosen abstractions and their usage, illustrates aspects of the philosophy and design, offers some execution time benchmarks and demonstrates the application of the library on cases from linear elasticity and fluid flow in irregular 2D and 3D domains.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"68 1","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2019-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84101067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linnea","authors":"Henrik Barthels, C. Psarras, P. Bientinesi","doi":"10.1145/3446632","DOIUrl":"https://doi.org/10.1145/3446632","url":null,"abstract":"The translation of linear algebra computations into efficient sequences of library calls is a non-trivial task that requires expertise in both linear algebra and high-performance computing. Almost all high-level languages and libraries for matrix computations (e.g., Matlab, Eigen) internally use optimized kernels such as those provided by BLAS and LAPACK; however, their translation algorithms are often too simplistic and thus lead to a suboptimal use of said kernels, resulting in significant performance losses. To combine the productivity offered by high-level languages, and the performance of low-level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem; as output, it returns an efficient sequence of calls to high-performance kernels. Linnea uses a custom best-first search algorithm to find a first solution in less than a second, and increasingly better solutions when given more time. In 125 test problems, the code generated by Linnea almost always outperforms Matlab, Julia, Eigen, and Armadillo, with speedups up to and exceeding 10×.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"46 1","pages":"1 - 26"},"PeriodicalIF":0.0,"publicationDate":"2019-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83403916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PCPATCH","authors":"P. Farrell, M. Knepley, L. Mitchell, F. Wechsung","doi":"10.1145/3445791","DOIUrl":"https://doi.org/10.1145/3445791","url":null,"abstract":"Effective relaxation methods are necessary for good multigrid convergence. For many equations, standard Jacobi and Gauß–Seidel are inadequate, and more sophisticated space decompositions are required; examples include problems with semidefinite terms or saddle point structure. In this article, we present a unifying software abstraction, PCPATCH, for the topological construction of space decompositions for multigrid relaxation methods. Space decompositions are specified by collecting topological entities in a mesh (such as all vertices or faces) and applying a construction rule (such as taking all degrees of freedom in the cells around each entity). The software is implemented in PETSc and facilitates the elegant expression of a wide range of schemes merely by varying solver options at runtime. In turn, this allows for the very rapid development of fast solvers for difficult problems.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"40 1","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2019-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90144993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Wolfe Line Search Algorithm for Vector Optimization","authors":"L. R. L. Pérez, L. F. Prudente","doi":"10.1145/3342104","DOIUrl":"https://doi.org/10.1145/3342104","url":null,"abstract":"In a recent article, Lucambio Pérez and Prudente extended the Wolfe conditions for the vector-valued optimization. Here, we propose a line search algorithm for finding a step size satisfying the strong Wolfe conditions in the vector optimization setting. Well definedness and finite termination results are provided. We discuss practical aspects related to the algorithm and present some numerical experiments illustrating its applicability. Codes supporting this article are written in Fortran 90 and are freely available for download.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"118 1","pages":"1 - 23"},"PeriodicalIF":0.0,"publicationDate":"2019-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77423485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FloatX","authors":"Goran Flegar, F. Scheidegger, Vedran Novaković, Giovani Mariani, A. Tomás, A. Malossi, E. S. Quintana‐Ortí","doi":"10.1145/3368086","DOIUrl":"https://doi.org/10.1145/3368086","url":null,"abstract":"We present FloatX (Float eXtended), a C++ framework to investigate the effect of leveraging customized floating-point formats in numerical applications. FloatX formats are based on binary IEEE 754 with smaller significand and exponent bit counts specified by the user. Among other properties, FloatX facilitates an incremental transformation of the code, relies on hardware-supported floating-point types as back-end to preserve efficiency, and incurs no storage overhead. The article discusses in detail the design principles, programming interface, and datatype casting rules behind FloatX. Furthermore, it demonstrates FloatX’s usage and benefits via several case studies from well-known numerical dense linear algebra libraries, such as BLAS and LAPACK; the Ginkgo library for sparse linear systems; and two neural network applications related with image processing and text recognition.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"23 1","pages":"1 - 23"},"PeriodicalIF":0.0,"publicationDate":"2019-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84408440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1002","authors":"Gökçehan Kara, C. Özturan","doi":"10.1145/3330481","DOIUrl":"https://doi.org/10.1145/3330481","url":null,"abstract":"The maximum flow problem is one of the most common network flow problems. This problem involves finding the maximum possible amount of flow between two designated nodes on a network with arcs having flow capacities. The push-relabel algorithm is one of the fastest algorithms to solve this problem. We present a shared memory parallel push-relabel algorithm. Graph coloring is used to avoid collisions between threads for concurrent push and relabel operations. In addition, excess values of target nodes are updated using atomic instructions to prevent race conditions. The experiments show that our algorithm is competitive for wide graphs with low diameters. Results from three different data sets are included, computer vision problems, DIMACS challenge problems, and KaHIP partitioning problems. These are compared with existing push-relabel and pseudoflow implementations. We show that high speedup rates are possible using our coloring based parallelization technique on sparse networks. However, we also observe that the pseudoflow algorithm runs faster than the push-relabel algorithm on dense and long networks.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"52 1","pages":"1 - 28"},"PeriodicalIF":0.0,"publicationDate":"2019-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85794031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Kummer Lines with Full Rational 2-torsion and Their Usage in Cryptography","authors":"H. Hisil, Joost Renes","doi":"10.1145/3361680","DOIUrl":"https://doi.org/10.1145/3361680","url":null,"abstract":"A paper by Karati and Sarkar at Asiacrypt’17 has pointed out the potential for Kummer lines in genus 1, by observing that their SIMD-friendly arithmetic is competitive with the status quo. A more recent preprint explores the connection with (twisted) Edwards curves. In this article, we extend this work and significantly simplify the treatment of Karati and Sarkar. We show that their Kummer line is the x-line of a Montgomery curve translated by a point of order two, and exhibit a natural isomorphism to the y-line of a twisted Edwards curve. Moreover, we show that the Kummer line presented by Gaudry and Lubicz can be obtained via the action of a point of order two on the y-line of an Edwards curve. The maps connecting these curves and lines are all very simple. As a result, a cryptographic implementation can use the arithmetic that is optimal for its instruction set at negligible cost.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"27 1","pages":"1 - 17"},"PeriodicalIF":0.0,"publicationDate":"2019-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87111862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1001","authors":"Florian Bürgel, K. Kazimierski, A. Lechleiter","doi":"10.1145/3328525","DOIUrl":"https://doi.org/10.1145/3328525","url":null,"abstract":"IPscatt is a free, open-source MATLAB toolbox facilitating the solution for time-independent scattering (also known as time-harmonic scattering) in two- and three-dimensional settings. The toolbox has three main application cases: simulation of the scattered field for a given transmitter-receiver geometry; the generation of simulated data as well as the handling of the real-world data from Institute Fresnel; and the reconstruction of the contrast from several measured, scattered fields. In each case, a variety of options tailored to the needs of practitioners is provided. For example, the toolbox allows the simulation of the scattered near field as well as of the far field. Also, it provides methods for the modeling of the incident field as point sources as well as plane waves. Finally, many common geometries of transmitters and receivers are included out of the box. Regarding the reconstruction, the provided functions implement the regularization scheme that relies on a primal-dual algorithm and was introduced by F. Bürgel, K. S. Kazimierski, and A. Lechleiter [Journal of Computational Physics 339 (2017), 1–30]. This article provides a survey of the mathematical concepts in scattering, connects them with the provided implementation, gives an overview of the software framework as well as its application areas, and compares it with existing software packages solving the same problem.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"25 1","pages":"1 - 20"},"PeriodicalIF":0.0,"publicationDate":"2019-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85135460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm 1000","authors":"T. Davis","doi":"10.1145/3322125","DOIUrl":"https://doi.org/10.1145/3322125","url":null,"abstract":"SuiteSparse:GraphBLAS is a full implementation of the GraphBLAS standard, which defines a set of sparse matrix operations on an extended algebra of semirings using an almost unlimited variety of operators and types. When applied to sparse adjacency matrices, these algebraic operations are equivalent to computations on graphs. GraphBLAS provides a powerful and expressive framework for creating graph algorithms based on the elegant mathematics of sparse matrix operations on a semiring. An overview of the GraphBLAS specification is given, followed by a description of the key features and performance of its implementation in the SuiteSparse:GraphBLAS package.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"1857 1","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2019-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86528871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}