ACM Transactions on Mathematical Software (TOMS): Latest Articles

Configurable Open-source Data Structure for Distributed Conforming Unstructured Homogeneous Meshes with GPU Support

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2022-09-10 DOI: 10.1145/3536164

Jakub Klinkovský, T. Oberhuber, R. Fučík, Vítezslav Zabka

Abstract: A general multi-purpose data structure for an efficient representation of conforming unstructured homogeneous meshes for scientific computations on CPU and GPU-based systems is presented. The data structure is provided as open-source software as part of the TNL library (https://tnl-project.org/). The abstract representation supports almost any cell shape and common 2D quadrilateral, 3D hexahedron and arbitrarily dimensional simplex shapes are currently built into the library. The implementation is highly configurable via templates of the C++ language, which allows avoiding the storage of unnecessary dynamic data. The internal memory layout is based on state-of-the-art sparse matrix storage formats, which are optimized for different hardware architectures in order to provide high-performance computations. The proposed data structure is also suitable for meshes decomposed into several subdomains and distributed computing using the Message Passing Interface (MPI). The efficiency of the implemented data structure on CPU and GPU hardware architectures is demonstrated on several benchmark problems and a comparison with another library. Its applicability to advanced numerical methods is demonstrated with an example problem of two-phase flow in porous media using a numerical scheme based on the mixed-hybrid finite element method (MHFEM). We show GPU speed-ups that rise above 20 in 2D and 50 in 3D when compared to sequential CPU computations, and above 2 in 2D and 9 in 3D when compared to 12-threaded CPU computations.

Pages: 1-30
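The abstract says the internal memory layout follows sparse matrix storage formats. As a rough illustration only, cell-to-vertex connectivity can be packed into CSR-style arrays and transposed to obtain vertex-to-cell incidence. This is a plain Python sketch, not the TNL C++ API; the names `build_csr`, `cell_vertices`, and `transpose_csr` are hypothetical.

```python
def build_csr(cells):
    """Pack per-cell vertex lists into CSR-style (offsets, indices) arrays."""
    offsets = [0]
    indices = []
    for verts in cells:
        indices.extend(verts)
        offsets.append(len(indices))
    return offsets, indices


def cell_vertices(offsets, indices, cell):
    """Return the vertex indices of one cell as a slice of the flat array."""
    return indices[offsets[cell]:offsets[cell + 1]]


def transpose_csr(offsets, indices, num_vertices):
    """Build the vertex-to-cell incidence from the cell-to-vertex arrays."""
    counts = [0] * num_vertices
    for v in indices:
        counts[v] += 1
    t_offsets = [0]
    for c in counts:
        t_offsets.append(t_offsets[-1] + c)
    cursor = t_offsets[:-1]  # next free slot for each vertex (a new list)
    t_indices = [0] * len(indices)
    for cell in range(len(offsets) - 1):
        for v in cell_vertices(offsets, indices, cell):
            t_indices[cursor[v]] = cell
            cursor[v] += 1
    return t_offsets, t_indices


# Two triangles sharing the edge between vertices 1 and 2 (hypothetical mesh).
cells = [[0, 1, 2], [1, 3, 2]]
offsets, indices = build_csr(cells)
v_offsets, v_cells = transpose_csr(offsets, indices, num_vertices=4)
```

Because every cell of a homogeneous mesh has the same number of vertices, the `offsets` array is redundant in that case and a fixed stride would do; the sketch keeps it only to show the general CSR shape.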

Citations: 1

Algorithm 1027: NOMAD Version 4: Nonlinear Optimization with the MADS Algorithm

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2022-06-17 DOI: 10.1145/3544489

C. Audet, Sébastien Le Digabel, Viviane Rochon Montplaisir, C. Tribes

Abstract: NOMAD is a state-of-the-art software package for optimizing blackbox problems. In continuous development since 2001, it constantly evolved with the integration of new algorithmic features published in scientific publications. These features are motivated by real applications encountered by industrial partners. The latest major release of NOMAD, version 3, dates to 2008. Minor releases are produced as new features are incorporated. The present work describes NOMAD 4, a complete redesign of the previous version, with a new architecture providing more flexible code, added functionalities, and reusable code. We introduce algorithmic components, which are building blocks for more complex algorithms and can initiate other components, launch nested algorithms, or perform specialized tasks. They facilitate the implementation of new ideas, including the MegaSearchPoll component, warm and hot restarts, and a revised version of the PsdMads algorithm. Another main improvement of NOMAD 4 is the usage of parallelism, to simultaneously compute multiple blackbox evaluations and to maximize usage of available cores. Running different algorithms, tuning their parameters, and comparing their performance for optimization are simpler than before, while overall optimization performance is maintained between versions 3 and 4. NOMAD is freely available at www.gerad.ca/nomad and the whole project is visible at github.com/bbopt/nomad.

Pages: 1-22
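To make the idea of polling a blackbox with parallel evaluations concrete, here is a minimal compass (coordinate) search in Python. It is a much simpler relative of MADS, not the MADS algorithm itself and not NOMAD's interface; `compass_search`, `sphere`, and all parameters are hypothetical names chosen for this sketch. Each iteration evaluates the poll points concurrently and halves the step when none of them improves on the incumbent.

```python
from concurrent.futures import ThreadPoolExecutor


def compass_search(blackbox, x0, step=1.0, min_step=1e-6, max_iter=1000, workers=4):
    """Minimize blackbox(x) by polling +/- step along each coordinate."""
    x = list(x0)
    fx = blackbox(x)
    n = len(x)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iter):
            if step < min_step:
                break
            # Poll set: one trial point per signed coordinate direction.
            trials = []
            for i in range(n):
                for sign in (1.0, -1.0):
                    t = list(x)
                    t[i] += sign * step
                    trials.append(t)
            # Evaluate all poll points concurrently.
            values = list(pool.map(blackbox, trials))
            best = min(range(len(trials)), key=values.__getitem__)
            if values[best] < fx:
                x, fx = trials[best], values[best]  # successful poll: move
            else:
                step /= 2.0  # unsuccessful poll: refine the step
    return x, fx


def sphere(x):
    """Toy blackbox with its minimum at (1, 1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


x_best, f_best = compass_search(sphere, [0.0, 0.0])
```

Threads only pay off when the blackbox releases the interpreter lock, for instance when it launches an external simulation; for CPU-bound Python functions a process pool would be the more natural choice.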

Citations: 26

Toward Accurate and Fast Summation

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2022-06-15 DOI: 10.1145/3544488

M. Lange

Citations: 2

Algorithm 1028: VTMOP: Solver for Blackbox Multiobjective Optimization Problems

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2022-05-27 DOI: 10.1145/3529258

Tyler H. Chang, L.T. Watson, Jeffrey Larson, N. Neveu, W. Thacker, Shubhangi G. Deshpande, T. Lux

Citations: 6

Parallel QR Factorization of Block Low-rank Matrices

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2022-05-23 DOI: 10.1145/3538647

M. R. Apriansyah, Rio Yokota

Abstract: We present two new algorithms for Householder QR factorization of Block Low-Rank (BLR) matrices: one that performs block-column-wise QR and another that is based on tiled QR. We show how the block-column-wise algorithm exploits BLR structure to achieve arithmetic complexity of 𝒪(mn), while the tiled BLR-QR exhibits 𝒪(mn^{1.5}) complexity. However, the tiled BLR-QR has finer task granularity that allows parallel task-based execution on shared memory systems. We compare the block-column-wise BLR-QR using fork-join parallelism with tiled BLR-QR using task-based parallelism. We also compare these two implementations of Householder BLR-QR with a block-column-wise Modified Gram–Schmidt (MGS) BLR-QR using fork-join parallelism and a state-of-the-art vendor-optimized dense Householder QR in Intel MKL. For a matrix of size 131k × 65k, all BLR methods are more than an order of magnitude faster than the dense QR in MKL. Our methods are also robust to ill conditioning and produce better orthogonal factors than the existing MGS-based method. On a CPU with 64 cores, our parallel tiled Householder and block-column-wise Householder algorithms show a speedup of 50 and 37 times, respectively.

Pages: 1-28
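For readers unfamiliar with the building block, the following is a small dense Householder QR in pure Python. It illustrates only the reflector step that the BLR algorithms build on; it does not exploit low-rank blocks, tiling, or parallelism, and `householder_qr` is a hypothetical name rather than anything from the paper's code.

```python
import math


def householder_qr(a):
    """Return (q, r) with a = q r, q orthogonal (m x m), r upper triangular (m x n)."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        # Householder vector v that maps column k (rows k..m-1) onto alpha * e1.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal
        alpha = -math.copysign(norm_x, x[0])  # sign chosen to avoid cancellation
        v = x[:]
        v[0] -= alpha
        vtv = sum(vi * vi for vi in v)
        # r <- H r, with H = I - 2 v v^T / (v^T v) acting on rows k..m-1.
        for j in range(n):
            s = sum(v[i] * r[k + i][j] for i in range(len(v)))
            f = 2.0 * s / vtv
            for i in range(len(v)):
                r[k + i][j] -= f * v[i]
        # q <- q H, so that q accumulates the product of all reflectors.
        for i in range(m):
            s = sum(q[i][k + j] * v[j] for j in range(len(v)))
            f = 2.0 * s / vtv
            for j in range(len(v)):
                q[i][k + j] -= f * v[j]
    return q, r


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q, r = householder_qr(a)
```

Each reflector is exactly orthogonal by construction, which is the usual explanation for why Householder-based factors tend to stay closer to orthogonal than Gram-Schmidt ones on ill-conditioned inputs.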

Citations: 1

On Memory Traffic and Optimisations for Low-order Finite Element Assembly Algorithms on Multi-core CPUs

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2022-03-04 DOI: 10.1145/3503925

James D. Trotter, Xing Cai, S. Funke

Citations: 2

Algorithm 1021: SPEX Left LU, Exactly Solving Sparse Linear Systems via a Sparse Left-looking Integer-preserving LU Factorization

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2022-03-04 DOI: 10.1145/3519024

Christopher Lourenco, Jinhao Chen, Erick Moreno-Centeno, T. Davis

Abstract: SPEX Left LU is a software package for exactly solving unsymmetric sparse linear systems. As a component of the sparse exact (SPEX) software package, SPEX Left LU can be applied to any input matrix, A, whose entries are integral, rational, or decimal, and provides a solution to the system Ax = b, which is either exact or accurate to user-specified precision. SPEX Left LU preorders the matrix A with a user-specified fill-reducing ordering and computes a left-looking LU factorization with the special property that each operation used to compute the L and U matrices is integral. Notable additional applications of this package include benchmarking the stability and accuracy of state-of-the-art linear solvers and determining whether singular-to-double-precision matrices are indeed singular. Computationally, this article evaluates the impact of several novel pivoting schemes in exact arithmetic, benchmarks the exact iterative solvers within Linbox, and benchmarks the accuracy of MATLAB sparse backslash. Most importantly, it is shown that SPEX Left LU outperforms the exact iterative solvers in run time on easy instances and in stability as the iterative solver fails on a sizeable subset of the tested (both easy and hard) instances. The SPEX Left LU package is written in ANSI C, comes with a MATLAB interface, and is distributed via GitHub, as a component of the SPEX software package, and as a component of SuiteSparse.

Pages: 1-23
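The phrase "each operation is integral" can be illustrated with Bareiss-style fraction-free elimination, a classical dense form of integer-preserving Gaussian elimination. The sketch below is dense, uses a naive first-nonzero pivot, and is in no way the SPEX Left LU algorithm or its C interface; `bareiss_solve` is a hypothetical name and the input is assumed to be already integral.

```python
from fractions import Fraction


def bareiss_solve(a, b):
    """Solve a x = b exactly for an integer matrix a and integer vector b."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    prev = 1  # pivot of the previous step; every division by it is exact
    for k in range(n - 1):
        if m[k][k] == 0:
            # Naive pivoting: swap in the first row with a nonzero entry.
            swap = next((i for i in range(k + 1, n) if m[i][k] != 0), None)
            if swap is None:
                raise ValueError("matrix is singular")
            m[k], m[swap] = m[swap], m[k]
        for i in range(k + 1, n):
            for j in range(k + 1, n + 1):
                # All intermediate values stay integers.
                m[i][j] = (m[i][j] * m[k][k] - m[i][k] * m[k][j]) // prev
            m[i][k] = 0
        prev = m[k][k]
    if m[n - 1][n - 1] == 0:
        raise ValueError("matrix is singular")
    # Rational numbers appear only in the final back substitution.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(s, m[i][i])
    return x


a = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
x = bareiss_solve(a, b)
```

When no rows are swapped, the last pivot `m[n-1][n-1]` equals the determinant of the input matrix, which is one way to decide exactly whether a matrix is singular.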

Citations: 0

A Provably Robust Algorithm for Triangle-triangle Intersections in Floating-point Arithmetic

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2022-03-04 DOI: 10.1145/3513264

Conor Mccoid, M. Gander

Citations: 7

Exploiting Problem Structure in Derivative Free Optimization

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2022-02-16 DOI: 10.1145/3474054

M. Porcelli, P. Toint

Abstract: A structured version of derivative-free random pattern search optimization algorithms is introduced, which is able to exploit coordinate partially separable structure (typically associated with sparsity) often present in unconstrained and bound-constrained optimization problems. This technique improves performance by orders of magnitude and makes it possible to solve large problems that otherwise are totally intractable by other derivative-free methods. A library of interpolation-based modelling tools is also described, which can be associated with the structured or unstructured versions of the initial pattern search algorithm. The use of the library further enhances performance, especially when associated with structure. The significant gains in performance associated with these two techniques are illustrated using a new freely-available release of the Brute Force Optimizer (BFO) package first introduced in [Porcelli and Toint 2017], which incorporates them. An interesting conclusion of the numerical results presented is that providing global structural information on a problem can result in significantly fewer evaluations of the objective function than attempting to build local Taylor-like models.

Pages: 1-25
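Coordinate partial separability means the objective is a sum of element functions, each depending on only a few coordinates, so a trial step along one coordinate requires re-evaluating only the elements that involve it. The sketch below shows that bookkeeping for a simple chain function. It is not BFO's interface (BFO is a separate package); `make_partially_separable` and the chain example are assumptions made for illustration.

```python
def make_partially_separable(elements):
    """elements: list of (indices, func); func takes the matching sub-vector."""

    def evaluate_all(x):
        # Full evaluation: one call per element function.
        return [func([x[i] for i in idx]) for idx, func in elements]

    def update(x, values, coord, new_value):
        # Change one coordinate and re-evaluate only the affected elements.
        x = list(x)
        x[coord] = new_value
        values = list(values)
        touched = 0
        for e, (idx, func) in enumerate(elements):
            if coord in idx:
                values[e] = func([x[i] for i in idx])
                touched += 1
        return x, values, touched

    return evaluate_all, update


# Chain objective: f(x) = sum over i of (x[i] - x[i+1])^2 with n = 6 variables.
n = 6
elements = [((i, i + 1), lambda z: (z[0] - z[1]) ** 2) for i in range(n - 1)]
evaluate_all, update = make_partially_separable(elements)

x = [0.0] * n
values = evaluate_all(x)                  # 5 element evaluations
x, values, touched = update(x, values, 2, 1.0)
total = sum(values)                       # objective after the coordinate change
```

Here a step along coordinate 2 touches two of the five elements; for a chain with thousands of variables it would still touch only two, which is the kind of saving the structured approach relies on.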

Citations: 7

Reproduced Computational Results Report for “Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing”

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2022-02-16 DOI: 10.1145/3480936

C. Balos

Citations: 0