ACM Transactions on Mathematical Software最新文献

筛选
英文 中文
Accurate Calculation of Euclidean Norms Using Double-word Arithmetic 用双字算术精确计算欧几里得范数
IF 2.7 1区 数学
ACM Transactions on Mathematical Software Pub Date : 2022-10-25 DOI: 10.1145/3568672
V. Lefèvre, N. Louvet, J. Muller, Joris Picot, L. Rideau
{"title":"Accurate Calculation of Euclidean Norms Using Double-word Arithmetic","authors":"V. Lefèvre, N. Louvet, J. Muller, Joris Picot, L. Rideau","doi":"10.1145/3568672","DOIUrl":"https://doi.org/10.1145/3568672","url":null,"abstract":"We consider the computation of the Euclidean (or L2) norm of an n-dimensional vector in floating-point arithmetic. We review the classical solutions used to avoid spurious overflow or underflow and/or to obtain very accurate results. We modify a recently published algorithm (that uses double-word arithmetic) to allow for a very accurate solution, free of spurious overflows and underflows. To that purpose, we use a double-word square-root algorithm of which we provide a tight error analysis. The returned L2 norm will be within very slightly more than 0.5 ulp from the exact result, which means that we will almost always provide correct rounding.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 34"},"PeriodicalIF":2.7,"publicationDate":"2022-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46100009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
IFISS3D: A Computational Laboratory for Investigating Finite Element Approximation in Three Dimensions IFISS3D:研究三维有限元逼近的计算实验室
IF 2.7 1区 数学
ACM Transactions on Mathematical Software Pub Date : 2022-09-27 DOI: 10.1145/3604934
Georgios Papanikos, C. Powell, D. Silvester
{"title":"IFISS3D: A Computational Laboratory for Investigating Finite Element Approximation in Three Dimensions","authors":"Georgios Papanikos, C. Powell, D. Silvester","doi":"10.1145/3604934","DOIUrl":"https://doi.org/10.1145/3604934","url":null,"abstract":"IFISS is an established MATLAB finite element software package for studying strategies for solving partial differential equations (PDEs). IFISS3D is a new add-on toolbox that extends IFISS capabilities for elliptic PDEs from two to three space dimensions. The open-source MATLAB framework provides a computational laboratory for experimentation and exploration of finite element approximation and error estimation, as well as iterative solvers. The package is designed to be useful as a teaching tool for instructors and students who want to learn about state-of-the-art finite element methodology. It will also be useful for researchers as a source of reproducible test matrices of arbitrarily large dimension.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 14"},"PeriodicalIF":2.7,"publicationDate":"2022-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47090209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado 基于Sacado的新兴多核体系结构c++代码自动识别
IF 2.7 1区 数学
ACM Transactions on Mathematical Software Pub Date : 2022-09-27 DOI: 10.1145/3560262
E. Phipps, R. Pawlowski, C. Trott
{"title":"Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado","authors":"E. Phipps, R. Pawlowski, C. Trott","doi":"10.1145/3560262","DOIUrl":"https://doi.org/10.1145/3560262","url":null,"abstract":"Automatic differentiation (AD) is a well-known technique for evaluating analytic derivatives of calculations implemented on a computer, with numerous software tools available for incorporating AD technology into complex applications. However, a growing challenge for AD is the efficient differentiation of parallel computations implemented on emerging manycore computing architectures such as multicore CPUs, GPUs, and accelerators as these devices become more pervasive. In this work, we explore forward mode, operator overloading-based differentiation of C++ codes on these architectures using the widely available Sacado AD software package. In particular, we leverage Kokkos, a C++ tool providing APIs for implementing parallel computations that is portable to a wide variety of emerging architectures. We describe the challenges that arise when differentiating code for these architectures using Kokkos, and two approaches for overcoming them that ensure optimal memory access patterns as well as expose additional dimensions of fine-grained parallelism in the derivative calculation. We describe the results of several computational experiments that demonstrate the performance of the approach on a few contemporary CPU and GPU architectures. We then conclude with applications of these techniques to the simulation of discretized systems of partial differential equations.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"48 1","pages":"1 - 29"},"PeriodicalIF":2.7,"publicationDate":"2022-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42830984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Remark on Algorithm 1010: Boosting Efficiency in Solving Quartic Equations with No Compromise in Accuracy 算法1010:在不影响精度的情况下提高求解四次方程的效率
IF 2.7 1区 数学
ACM Transactions on Mathematical Software Pub Date : 2022-09-19 DOI: 10.1145/3564270
C. De Michele
{"title":"Remark on Algorithm 1010: Boosting Efficiency in Solving Quartic Equations with No Compromise in Accuracy","authors":"C. De Michele","doi":"10.1145/3564270","DOIUrl":"https://doi.org/10.1145/3564270","url":null,"abstract":"We present a correction and an improvement to Algorithm 1010 [A. Orellana and C. De Michele 2020].","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"48 1","pages":"1 - 3"},"PeriodicalIF":2.7,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64059873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
emgr – EMpirical GRamian Framework Version 5.99 emgr -经验语法框架版本5.99
IF 2.7 1区 数学
ACM Transactions on Mathematical Software Pub Date : 2022-09-08 DOI: 10.1145/3609860
Christian Himpe
{"title":"emgr – EMpirical GRamian Framework Version 5.99","authors":"Christian Himpe","doi":"10.1145/3609860","DOIUrl":"https://doi.org/10.1145/3609860","url":null,"abstract":"Version 5.99 of the empirical Gramian framework – emgr – completes a development cycle which focused on parametric model order reduction of gas network models while preserving compatibility to the previous development for the application of combined state and parameter reduction for neuroscience network models. Second, new features concerning empirical Gramian types, perturbation design, and trajectory post-processing, as well as a Python version in addition to the default MATLAB / Octave implementation, have been added. This work summarizes these changes, particularly since emgr version 5.4, see Himpe, 2018 [Algorithms 11(7): 91], and gives recent as well as future applications, such as parameter identification in systems biology, based on the current feature set.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 8"},"PeriodicalIF":2.7,"publicationDate":"2022-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64082663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
Cache-oblivious Hilbert Curve-based Blocking Scheme for Matrix Transposition 基于缓存遗忘Hilbert曲线的矩阵换位分块方案
IF 2.7 1区 数学
ACM Transactions on Mathematical Software Pub Date : 2022-08-09 DOI: 10.1145/3555353
J. N. F. Alves, L. Russo, Alexandre P. Francisco
{"title":"Cache-oblivious Hilbert Curve-based Blocking Scheme for Matrix Transposition","authors":"J. N. F. Alves, L. Russo, Alexandre P. Francisco","doi":"10.1145/3555353","DOIUrl":"https://doi.org/10.1145/3555353","url":null,"abstract":"This article presents a fast SIMD Hilbert space-filling curve generator, which supports a new cache-oblivious blocking-scheme technique applied to the out-of-place transposition of general matrices. Matrix operations found in high performance computing libraries are usually parameterized based on host microprocessor specifications to minimize data movement within the different levels of memory hierarchy. The performance of cache-oblivious algorithms does not rely on such parameterizations. This type of algorithm provides an elegant and portable solution to address the lack of standardization in modern-day processors. Our solution consists in an iterative blocking scheme that takes advantage of the locality-preserving properties of Hilbert space-filling curves to minimize data movement in any memory hierarchy. This scheme traverses the input matrix, in O(nm) time and space, improving the behavior of matrix algorithms that inherently present poor memory locality. The application of this technique to the problem of out-of-place matrix transposition achieved competitive results when compared to state-of-the-art approaches. The performance of our solution surpassed Intel MKL version after employing standard software prefetching techniques.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"48 1","pages":"1 - 28"},"PeriodicalIF":2.7,"publicationDate":"2022-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43226486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Algorithm xxx: SC-SR1: MATLAB Software for Limited-Memory SR1 Trust-Region Methods 算法xxx:SC-SR1:MATLAB有限内存软件SR1信赖域方法
IF 2.7 1区 数学
ACM Transactions on Mathematical Software Pub Date : 2022-07-22 DOI: 10.1145/3550269
J. Brust, O. Burdakov, Jennifer B. Erway, Roummel F. Marcia
{"title":"Algorithm xxx: SC-SR1: MATLAB Software for Limited-Memory SR1 Trust-Region Methods","authors":"J. Brust, O. Burdakov, Jennifer B. Erway, Roummel F. Marcia","doi":"10.1145/3550269","DOIUrl":"https://doi.org/10.1145/3550269","url":null,"abstract":"We present a MATLAB implementation of the symmetric rank-one (SC-SR1) method that solves trust-region subproblems when a limited-memory symmetric rank-one (L-SR1) matrix is used in place of the true Hessian matrix, which can be used for large-scale optimization. The method takes advantage of two shape-changing norms [7, 9] to decompose the trust-region subproblem into two separate problems. Using one of the proposed norms, the resulting subproblems have closed-form solutions. Meanwhile, using the other proposed norm, one of the resulting subproblems has a closed-form solution while the other is easily solvable using techniques that exploit the structure of L-SR1 matrices. Numerical results suggest that the SC-SR1 method is able to solve trust-region subproblems to high accuracy even in the so-called “hard case”. When integrated into a trust-region algorithm, extensive numerical experiments suggest that the proposed algorithms perform well, when compared with widely used solvers, such as truncated CG.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":" ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2022-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41802660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Algorithms for Parallel Generic hp-Adaptive Finite Element Software 并行通用hp-自适应有限元软件算法
IF 2.7 1区 数学
ACM Transactions on Mathematical Software Pub Date : 2022-06-13 DOI: 10.1145/3603372
M. Fehling, W. Bangerth
{"title":"Algorithms for Parallel Generic hp-Adaptive Finite Element Software","authors":"M. Fehling, W. Bangerth","doi":"10.1145/3603372","DOIUrl":"https://doi.org/10.1145/3603372","url":null,"abstract":"The hp-adaptive finite element method—where one independently chooses the mesh size (h) and polynomial degree (p) to be used on each cell—has long been known to have better theoretical convergence properties than either h- or p-adaptive methods alone. However, it is not widely used, owing at least in part to the difficulty of the underlying algorithms and the lack of widely usable implementations. This is particularly true when used with continuous finite elements. Herein, we discuss algorithms that are necessary for a comprehensive and generic implementation of hp-adaptive finite element methods on distributed-memory, parallel machines. In particular, we will present a multistage algorithm for the unique enumeration of degrees of freedom suitable for continuous finite element spaces, describe considerations for weighted load balancing, and discuss the transfer of variable size data between processes. We illustrate the performance of our algorithms with numerical examples and demonstrate that they scale reasonably up to at least 16,384 message passage interface processes. We provide a reference implementation of our algorithms as part of the open source library deal.II.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 26"},"PeriodicalIF":2.7,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44759580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
ARKODE: A Flexible IVP Solver Infrastructure for One-step Methods ARKODE:用于一步法的灵活IVP求解器基础结构
IF 2.7 1区 数学
ACM Transactions on Mathematical Software Pub Date : 2022-05-27 DOI: 10.1145/3594632
D. Reynolds, D. J. Gardner, C. Woodward, Rujeko Chinomona
{"title":"ARKODE: A Flexible IVP Solver Infrastructure for One-step Methods","authors":"D. Reynolds, D. J. Gardner, C. Woodward, Rujeko Chinomona","doi":"10.1145/3594632","DOIUrl":"https://doi.org/10.1145/3594632","url":null,"abstract":"We describe the ARKODE library of one-step time integration methods for ordinary differential equation (ODE) initial-value problems (IVPs). In addition to providing standard explicit and diagonally implicit Runge–Kutta methods, ARKODE supports one-step methods designed to treat additive splittings of the IVP, including implicit-explicit (ImEx) additive Runge–Kutta methods and multirate infinitesimal (MRI) methods. We present the role of ARKODE within the SUNDIALS suite of time integration and nonlinear solver libraries, the core ARKODE infrastructure for utilities common to large classes of one-step methods, as well as its use of “time stepper” modules enabling easy incorporation of novel algorithms into the library. Numerical results show example problems of increasing complexity, highlighting the algorithmic flexibility afforded through this infrastructure, and include a larger multiphysics application leveraging multiple algorithmic features from ARKODE and SUNDIALS.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":" ","pages":"1 - 26"},"PeriodicalIF":2.7,"publicationDate":"2022-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47709068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Algorithm XXX: Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions 算法XXX:并行交替最小二乘的多重同时正则多进分解
IF 2.7 1区 数学
ACM Transactions on Mathematical Software Pub Date : 2022-04-29 DOI: 10.1145/3519383
C. Psarras, L. Karlsson, R. Bro, P. Bientinesi
{"title":"Algorithm XXX: Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions","authors":"C. Psarras, L. Karlsson, R. Bro, P. Bientinesi","doi":"10.1145/3519383","DOIUrl":"https://doi.org/10.1145/3519383","url":null,"abstract":"Tensor decompositions, such as CANDECOMP/PARAFAC (CP), are widely used in a variety of applications, such as chemometrics, signal processing, and machine learning. A broadly used method for computing such decompositions relies on the Alternating Least Squares (ALS) algorithm. When the number of components is small, regardless of its implementation, ALS exhibits low arithmetic intensity, which severely hinders its performance and makes GPU offloading ineffective. We observe that, in practice, experts often have to compute multiple decompositions of the same tensor, each with a small number of components (typically fewer than 20), to ultimately find the best ones to use for the application at hand. In this paper, we illustrate how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase the arithmetic intensity. Therefore, it becomes possible to make efficient use of GPUs for further speedups; at the same time the technique is compatible with many enhancements typically used in ALS, such as line search, extrapolation, and non-negativity constraints. We introduce the Concurrent ALS algorithm and library, which offers an interface to MATLAB, and a mechanism to effectively deal with the issue that decompositions complete at different times. Experimental results on artificial and real datasets demonstrate a shorter time to completion due to increased arithmetic intensity.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"1 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2022-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41347702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信