ACM Transactions on Mathematical Software: Latest Articles (Page 8)

Cache-oblivious Hilbert Curve-based Blocking Scheme for Matrix Transposition

IF 2.7, Zone 1 (Mathematics)

ACM Transactions on Mathematical Software, Pub Date: 2022-12-19, DOI: https://dl.acm.org/doi/10.1145/3555353

João Nuno Ferreira Alves, Luís Manuel Silveira Russo, Alexandre Francisco

Abstract: This article presents a fast SIMD Hilbert space-filling curve generator that supports a new cache-oblivious blocking scheme applied to the out-of-place transposition of general matrices. Matrix operations in high-performance computing libraries are usually parameterized according to the host microprocessor's specifications to minimize data movement across the levels of the memory hierarchy. The performance of cache-oblivious algorithms does not rely on such parameterizations; this type of algorithm provides an elegant and portable way to address the lack of standardization among modern processors. Our solution is an iterative blocking scheme that takes advantage of the locality-preserving properties of Hilbert space-filling curves to minimize data movement in any memory hierarchy. The scheme traverses the input matrix in O(nm) time and space, improving the behavior of matrix algorithms that inherently have poor memory locality. Applied to out-of-place matrix transposition, the technique achieved results competitive with state-of-the-art approaches.
After standard software prefetching techniques were employed, the performance of our solution surpassed that of Intel MKL.

Citations: 0
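
To make the traversal idea concrete, here is a minimal pure-Python sketch of transposing a matrix by visiting its elements in Hilbert-curve order. It assumes a square power-of-two Hilbert grid that covers the matrix and uses the classic iterative distance-to-coordinate mapping; it is not the authors' SIMD generator or blocking scheme, and `hilbert_d2xy` and `hilbert_transpose` are hypothetical names. Python will not exhibit the cache effects that motivate the paper; the sketch only shows the access order.

```python
def _rotate(s, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the correct orientation."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(side, d):
    """Map distance d along the Hilbert curve to (x, y) on a side x side grid.

    `side` must be a power of two (classic iterative formulation).
    """
    x = y = 0
    s, t = 1, d
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_transpose(a):
    """Out-of-place transpose of a rows x cols matrix (list of lists),
    visiting the source elements in Hilbert-curve order."""
    rows = len(a)
    cols = len(a[0]) if rows else 0
    out = [[None] * rows for _ in range(cols)]
    side = 1
    while side < max(rows, cols):
        side *= 2  # smallest power-of-two square covering the matrix
    for d in range(side * side):
        i, j = hilbert_d2xy(side, d)
        if i < rows and j < cols:  # skip curve cells outside the matrix
            out[j][i] = a[i][j]
    return out


if __name__ == "__main__":
    print(hilbert_transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```

Because consecutive curve positions are grid neighbours, nearby reads and writes tend to touch nearby memory at every scale, which is why no cache-size parameter appears in the sketch.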

Remark on Algorithm 1010: Boosting Efficiency in Solving Quartic Equations with No Compromise in Accuracy

ACM Transactions on Mathematical Software, Pub Date: 2022-12-19, DOI: https://dl.acm.org/doi/10.1145/3564270

Cristiano De Michele

Citations: 0

DIRECTGO: A New DIRECT-Type MATLAB Toolbox for Derivative-Free Global Optimization

ACM Transactions on Mathematical Software, Pub Date: 2022-12-19, DOI: https://dl.acm.org/doi/10.1145/3559755

Linas Stripinis, Remigijus Paulavičius

Abstract: In this work, we introduce DIRECTGO, a new MATLAB toolbox for derivative-free global optimization. DIRECTGO collects various deterministic, derivative-free DIRECT-type algorithms for box-constrained problems, generally constrained problems, and problems with hidden constraints. Each sequential algorithm is implemented in two ways, using static and dynamic data structures, for more efficient information storage and organization. Furthermore, parallel schemes are applied to some of the promising algorithms within DIRECTGO. The toolbox is equipped with a graphical user interface (GUI) that makes all of DIRECTGO's functionality easy to use. The available features are demonstrated in detailed computational studies using DIRECTGOLib v1.0, a comprehensive library of global optimization test problems. Additionally, 11 classical engineering design problems illustrate the potential of DIRECTGO to solve challenging real-world problems.
Finally, the appendix gives examples of accompanying MATLAB programs and provides a synopsis of DIRECTGO's use on the test problems with box and general constraints.

Citations: 0
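
The core loop behind DIRECT-type methods (sample interval centers, select "potentially optimal" intervals, trisect them) can be sketched in a few dozen lines. This is a minimal one-dimensional version of the original DIRECT (DIviding RECTangles) idea, written in Python rather than MATLAB; it is not code from DIRECTGO. It assumes a single box constraint [a, b], no parallelism, and a fixed iteration budget, and `direct_1d` and `potentially_optimal` are hypothetical names.

```python
import math


def potentially_optimal(iv, intervals, fmin, eps):
    """DIRECT selection test for interval iv = (center, half_width, f(center)).

    iv qualifies if some constant K > 0 makes f - K*d smallest at iv and the
    predicted improvement is at least eps*|fmin|.
    """
    _, dj, fj = iv
    k_low, k_up = 0.0, math.inf
    for _, di, fi in intervals:
        if di == dj:
            if fi < fj:
                return False  # a same-size interval already has a lower value
        elif di < dj:
            k_low = max(k_low, (fj - fi) / (dj - di))
        else:
            k_up = min(k_up, (fi - fj) / (di - dj))
    if k_low > k_up:
        return False
    if k_up == math.inf:
        return True  # among the largest intervals, K can be arbitrarily large
    return fj - k_up * dj <= fmin - eps * abs(fmin)


def direct_1d(f, a, b, iters=20, eps=1e-4):
    """Minimal 1-D DIRECT sketch on [a, b]; returns (best_x, best_f)."""
    c = (a + b) / 2
    intervals = [(c, (b - a) / 2, f(c))]
    for _ in range(iters):
        fmin = min(iv[2] for iv in intervals)
        selected = [iv for iv in intervals
                    if potentially_optimal(iv, intervals, fmin, eps)]
        for cx, d, fc in selected:
            intervals.remove((cx, d, fc))
            nd = d / 3  # trisection: each third has a third of the half-width
            intervals.append((cx, nd, fc))  # middle third reuses its sampled center
            for nc in (cx - 2 * nd, cx + 2 * nd):
                intervals.append((nc, nd, f(nc)))
    best = min(intervals, key=lambda iv: iv[2])
    return best[0], best[2]
```

For example, `direct_1d(lambda x: (x - 0.3) ** 2, 0.0, 1.0)` homes in on x = 0.3 without any derivative information. Toolboxes such as DIRECTGO add multidimensional rectangles, constraint handling, and more efficient data structures on top of this basic scheme.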

Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado

ACM Transactions on Mathematical Software, Pub Date: 2022-12-19, DOI: https://dl.acm.org/doi/10.1145/3560262

Eric Phipps, Roger Pawlowski, Christian Trott

Abstract: Automatic differentiation (AD) is a well-known technique for evaluating analytic derivatives of calculations implemented on a computer, and numerous software tools are available for incorporating AD technology into complex applications. However, a growing challenge for AD is the efficient differentiation of parallel computations on emerging manycore architectures, such as multicore CPUs, GPUs, and accelerators, as these devices become more pervasive. In this work, we explore forward-mode, operator-overloading-based differentiation of C++ codes on these architectures using the widely available Sacado AD software package. In particular, we leverage Kokkos, a C++ tool providing APIs for implementing parallel computations that are portable to a wide variety of emerging architectures. We describe the challenges that arise when differentiating code for these architectures using Kokkos, and two approaches for overcoming them that ensure optimal memory access patterns and expose additional dimensions of fine-grained parallelism in the derivative calculation. We describe the results of several computational experiments that demonstrate the performance of the approach on a few contemporary CPU and GPU architectures.
We then conclude with applications of these techniques to the simulation of discretized systems of partial differential equations.

Citations: 0
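
Forward-mode AD by operator overloading can be illustrated with a small "dual number" type: each value carries its derivatives, and every overloaded operator applies the corresponding differentiation rule. Sacado itself is a C++ template library; the Python `Dual` class, `sin`, and `seed` below are hypothetical stand-ins that show only the principle. The derivative vector `grad` has one entry per independent variable, and those per-component updates are loosely the kind of extra dimension the abstract refers to when it mentions fine-grained parallelism in the derivative calculation.

```python
import math


class Dual:
    """Forward-mode AD value: f(x) together with its partial derivatives."""

    def __init__(self, val, grad):
        self.val = float(val)
        self.grad = tuple(float(g) for g in grad)

    def _lift(self, other):
        # Plain numbers are constants: all partial derivatives are zero.
        if isinstance(other, Dual):
            return other
        return Dual(other, [0.0] * len(self.grad))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, [a + b for a, b in zip(self.grad, o.grad)])

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, [a - b for a, b in zip(self.grad, o.grad)])

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val,
                    [a * o.val + self.val * b for a, b in zip(self.grad, o.grad)])

    __rmul__ = __mul__


def sin(x):
    """Chain rule for sin: (sin u)' = cos(u) * u'."""
    return Dual(math.sin(x.val), [math.cos(x.val) * g for g in x.grad])


def seed(*values):
    """Create independent variables, each seeded with a unit derivative."""
    n = len(values)
    return [Dual(v, [1.0 if k == i else 0.0 for k in range(n)])
            for i, v in enumerate(values)]


x, y = seed(2.0, 3.0)
f = x * y + sin(x) - 1
print(f.val, f.grad)  # value and (df/dx, df/dy) = (y + cos x, x)
```

The function being differentiated is ordinary arithmetic code; only the type of its inputs changes, which is what lets operator-overloading tools differentiate existing C++ code with few modifications.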

Waveform Relaxation with Asynchronous Time-integration

ACM Transactions on Mathematical Software, Pub Date: 2022-12-19, DOI: https://dl.acm.org/doi/10.1145/3569578

Peter Meisrimel, Philipp Birken

Abstract: We consider Waveform Relaxation (WR) methods for the parallel and partitioned time-integration of surface-coupled multiphysics problems. WR allows independent time-discretizations on independent and adaptive time-grids while maintaining high time-integration orders. Classical WR methods, such as Jacobi or Gauss-Seidel WR, are typically either parallel or fast-converging, but not both. We present a novel parallel WR method that uses asynchronous communication techniques to obtain both properties. Classical WR methods exchange discrete functions after the time-integration of a subproblem; we instead exchange time-point solutions asynchronously during time-integration and directly incorporate all new information in the interpolants. We show both continuous and time-discrete convergence in a framework that generalizes existing linear WR convergence theory, and we present an algorithm for choosing the optimal relaxation in our new WR method. Convergence is demonstrated on two conjugate heat transfer examples, in which the new method shows improved performance over classical WR methods.
In one example, we show a partitioned coupling of the compressible Euler equations with a nonlinear heat equation, with the subproblems implemented using the open-source libraries DUNE and FEniCS.

Citations: 0
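
The classical Jacobi and Gauss-Seidel WR variants contrasted in the abstract can be shown on a toy linear system u' = -a u + b v, v' = c u - d v, split into two scalar subproblems. Each subproblem is integrated over the whole time window with implicit Euler, using the other component's waveform from the previous iteration (Jacobi) or from the current one (Gauss-Seidel). This is only a sketch of classical WR under simplifying assumptions: both subproblems share one fixed time grid (so no interpolation is needed) and there is no relaxation step. The paper's asynchronous method is not reproduced, and all function names are hypothetical.

```python
def integrate_subsystem(x0, decay, coupling, other, h):
    """Implicit Euler for x' = -decay*x + coupling*w(t), where w is a given
    waveform sampled at the same time nodes (list `other`)."""
    xs = [x0]
    for n in range(len(other) - 1):
        # (1 + h*decay) x_{n+1} = x_n + h*coupling*w_{n+1}
        xs.append((xs[-1] + h * coupling * other[n + 1]) / (1 + h * decay))
    return xs


def waveform_relaxation(u0, v0, a, b, c, d, T, steps, iters, gauss_seidel=False):
    """Classical WR for u' = -a u + b v, v' = c u - d v on [0, T]."""
    h = T / steps
    u = [u0] * (steps + 1)  # initial guess: constant waveforms
    v = [v0] * (steps + 1)
    for _ in range(iters):
        u_new = integrate_subsystem(u0, a, b, v, h)
        # Jacobi WR uses the previous u, so both solves could run in parallel;
        # Gauss-Seidel WR uses the freshly computed u_new, which is sequential.
        v_new = integrate_subsystem(v0, d, c, u_new if gauss_seidel else u, h)
        u, v = u_new, v_new
    return u, v


def monolithic(u0, v0, a, b, c, d, T, steps):
    """Reference: implicit Euler on the fully coupled 2x2 system."""
    h = T / steps
    m11, m12, m21, m22 = 1 + h * a, -h * b, -h * c, 1 + h * d
    det = m11 * m22 - m12 * m21
    u, v = u0, v0
    us, vs = [u], [v]
    for _ in range(steps):
        # Cramer's rule for [[m11, m12], [m21, m22]] @ [u', v'] = [u, v]
        u, v = (m22 * u - m12 * v) / det, (m11 * v - m21 * u) / det
        us.append(u)
        vs.append(v)
    return us, vs
```

On a shared grid, the fixed point of either WR iteration is exactly the monolithic implicit Euler solution, so comparing the two after a number of iterations shows convergence. Gauss-Seidel typically needs fewer iterations, while Jacobi allows both subproblems to be solved at the same time; that trade-off is what the asynchronous scheme in the paper aims to remove.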

Algorithm 1034: An Accelerated Algorithm to Compute the Qn Robust Statistic, with Corrections to Constants

ACM Transactions on Mathematical Software, Pub Date: 2022-12-16, DOI: 10.1145/3576920

Thierry Fahmy

Citations: 0

Algorithm xxx: Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Distributed-memory Architectures

ACM Transactions on Mathematical Software, Pub Date: 2022-12-05, DOI: 10.1145/3573383

G. Quintana-Ortí, Fernando Hernando, F. D. Igual

Citations: 0

Array-Aware Matching: Taming the Complexity of Large-Scale Simulation Models

ACM Transactions on Mathematical Software, Pub Date: 2022-11-22, DOI: 10.1145/3611661

Massimo Fioravanti, Daniele Cattaneo, F. Terraneo, Silvano Seva, Stefano Cherubin, G. Agosta, F. Casella, A. Leva

Abstract: Equation-based modelling is a powerful approach to taming the complexity of large-scale simulation problems. Equation-based tools automatically translate models into imperative languages. When confronted with today's problems, however, well-established model translation techniques exhibit scalability issues that are particularly severe when models contain very large arrays. Such models can be made very compact by enclosing equations in looping constructs, but reflecting the same compactness in the translated imperative code is nontrivial. In this paper, we address this issue by concentrating on a key step of equation-to-code translation: equation/variable matching. We first show that efficiently translating models with (large) arrays requires awareness of their presence, by defining a figure of merit that measures how well the looping constructs are preserved during translation. We then show that this figure of merit allows us to define an optimal array-aware matching and, as our main result, that the resulting optimal array-aware matching problem is NP-complete. As an additional result, we propose a heuristic algorithm that performs array-aware matching in polynomial time.
The proposed algorithm can be used effectively by model-translator developers to implement efficient tools for large-scale system simulation.

Citations: 1
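
For context, the conventional, array-unaware form of equation/variable matching treats every scalar equation and every scalar variable as a separate node in a bipartite graph and finds a maximum matching. The sketch below uses Kuhn's augmenting-path algorithm, a standard choice; whether any particular tool uses this exact algorithm is an assumption. It also shows why scalarization scales poorly: a hypothetical array model with n equations expands into n equation nodes even though the model source is a single loop. The paper's array-aware matching and heuristic are not reproduced, and `match_equations` and `scalarize_chain` are hypothetical names.

```python
def match_equations(incidence):
    """Maximum equation/variable matching via augmenting paths (Kuhn's algorithm).

    incidence maps each equation name to the variables it contains.
    Returns a dict equation -> matched variable.
    """
    var_to_eq = {}

    def try_assign(eq, visited):
        for var in incidence[eq]:
            if var in visited:
                continue
            visited.add(var)
            # Use a free variable, or re-route the equation currently holding it.
            if var not in var_to_eq or try_assign(var_to_eq[var], visited):
                var_to_eq[var] = eq
                return True
        return False

    for eq in incidence:
        try_assign(eq, set())
    return {eq: var for var, eq in var_to_eq.items()}


def scalarize_chain(n):
    """Hypothetical array model: eq[0] fixes x[0]; for i >= 1, eq[i] relates
    x[i-1] and x[i]. Scalarization creates one node per array element."""
    inc = {"eq[0]": ["x[0]"]}
    for i in range(1, n):
        inc[f"eq[{i}]"] = [f"x[{i - 1}]", f"x[{i}]"]
    return inc


print(match_equations(scalarize_chain(4)))
```

Here every eq[i] ends up matched to x[i], a regular pattern that an array-aware translator could emit as a single loop; a scalar matcher only discovers it element by element, and nothing in general guarantees that such a regular pattern is chosen.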

Algorithm 1031: MQSI—Monotone Quintic Spline Interpolation

ACM Transactions on Mathematical Software, Pub Date: 2022-11-01, DOI: 10.1145/3570157

T. Lux, L.T. Watson, Tyler H. Chang, W. Thacker

Citations: 2

Algorithm 1032: Bi-cubic Splines for Polyhedral Control Nets

ACM Transactions on Mathematical Software, Pub Date: 2022-10-31, DOI: 10.1145/3570158

J. Peters, K. Lo, K. Karčiauskas

Abstract: For control nets outlining a large class of topological polyhedra, not just tensor-product grids, bi-cubic polyhedral splines form a piecewise polynomial, first-order differentiable space that associates one function with each vertex. As with tensor-product splines, the resulting smooth surface approximates the polyhedron. Admissible polyhedral control nets consist of quadrilateral faces in a grid-like layout; star configurations, where n ≠ 4 quadrilateral faces join around an interior vertex; n-gon configurations, where 2n quadrilaterals surround an n-gon; polar configurations, where a cone of n triangles meeting at a vertex is surrounded by a ribbon of n quadrilaterals; and three types of T-junctions, where two quad-strips merge into one. The bi-cubic pieces of a polyhedral spline have matching derivatives along their break lines, possibly after a known change of variables. The pieces are represented in Bernstein-Bézier form with coefficients that depend linearly on the polyhedral control net, so evaluation, differentiation, integration, moments, and so on are no more costly than for standard tensor-product splines. Bi-cubic polyhedral splines can be used both to model geometry and to compute functions on the geometry. Although polyhedral splines do not offer nested refinement through refinement of the control net, they support engineering analysis of curved smooth objects. Coarse nets typically suffice, since the splines model curved features efficiently.
Algorithm 1032 is a C++ library with input-output example pairs and an option for IGES output.

Citations: 2
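
Since each bi-cubic piece is stored in Bernstein-Bézier form, evaluating it is the same operation as for a standard tensor-product patch. The sketch below evaluates one piece with the de Casteljau algorithm, first along v for each row of coefficients and then along u. It assumes scalar coefficients (for a surface, apply it to each coordinate separately) and omits the paper's linear mapping from the polyhedral control net to these coefficients; it is not the Algorithm 1032 C++ interface, and `de_casteljau` and `eval_bicubic` are hypothetical names.

```python
def de_casteljau(coeffs, t):
    """Evaluate a 1-D Bernstein-Bezier polynomial with the given coefficients at t."""
    pts = list(coeffs)
    while len(pts) > 1:
        # Repeated linear interpolation between neighbouring coefficients.
        pts = [(1 - t) * p + t * q for p, q in zip(pts, pts[1:])]
    return pts[0]


def eval_bicubic(coeffs, u, v):
    """Evaluate a tensor-product bi-cubic piece in Bernstein-Bezier form.

    coeffs is a 4x4 nested list c[i][j]; index i pairs with u, j with v.
    """
    if len(coeffs) != 4 or any(len(row) != 4 for row in coeffs):
        raise ValueError("a bi-cubic piece needs a 4x4 coefficient grid")
    return de_casteljau([de_casteljau(row, v) for row in coeffs], u)


# Coefficients c[i][j] = i/3 + 2j/3 reproduce f(u, v) = u + 2v (linear precision).
c = [[i / 3 + 2 * j / 3 for j in range(4)] for i in range(4)]
print(eval_bicubic(c, 0.25, 0.5))  # approximately 1.25
```

Because the cost depends only on the 4x4 coefficient grid, evaluating a polyhedral-spline piece costs the same as evaluating a tensor-product piece once its coefficients are known, which is the point the abstract makes about evaluation cost.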