ACM Transactions on Mathematical Software: Latest Articles

Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors
Zone 1, Mathematics
ACM Transactions on Mathematical Software Pub Date : 2023-09-19 DOI: 10.1145/3595178
Sameer Deshmukh, Rio Yokota, George Bosilca
Abstract: Factorization and multiplication of dense matrices and tensors are critical, yet extremely expensive pieces of the scientific toolbox. Careful use of low rank approximation can drastically reduce the computation and memory requirements of these operations. In addition to a lower arithmetic complexity, such methods can, by their structure, be designed to efficiently exploit modern hardware architectures. The majority of existing work relies on batched BLAS libraries to handle the computation of many small dense matrices. We show that through careful analysis of the cache utilization, register accumulation using SIMD registers and a redesign of the implementation, one can achieve significantly higher throughput for these types of batched low-rank matrices across a large range of block and batch sizes. We test our algorithm on three CPUs using diverse ISAs – the Fujitsu A64FX using ARM SVE, the Intel Xeon 6148 using AVX-512, and AMD EPYC 7502 using AVX-2, and show that our new batching methodology is able to obtain more than twice the throughput of vendor-optimized libraries for all CPU architectures and problem sizes.
Citations: 0
New subspace method for unconstrained derivative-free optimization
IF 2.7, Zone 1, Mathematics
ACM Transactions on Mathematical Software Pub Date : 2023-09-02 DOI: 10.1145/3618297
M. Kimiaei, A. Neumaier, Parvaneh Faramarzi
Abstract: This paper defines an efficient subspace method, called SSDFO, for unconstrained derivative-free optimization problems where the gradients of the objective function are Lipschitz continuous but only exact function values are available. SSDFO employs line searches along directions constructed on the basis of quadratic models. These approximate the objective function in a subspace spanned by some previous search directions. A worst case complexity bound on the number of iterations and function evaluations is derived for a basic algorithm using this technique. Numerical results for a practical variant with additional heuristic features show that, on the unconstrained CUTEst test problems, SSDFO has superior performance compared to the best solvers from the literature.
Citations: 1
IEEE-754 precision-p base-β arithmetic implemented in binary
IF 2.7, Zone 1, Mathematics
ACM Transactions on Mathematical Software Pub Date : 2023-08-21 DOI: 10.1145/3596218
S. Rump
Abstract: We show how an IEEE-754 conformant precision-p base-β arithmetic can be implemented based on some binary floating-point and/or integer arithmetic. This includes the four basic operations and square root subject to the five IEEE-754 rounding modes, namely the nearest roundings with roundTiesToEven and roundTiesToAway, the directed roundings downwards and upwards, as well as rounding towards zero. Exceptional values like ∞ or NaN are covered according to the IEEE-754 arithmetic standard. The results of the precision-p base-β operations are computed using some underlying precision-q binary arithmetic. We distinguish two cases. When using a precision-q binary integer arithmetic, the base-β precision p is limited for all operations by $\beta^{2p} \le 2^{q}$, whereas using a precision-q binary floating-point arithmetic imposes stronger limits on the base-β precision, namely $\beta^{2p} \le 2^{q}$ for addition and multiplication, $\beta^{2p} \le 2^{q-1}$ for division and $\beta^{2p} \le 2^{q-3}$ for the square root. Those limitations cannot be improved. The algorithms are implemented in a Matlab/Octave flbeta-toolbox with the choice of using uint64 or binary64 as underlying arithmetic. The former allows larger precisions, the latter is advantageous for the square root, whereas computing times are similar. The flbeta-toolbox offers precision-p base-β scalar, vector and matrix operations including sparse matrices as well as corresponding interval operations. The base β can be chosen in the range β ∈ [2, 64]. The flbeta-toolbox will be part of Version 13 of INTLAB [18], the Matlab/Octave toolbox for reliable computing.
Citations: 1
Algorithm xxxx: KCC: A MATLAB Package for K-means-based Consensus Clustering
IF 2.7, Zone 1, Mathematics
ACM Transactions on Mathematical Software Pub Date : 2023-08-15 DOI: 10.1145/3616011
Hao Lin, Hongfu Liu, Junjie Wu, Hong Li, Stephan Günnemann
Abstract: Consensus clustering is gaining increasing attention for its high quality and robustness. In particular, K-means-based Consensus Clustering (KCC) converts the usual computationally expensive problem to a classic K-means clustering with generalized utility functions, bringing potentials for large-scale data clustering on different types of data. Despite KCC’s applicability and generalizability, implementing this method such as representing the binary data set in the K-means heuristic is challenging, and has seldom been discussed in prior work. To fill this gap, we present a MATLAB package, KCC, that completely implements the KCC framework, and utilizes a sparse representation technique to achieve a low space complexity. Compared to alternative consensus clustering packages, the KCC package is of high flexibility, efficiency, and effectiveness. Extensive numerical experiments are also included to show its usability on real-world data sets.
Citations: 1
Sparse Approximate Multifrontal Factorization with Composite Compression Methods
IF 2.7, Zone 1, Mathematics
ACM Transactions on Mathematical Software Pub Date : 2023-08-01 DOI: 10.1145/3611662
Lisa Claus, P. Ghysels, Yang Liu, T. Nhan, R. Thirumalaisamy, A. Bhalla, Sherry Li
Abstract: This article presents a fast and approximate multifrontal solver for large sparse linear systems. In a recent work by Liu et al., we showed the efficiency of a multifrontal solver leveraging the butterfly algorithm and its hierarchical matrix extension, HODBF (hierarchical off-diagonal butterfly) compression to compress large frontal matrices. The resulting multifrontal solver can attain quasi-linear computation and memory complexity when applied to sparse linear systems arising from spatial discretization of high-frequency wave equations. To further reduce the overall number of operations and especially the factorization memory usage to scale to larger problem sizes, in this article we develop a composite multifrontal solver that employs the HODBF format for large-sized fronts, a reduced-memory version of the nonhierarchical block low-rank format for medium-sized fronts, and a lossy compression format for small-sized fronts. This allows us to solve sparse linear systems of dimension up to 2.7× larger than before and leads to a memory consumption that is reduced by 70% while ensuring the same execution time. The code is made publicly available on GitHub.
Citations: 0
emgr – EMpirical GRamian Framework Version 5.99
IF 2.7, Zone 1, Mathematics
ACM Transactions on Mathematical Software Pub Date : 2023-07-20 DOI: 10.1145/3609860
Christian Himpe
Abstract: Version 5.99 of the empirical Gramian framework – emgr – completes a development cycle which focused on parametric model order reduction of gas network models while preserving compatibility to the previous development for the application of combined state and parameter reduction for neuroscience network models. Secondarily, new features concerning empirical Gramian types, perturbation design, and trajectory post-processing, as well as a Python version in addition to the default MATLAB / Octave implementation, have been added. This work summarizes these changes, particularly since emgr version 5.4, see Himpe, 2018 [Algorithms 11(7): 91], and gives recent as well as future applications, such as parameter identification in systems biology, based on the current feature set.
Citations: 0
IFISS3D: A computational laboratory for investigating finite element approximation in three dimensions
IF 2.7, Zone 1, Mathematics
ACM Transactions on Mathematical Software Pub Date : 2023-06-20 DOI: 10.1145/3604934
Georgios Papanikos, Catherine E. Powell, David J. Silvester
Abstract: IFISS is an established MATLAB finite element software package for studying strategies for solving partial differential equations (PDEs). IFISS3D is a new add-on toolbox that extends IFISS capabilities for elliptic PDEs from two to three space dimensions. The open-source MATLAB framework provides a computational laboratory for experimentation and exploration of finite element approximation and error estimation, as well as iterative solvers. The package is designed to be useful as a teaching tool for instructors and students who want to learn about state-of-the-art finite element methodology. It will also be useful for researchers as a source of reproducible test matrices of arbitrarily large dimension.
Citations: 0
Approximating inverse cumulative distribution functions to produce approximate random variables
IF 2.7, Zone 1, Mathematics
ACM Transactions on Mathematical Software Pub Date : 2023-06-17 DOI: 10.1145/3604935
Michael Giles, Oliver Sheridan-Methven
Abstract: For random variables produced through the inverse transform method, approximate random variables are introduced, which are produced using approximations to a distribution’s inverse cumulative distribution function. These approximations are designed to be computationally inexpensive, and much cheaper than library functions which are exact to within machine precision, and thus highly suitable for use in Monte Carlo simulations. The approximation errors they introduce can then be eliminated through use of the multilevel Monte Carlo method. Two approximations are presented for the Gaussian distribution: a piecewise constant on equally spaced intervals, and a piecewise linear using geometrically decaying intervals. The errors of the approximations are bounded and the convergence demonstrated, and the computational savings measured for C and C++ implementations. Implementations tailored for Intel and Arm hardware are inspected, alongside hardware agnostic implementations built using OpenMP. The savings are incorporated into a nested multilevel Monte Carlo framework with the Euler-Maruyama scheme to exploit the speed ups without losing accuracy, offering speed ups by a factor of 5–7. These ideas are empirically extended to the Milstein scheme, and the non-central $\chi^2$ distribution for the Cox-Ingersoll-Ross process, offering speed ups of a factor of 250 or more.
Citations: 0
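The piecewise-constant approximation on equally spaced intervals mentioned in the abstract above can be illustrated with a small lookup table. This is only a minimal sketch, not the authors' implementation: the function names `build_table` and `approx_inv_cdf` are hypothetical, and storing the exact inverse CDF at each interval midpoint is an assumption made here for simplicity (the paper may choose the constants differently).

```python
from statistics import NormalDist


def build_table(n_intervals: int) -> list[float]:
    """Precompute one value per equally spaced interval of (0, 1).

    Assumption: each interval is represented by the exact Gaussian inverse
    CDF evaluated at the interval midpoint.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((k + 0.5) / n_intervals) for k in range(n_intervals)]


def approx_inv_cdf(u: float, table: list[float]) -> float:
    """Piecewise-constant lookup: map u in [0, 1] to its interval's stored value."""
    n = len(table)
    k = min(int(u * n), n - 1)  # clamp so that u == 1.0 uses the last interval
    return table[k]


table = build_table(1024)
z = approx_inv_cdf(0.975, table)  # close to the exact quantile, roughly 1.96
```

The lookup costs one multiplication, one truncation, and one array read, which is the kind of saving over a library inverse CDF that the abstract refers to. The error it introduces is what the multilevel Monte Carlo correction is meant to remove.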
CPFloat: A C Library for Simulating Low-precision Arithmetic
IF 2.7, Zone 1, Mathematics
ACM Transactions on Mathematical Software Pub Date : 2023-06-17 DOI: 10.1145/3585515
Massimiliano Fasi, Mantas Mikaitis
Abstract: One can simulate low-precision floating-point arithmetic via software by executing each arithmetic operation in hardware and then rounding the result to the desired number of significant bits. For IEEE-compliant formats, rounding requires only standard mathematical library functions, but handling subnormals, underflow, and overflow demands special attention, and numerical errors can cause mathematically correct formulae to behave incorrectly in finite arithmetic. Moreover, the ensuing implementations are not necessarily efficient, as the library functions these techniques build upon are typically designed to handle a broad range of cases and may not be optimized for the specific needs of rounding algorithms. CPFloat is a C library for simulating low-precision arithmetics. It offers efficient routines for rounding, performing mathematical computations, and querying properties of the simulated low-precision format. The software exploits the bit-level floating-point representation of the format in which the numbers are stored and replaces costly library calls with low-level bit manipulations and integer arithmetic. In numerical experiments, the new techniques bring a considerable speedup (typically one order of magnitude or more) over existing alternatives in C, C++, and MATLAB. To our knowledge, CPFloat is currently the most efficient and complete library for experimenting with custom low-precision floating-point arithmetic.
Citations: 0
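The idea in the abstract above, rounding a stored binary64 value to fewer significant bits with integer operations on its bit pattern, can be sketched as follows. This is an illustrative sketch and not CPFloat's API: the name `round_to_precision` and its parameter `t` are hypothetical, only round-to-nearest with ties-to-even is shown, and the exponent range and subnormals of the target format (which the abstract notes need special attention) are deliberately ignored.

```python
import math
import struct


def round_to_precision(x: float, t: int) -> float:
    """Round a binary64 value to t significant bits (2 <= t <= 53).

    Uses round-to-nearest, ties-to-even, via integer arithmetic on the bit
    pattern. Assumption: x is a normal binary64 number or zero/inf/NaN; the
    binary64 exponent range is kept unchanged.
    """
    if not 2 <= t <= 53:
        raise ValueError("t must be between 2 and 53")
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    drop = 53 - t  # number of trailing significand bits to discard
    if drop == 0:
        return x
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]  # reinterpret as uint64
    mask = (1 << drop) - 1
    half = 1 << (drop - 1)
    lsb = (bits >> drop) & 1  # last bit that will be kept
    # Adding half - 1 + lsb rounds ties to the even neighbour; a carry out of
    # the significand correctly increments the exponent field.
    bits = (bits + half - 1 + lsb) & ~mask
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


y = round_to_precision(0.1, 24)  # 24 significant bits, as in binary32
```

Because the sign bit sits above the exponent field and finite inputs never carry past it, the same integer addition handles positive and negative values alike.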
Task-based Parallel Programming for Scalable Matrix Product Algorithms
IF 2.7, Zone 1, Mathematics
ACM Transactions on Mathematical Software Pub Date : 2023-06-15 DOI: 10.1145/3583560
Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Julien Herrmann, Antoine Jego
Abstract: Task-based programming models have succeeded in gaining the interest of the high-performance mathematical software community because they relieve part of the burden of developing and implementing distributed-memory parallel algorithms in an efficient and portable way. In increasingly larger, more heterogeneous clusters of computers, these models appear as a way to maintain and enhance more complex algorithms. However, task-based programming models lack the flexibility and the features that are necessary to express in an elegant and compact way scalable algorithms that rely on advanced communication patterns. We show that the Sequential Task Flow paradigm can be extended to write compact yet efficient and scalable routines for linear algebra computations. Although this work focuses on dense General Matrix Multiplication, the proposed features enable the implementation of more complex algorithms. We describe the implementation of these features and of the resulting GEMM operation. Finally, we present an experimental analysis on two homogeneous supercomputers showing that our approach is competitive up to 32,768 CPU cores with state-of-the-art libraries and may outperform them for some problem dimensions. Although our code can use GPUs straightforwardly, we do not deal with this case because it implies other issues which are out of the scope of this work.
Citations: 0