arXiv - CS - Mathematical Software: Latest Articles

Semidefinite Programming by Projective Cutting Planes
arXiv - CS - Mathematical Software Pub Date : 2023-11-15 DOI: arxiv-2311.09365
Daniel Porumbel
{"title":"Semidefinite Programming by Projective Cutting Planes","authors":"Daniel Porumbel","doi":"arxiv-2311.09365","DOIUrl":"https://doi.org/arxiv-2311.09365","url":null,"abstract":"Seeking tighter relaxations of combinatorial optimization problems,\u0000semidefinite programming is a generalization of linear programming that offers\u0000better bounds and is still polynomially solvable. Yet, in practice, a\u0000semidefinite program is still significantly harder to solve than a similar-size\u0000Linear Program (LP). It is well-known that a semidefinite program can be\u0000written as an LP with infinitely-many cuts that could be solved by repeated\u0000separation in a Cutting-Planes scheme; this approach is likely to end up in\u0000failure. We proposed in [Projective Cutting-Planes, Daniel Porumbel, Siam\u0000Journal on Optimization, 2020] the Projective Cutting-Planes method that\u0000upgrades t he well-known separation sub-problem to the projection sub-problem:\u0000given a feasible $y$ inside a polytope $P$ and a direction $d$, find the\u0000maximum $t^*$ so that $y+t^*din P$. Using this new sub-problem, one can\u0000generate a sequence of both inner and outer solutions that converge to the\u0000optimum over $P$. This paper shows that the projection sub-problem can be\u0000solved very efficiently in a semidefinite programming context, enabling the\u0000resulting method to compete very well with state-of-the-art semidefinite\u0000optimization software (refined over decades). Results suggest it may the\u0000fastest method for matrix sizes larger than $2000times 2000$.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"15 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Case Study in Analytic Protocol Analysis in ACL2
arXiv - CS - Mathematical Software Pub Date : 2023-11-15 DOI: arxiv-2311.08855
Max von Hippel (Northeastern University), Panagiotis Manolios (Northeastern University), Kenneth L. McMillan (University of Texas at Austin), Cristina Nita-Rotaru (Northeastern University), Lenore Zuck (University of Illinois Chicago)
{"title":"A Case Study in Analytic Protocol Analysis in ACL2","authors":"Max von HippelNortheastern University, Panagiotis ManoliosNortheastern University, Kenneth L. McMillanUniversity of Texas at Austin, Cristina Nita-RotaruNortheastern University, Lenore ZuckUniversity of Illinois Chicago","doi":"arxiv-2311.08855","DOIUrl":"https://doi.org/arxiv-2311.08855","url":null,"abstract":"When verifying computer systems we sometimes want to study their asymptotic\u0000behaviors, i.e., how they behave in the long run. In such cases, we need real\u0000analysis, the area of mathematics that deals with limits and the foundations of\u0000calculus. In a prior work, we used real analysis in ACL2s to study the\u0000asymptotic behavior of the RTO computation, commonly used in congestion control\u0000algorithms across the Internet. One key component in our RTO computation\u0000analysis was proving in ACL2s that for all alpha in [0, 1), the limit as n\u0000approaches infinity of alpha raised to n is zero. Whereas the most obvious\u0000proof strategy involves the logarithm, whose codomain includes irrationals, by\u0000default ACL2 only supports rationals, which forced us to take a non-standard\u0000approach. In this paper, we explore different approaches to proving the above\u0000result in ACL2(r) and ACL2s, from the perspective of a relatively new user to\u0000each. We also contextualize the theorem by showing how it allowed us to prove\u0000important asymptotic properties of the RTO computation. Finally, we discuss\u0000tradeoffs between the various proof strategies and directions for future\u0000research.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"17 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors
arXiv - CS - Mathematical Software Pub Date : 2023-11-11 DOI: arxiv-2311.07602
Sameer Deshmukh, Rio Yokota, George Bosilca
{"title":"Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors","authors":"Sameer Deshmukh, Rio Yokota, George Bosilca","doi":"arxiv-2311.07602","DOIUrl":"https://doi.org/arxiv-2311.07602","url":null,"abstract":"Factorization and multiplication of dense matrices and tensors are critical,\u0000yet extremely expensive pieces of the scientific toolbox. Careful use of low\u0000rank approximation can drastically reduce the computation and memory\u0000requirements of these operations. In addition to a lower arithmetic complexity,\u0000such methods can, by their structure, be designed to efficiently exploit modern\u0000hardware architectures. The majority of existing work relies on batched BLAS\u0000libraries to handle the computation of many small dense matrices. We show that\u0000through careful analysis of the cache utilization, register accumulation using\u0000SIMD registers and a redesign of the implementation, one can achieve\u0000significantly higher throughput for these types of batched low-rank matrices\u0000across a large range of block and batch sizes. We test our algorithm on 3 CPUs\u0000using diverse ISAs -- the Fujitsu A64FX using ARM SVE, the Intel Xeon 6148\u0000using AVX-512 and AMD EPYC 7502 using AVX-2, and show that our new batching\u0000methodology is able to obtain more than twice the throughput of vendor\u0000optimized libraries for all CPU architectures and problem sizes.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"10 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Efficient Framework for Global Non-Convex Polynomial Optimization with Nonlinear Polynomial Constraints
arXiv - CS - Mathematical Software Pub Date : 2023-11-03 DOI: arxiv-2311.02037
Mitchell Tong Harris, Pierre-David Letourneau, Dalton Jones, M. Harper Langston
{"title":"An Efficient Framework for Global Non-Convex Polynomial Optimization with Nonlinear Polynomial Constraints","authors":"Mitchell Tong Harris, Pierre-David Letourneau, Dalton Jones, M. Harper Langston","doi":"arxiv-2311.02037","DOIUrl":"https://doi.org/arxiv-2311.02037","url":null,"abstract":"We present an efficient framework for solving constrained global non-convex\u0000polynomial optimization problems. We prove the existence of an equivalent\u0000nonlinear reformulation of such problems that possesses essentially no spurious\u0000local minima. We show through numerical experiments that polynomial scaling in\u0000dimension and degree is achievable for computing the optimal value and location\u0000of previously intractable global constrained polynomial optimization problems\u0000in high dimension.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"14 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
$O(N)$ distributed direct factorization of structured dense matrices using runtime systems
arXiv - CS - Mathematical Software Pub Date : 2023-11-02 DOI: arxiv-2311.00921
Sameer Deshmukh, Qinxiang Ma, Rio Yokota, George Bosilca
{"title":"$O(N)$ distributed direct factorization of structured dense matrices using runtime systems","authors":"Sameer Deshmukh, Qinxiang Ma, Rio Yokota, George Bosilca","doi":"arxiv-2311.00921","DOIUrl":"https://doi.org/arxiv-2311.00921","url":null,"abstract":"Structured dense matrices result from boundary integral problems in\u0000electrostatics and geostatistics, and also Schur complements in sparse\u0000preconditioners such as multi-frontal methods. Exploiting the structure of such\u0000matrices can reduce the time for dense direct factorization from $O(N^3)$ to\u0000$O(N)$. The Hierarchically Semi-Separable (HSS) matrix is one such low rank\u0000matrix format that can be factorized using a Cholesky-like algorithm called ULV\u0000factorization. The HSS-ULV algorithm is highly parallel because it removes the\u0000dependency on trailing sub-matrices at each HSS level. However, a key merge\u0000step that links two successive HSS levels remains a challenge for efficient\u0000parallelization. In this paper, we use an asynchronous runtime system PaRSEC\u0000with the HSS-ULV algorithm. We compare our work with STRUMPACK and LORAPO, both\u0000state-of-the-art implementations of dense direct low rank factorization, and\u0000achieve up to 2x better factorization time for matrices arising from a diverse\u0000set of applications on up to 128 nodes of Fugaku for similar or better accuracy\u0000for all the problems that we survey.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"13 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU
arXiv - CS - Mathematical Software Pub Date : 2023-11-01 DOI: arxiv-2311.00368
Mohammad Zubair, Christoph Bauinger
{"title":"Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU","authors":"Mohammad Zubair, Christoph Bauinger","doi":"arxiv-2311.00368","DOIUrl":"https://doi.org/arxiv-2311.00368","url":null,"abstract":"In this paper, we focus on three sparse matrix operations that are relevant\u0000for machine learning applications, namely, the sparse-dense matrix\u0000multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM),\u0000and the composition of the SDDMM with SPMM, also termed as FusedMM. We develop\u0000optimized implementations for SPMM, SDDMM, and FusedMM operations utilizing\u0000Intel oneAPI's Explicit SIMD (ESIMD) SYCL extension API. In contrast to CUDA or\u0000SYCL, the ESIMD API enables the writing of explicitly vectorized kernel code.\u0000Sparse matrix algorithms implemented with the ESIMD API achieved performance\u0000close to the peak of the targeted Intel Data Center GPU. We compare our\u0000performance results to Intel's oneMKL library on Intel GPUs and to a recent\u0000CUDA implementation for the sparse matrix operations on NVIDIA's V100 GPU and\u0000demonstrate that our implementations for sparse matrix operations outperform\u0000either.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"12 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
NoMoPy: Noise Modeling in Python
arXiv - CS - Mathematical Software Pub Date : 2023-10-31 DOI: arxiv-2311.00084
Dylan Albrecht, N. Tobias Jacobson
{"title":"NoMoPy: Noise Modeling in Python","authors":"Dylan Albrecht, N. Tobias Jacobson","doi":"arxiv-2311.00084","DOIUrl":"https://doi.org/arxiv-2311.00084","url":null,"abstract":"NoMoPy is a code for fitting, analyzing, and generating noise modeled as a\u0000hidden Markov model (HMM) or, more generally, factorial hidden Markov model\u0000(FHMM). This code, written in Python, implements approximate and exact\u0000expectation maximization (EM) algorithms for performing the parameter\u0000estimation process, model selection procedures via cross-validation, and\u0000parameter confidence region estimation. Here, we describe in detail the\u0000functionality implemented in NoMoPy and provide examples of its use and\u0000performance on example problems.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"16 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Factor Fitting, Rank Allocation, and Partitioning in Multilevel Low Rank Matrices
arXiv - CS - Mathematical Software Pub Date : 2023-10-30 DOI: arxiv-2310.19214
Tetiana Parshakova, Trevor Hastie, Eric Darve, Stephen Boyd
{"title":"Factor Fitting, Rank Allocation, and Partitioning in Multilevel Low Rank Matrices","authors":"Tetiana Parshakova, Trevor Hastie, Eric Darve, Stephen Boyd","doi":"arxiv-2310.19214","DOIUrl":"https://doi.org/arxiv-2310.19214","url":null,"abstract":"We consider multilevel low rank (MLR) matrices, defined as a row and column\u0000permutation of a sum of matrices, each one a block diagonal refinement of the\u0000previous one, with all blocks low rank given in factored form. MLR matrices\u0000extend low rank matrices but share many of their properties, such as the total\u0000storage required and complexity of matrix-vector multiplication. We address\u0000three problems that arise in fitting a given matrix by an MLR matrix in the\u0000Frobenius norm. The first problem is factor fitting, where we adjust the\u0000factors of the MLR matrix. The second is rank allocation, where we choose the\u0000ranks of the blocks in each level, subject to the total rank having a given\u0000value, which preserves the total storage needed for the MLR matrix. The final\u0000problem is to choose the hierarchical partition of rows and columns, along with\u0000the ranks and factors. This paper is accompanied by an open source package that\u0000implements the proposed methods.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"18 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Survey of Methods for Estimating Hurst Exponent of Time Sequence
arXiv - CS - Mathematical Software Pub Date : 2023-10-29 DOI: arxiv-2310.19051
Hong-Yan Zhang, Zhi-Qiang Feng, Si-Yu Feng, Yu Zhou
{"title":"A Survey of Methods for Estimating Hurst Exponent of Time Sequence","authors":"Hong-Yan Zhang, Zhi-Qiang Feng, Si-Yu Feng, Yu Zhou","doi":"arxiv-2310.19051","DOIUrl":"https://doi.org/arxiv-2310.19051","url":null,"abstract":"The Hurst exponent is a significant indicator for characterizing the\u0000self-similarity and long-term memory properties of time sequences. It has wide\u0000applications in physics, technologies, engineering, mathematics, statistics,\u0000economics, psychology and so on. Currently, available methods for estimating\u0000the Hurst exponent of time sequences can be divided into different categories:\u0000time-domain methods and spectrum-domain methods based on the representation of\u0000time sequence, linear regression methods and Bayesian methods based on\u0000parameter estimation methods. Although various methods are discussed in\u0000literature, there are still some deficiencies: the descriptions of the\u0000estimation algorithms are just mathematics-oriented and the pseudo-codes are\u0000missing; the effectiveness and accuracy of the estimation algorithms are not\u0000clear; the classification of estimation methods is not considered and there is\u0000a lack of guidance for selecting the estimation methods. In this work, the\u0000emphasis is put on thirteen dominant methods for estimating the Hurst exponent.\u0000For the purpose of decreasing the difficulty of implementing the estimation\u0000methods with computer programs, the mathematical principles are discussed\u0000briefly and the pseudo-codes of algorithms are presented with necessary\u0000details. It is expected that the survey could help the researchers to select,\u0000implement and apply the estimation algorithms of interest in practical\u0000situations in an easy way.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"16 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Tackling the Matrix Multiplication Micro-kernel Generation with Exo
arXiv - CS - Mathematical Software Pub Date : 2023-10-26 DOI: arxiv-2310.17408
Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor Martínez
{"title":"Tackling the Matrix Multiplication Micro-kernel Generation with Exo","authors":"Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor Martínez","doi":"arxiv-2310.17408","DOIUrl":"https://doi.org/arxiv-2310.17408","url":null,"abstract":"The optimization of the matrix multiplication (or GEMM) has been a need\u0000during the last decades. This operation is considered the flagship of current\u0000linear algebra libraries such as BLIS, OpenBLAS, or Intel OneAPI because of its\u0000widespread use in a large variety of scientific applications. The GEMM is\u0000usually implemented following the GotoBLAS philosophy, which tiles the GEMM\u0000operands and uses a series of nested loops for performance improvement. These\u0000approaches extract the maximum computational power of the architectures through\u0000small pieces of hardware-oriented, high-performance code called micro-kernel.\u0000However, this approach forces developers to generate, with a non-negligible\u0000effort, a dedicated micro-kernel for each new hardware. In this work, we present a step-by-step procedure for generating\u0000micro-kernels with the Exo compiler that performs close to (or even better\u0000than) manually developed microkernels written with intrinsic functions or\u0000assembly language. Our solution also improves the portability of the generated\u0000code, since a hardware target is fully specified by a concise library-based\u0000description of its instructions.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"11 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0