arXiv - CS - Mathematical Software: Latest Articles

Semidefinite Programming by Projective Cutting Planes
arXiv - CS - Mathematical Software Pub Date : 2023-11-15 DOI: arxiv-2311.09365
Daniel Porumbel
{"title":"Semidefinite Programming by Projective Cutting Planes","authors":"Daniel Porumbel","doi":"arxiv-2311.09365","DOIUrl":"https://doi.org/arxiv-2311.09365","url":null,"abstract":"Seeking tighter relaxations of combinatorial optimization problems,\u0000semidefinite programming is a generalization of linear programming that offers\u0000better bounds and is still polynomially solvable. Yet, in practice, a\u0000semidefinite program is still significantly harder to solve than a similar-size\u0000Linear Program (LP). It is well-known that a semidefinite program can be\u0000written as an LP with infinitely-many cuts that could be solved by repeated\u0000separation in a Cutting-Planes scheme; this approach is likely to end up in\u0000failure. We proposed in [Projective Cutting-Planes, Daniel Porumbel, Siam\u0000Journal on Optimization, 2020] the Projective Cutting-Planes method that\u0000upgrades t he well-known separation sub-problem to the projection sub-problem:\u0000given a feasible $y$ inside a polytope $P$ and a direction $d$, find the\u0000maximum $t^*$ so that $y+t^*din P$. Using this new sub-problem, one can\u0000generate a sequence of both inner and outer solutions that converge to the\u0000optimum over $P$. This paper shows that the projection sub-problem can be\u0000solved very efficiently in a semidefinite programming context, enabling the\u0000resulting method to compete very well with state-of-the-art semidefinite\u0000optimization software (refined over decades). Results suggest it may the\u0000fastest method for matrix sizes larger than $2000times 2000$.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"15 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Case Study in Analytic Protocol Analysis in ACL2
arXiv - CS - Mathematical Software Pub Date : 2023-11-15 DOI: arxiv-2311.08855
Max von Hippel (Northeastern University), Panagiotis Manolios (Northeastern University), Kenneth L. McMillan (University of Texas at Austin), Cristina Nita-Rotaru (Northeastern University), Lenore Zuck (University of Illinois Chicago)
{"title":"A Case Study in Analytic Protocol Analysis in ACL2","authors":"Max von HippelNortheastern University, Panagiotis ManoliosNortheastern University, Kenneth L. McMillanUniversity of Texas at Austin, Cristina Nita-RotaruNortheastern University, Lenore ZuckUniversity of Illinois Chicago","doi":"arxiv-2311.08855","DOIUrl":"https://doi.org/arxiv-2311.08855","url":null,"abstract":"When verifying computer systems we sometimes want to study their asymptotic\u0000behaviors, i.e., how they behave in the long run. In such cases, we need real\u0000analysis, the area of mathematics that deals with limits and the foundations of\u0000calculus. In a prior work, we used real analysis in ACL2s to study the\u0000asymptotic behavior of the RTO computation, commonly used in congestion control\u0000algorithms across the Internet. One key component in our RTO computation\u0000analysis was proving in ACL2s that for all alpha in [0, 1), the limit as n\u0000approaches infinity of alpha raised to n is zero. Whereas the most obvious\u0000proof strategy involves the logarithm, whose codomain includes irrationals, by\u0000default ACL2 only supports rationals, which forced us to take a non-standard\u0000approach. In this paper, we explore different approaches to proving the above\u0000result in ACL2(r) and ACL2s, from the perspective of a relatively new user to\u0000each. We also contextualize the theorem by showing how it allowed us to prove\u0000important asymptotic properties of the RTO computation. Finally, we discuss\u0000tradeoffs between the various proof strategies and directions for future\u0000research.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"17 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors
arXiv - CS - Mathematical Software Pub Date : 2023-11-11 DOI: arxiv-2311.07602
Sameer Deshmukh, Rio Yokota, George Bosilca
{"title":"Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors","authors":"Sameer Deshmukh, Rio Yokota, George Bosilca","doi":"arxiv-2311.07602","DOIUrl":"https://doi.org/arxiv-2311.07602","url":null,"abstract":"Factorization and multiplication of dense matrices and tensors are critical,\u0000yet extremely expensive pieces of the scientific toolbox. Careful use of low\u0000rank approximation can drastically reduce the computation and memory\u0000requirements of these operations. In addition to a lower arithmetic complexity,\u0000such methods can, by their structure, be designed to efficiently exploit modern\u0000hardware architectures. The majority of existing work relies on batched BLAS\u0000libraries to handle the computation of many small dense matrices. We show that\u0000through careful analysis of the cache utilization, register accumulation using\u0000SIMD registers and a redesign of the implementation, one can achieve\u0000significantly higher throughput for these types of batched low-rank matrices\u0000across a large range of block and batch sizes. We test our algorithm on 3 CPUs\u0000using diverse ISAs -- the Fujitsu A64FX using ARM SVE, the Intel Xeon 6148\u0000using AVX-512 and AMD EPYC 7502 using AVX-2, and show that our new batching\u0000methodology is able to obtain more than twice the throughput of vendor\u0000optimized libraries for all CPU architectures and problem sizes.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"10 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Efficient Framework for Global Non-Convex Polynomial Optimization with Nonlinear Polynomial Constraints
arXiv - CS - Mathematical Software Pub Date : 2023-11-03 DOI: arxiv-2311.02037
Mitchell Tong Harris, Pierre-David Letourneau, Dalton Jones, M. Harper Langston
{"title":"An Efficient Framework for Global Non-Convex Polynomial Optimization with Nonlinear Polynomial Constraints","authors":"Mitchell Tong Harris, Pierre-David Letourneau, Dalton Jones, M. Harper Langston","doi":"arxiv-2311.02037","DOIUrl":"https://doi.org/arxiv-2311.02037","url":null,"abstract":"We present an efficient framework for solving constrained global non-convex\u0000polynomial optimization problems. We prove the existence of an equivalent\u0000nonlinear reformulation of such problems that possesses essentially no spurious\u0000local minima. We show through numerical experiments that polynomial scaling in\u0000dimension and degree is achievable for computing the optimal value and location\u0000of previously intractable global constrained polynomial optimization problems\u0000in high dimension.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"14 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
$O(N)$ distributed direct factorization of structured dense matrices using runtime systems
arXiv - CS - Mathematical Software Pub Date : 2023-11-02 DOI: arxiv-2311.00921
Sameer Deshmukh, Qinxiang Ma, Rio Yokota, George Bosilca
{"title":"$O(N)$ distributed direct factorization of structured dense matrices using runtime systems","authors":"Sameer Deshmukh, Qinxiang Ma, Rio Yokota, George Bosilca","doi":"arxiv-2311.00921","DOIUrl":"https://doi.org/arxiv-2311.00921","url":null,"abstract":"Structured dense matrices result from boundary integral problems in\u0000electrostatics and geostatistics, and also Schur complements in sparse\u0000preconditioners such as multi-frontal methods. Exploiting the structure of such\u0000matrices can reduce the time for dense direct factorization from $O(N^3)$ to\u0000$O(N)$. The Hierarchically Semi-Separable (HSS) matrix is one such low rank\u0000matrix format that can be factorized using a Cholesky-like algorithm called ULV\u0000factorization. The HSS-ULV algorithm is highly parallel because it removes the\u0000dependency on trailing sub-matrices at each HSS level. However, a key merge\u0000step that links two successive HSS levels remains a challenge for efficient\u0000parallelization. In this paper, we use an asynchronous runtime system PaRSEC\u0000with the HSS-ULV algorithm. We compare our work with STRUMPACK and LORAPO, both\u0000state-of-the-art implementations of dense direct low rank factorization, and\u0000achieve up to 2x better factorization time for matrices arising from a diverse\u0000set of applications on up to 128 nodes of Fugaku for similar or better accuracy\u0000for all the problems that we survey.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"13 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU
arXiv - CS - Mathematical Software Pub Date : 2023-11-01 DOI: arxiv-2311.00368
Mohammad Zubair, Christoph Bauinger
{"title":"Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU","authors":"Mohammad Zubair, Christoph Bauinger","doi":"arxiv-2311.00368","DOIUrl":"https://doi.org/arxiv-2311.00368","url":null,"abstract":"In this paper, we focus on three sparse matrix operations that are relevant\u0000for machine learning applications, namely, the sparse-dense matrix\u0000multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM),\u0000and the composition of the SDDMM with SPMM, also termed as FusedMM. We develop\u0000optimized implementations for SPMM, SDDMM, and FusedMM operations utilizing\u0000Intel oneAPI's Explicit SIMD (ESIMD) SYCL extension API. In contrast to CUDA or\u0000SYCL, the ESIMD API enables the writing of explicitly vectorized kernel code.\u0000Sparse matrix algorithms implemented with the ESIMD API achieved performance\u0000close to the peak of the targeted Intel Data Center GPU. We compare our\u0000performance results to Intel's oneMKL library on Intel GPUs and to a recent\u0000CUDA implementation for the sparse matrix operations on NVIDIA's V100 GPU and\u0000demonstrate that our implementations for sparse matrix operations outperform\u0000either.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"12 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
NoMoPy: Noise Modeling in Python
arXiv - CS - Mathematical Software Pub Date : 2023-10-31 DOI: arxiv-2311.00084
Dylan Albrecht, N. Tobias Jacobson
{"title":"NoMoPy: Noise Modeling in Python","authors":"Dylan Albrecht, N. Tobias Jacobson","doi":"arxiv-2311.00084","DOIUrl":"https://doi.org/arxiv-2311.00084","url":null,"abstract":"NoMoPy is a code for fitting, analyzing, and generating noise modeled as a\u0000hidden Markov model (HMM) or, more generally, factorial hidden Markov model\u0000(FHMM). This code, written in Python, implements approximate and exact\u0000expectation maximization (EM) algorithms for performing the parameter\u0000estimation process, model selection procedures via cross-validation, and\u0000parameter confidence region estimation. Here, we describe in detail the\u0000functionality implemented in NoMoPy and provide examples of its use and\u0000performance on example problems.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"16 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Factor Fitting, Rank Allocation, and Partitioning in Multilevel Low Rank Matrices
arXiv - CS - Mathematical Software Pub Date : 2023-10-30 DOI: arxiv-2310.19214
Tetiana Parshakova, Trevor Hastie, Eric Darve, Stephen Boyd
{"title":"Factor Fitting, Rank Allocation, and Partitioning in Multilevel Low Rank Matrices","authors":"Tetiana Parshakova, Trevor Hastie, Eric Darve, Stephen Boyd","doi":"arxiv-2310.19214","DOIUrl":"https://doi.org/arxiv-2310.19214","url":null,"abstract":"We consider multilevel low rank (MLR) matrices, defined as a row and column\u0000permutation of a sum of matrices, each one a block diagonal refinement of the\u0000previous one, with all blocks low rank given in factored form. MLR matrices\u0000extend low rank matrices but share many of their properties, such as the total\u0000storage required and complexity of matrix-vector multiplication. We address\u0000three problems that arise in fitting a given matrix by an MLR matrix in the\u0000Frobenius norm. The first problem is factor fitting, where we adjust the\u0000factors of the MLR matrix. The second is rank allocation, where we choose the\u0000ranks of the blocks in each level, subject to the total rank having a given\u0000value, which preserves the total storage needed for the MLR matrix. The final\u0000problem is to choose the hierarchical partition of rows and columns, along with\u0000the ranks and factors. This paper is accompanied by an open source package that\u0000implements the proposed methods.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"18 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Survey of Methods for Estimating Hurst Exponent of Time Sequence
arXiv - CS - Mathematical Software Pub Date : 2023-10-29 DOI: arxiv-2310.19051
Hong-Yan Zhang, Zhi-Qiang Feng, Si-Yu Feng, Yu Zhou
{"title":"A Survey of Methods for Estimating Hurst Exponent of Time Sequence","authors":"Hong-Yan Zhang, Zhi-Qiang Feng, Si-Yu Feng, Yu Zhou","doi":"arxiv-2310.19051","DOIUrl":"https://doi.org/arxiv-2310.19051","url":null,"abstract":"The Hurst exponent is a significant indicator for characterizing the\u0000self-similarity and long-term memory properties of time sequences. It has wide\u0000applications in physics, technologies, engineering, mathematics, statistics,\u0000economics, psychology and so on. Currently, available methods for estimating\u0000the Hurst exponent of time sequences can be divided into different categories:\u0000time-domain methods and spectrum-domain methods based on the representation of\u0000time sequence, linear regression methods and Bayesian methods based on\u0000parameter estimation methods. Although various methods are discussed in\u0000literature, there are still some deficiencies: the descriptions of the\u0000estimation algorithms are just mathematics-oriented and the pseudo-codes are\u0000missing; the effectiveness and accuracy of the estimation algorithms are not\u0000clear; the classification of estimation methods is not considered and there is\u0000a lack of guidance for selecting the estimation methods. In this work, the\u0000emphasis is put on thirteen dominant methods for estimating the Hurst exponent.\u0000For the purpose of decreasing the difficulty of implementing the estimation\u0000methods with computer programs, the mathematical principles are discussed\u0000briefly and the pseudo-codes of algorithms are presented with necessary\u0000details. It is expected that the survey could help the researchers to select,\u0000implement and apply the estimation algorithms of interest in practical\u0000situations in an easy way.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"16 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Tackling the Matrix Multiplication Micro-kernel Generation with Exo
arXiv - CS - Mathematical Software Pub Date : 2023-10-26 DOI: arxiv-2310.17408
Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor Martínez
{"title":"Tackling the Matrix Multiplication Micro-kernel Generation with Exo","authors":"Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor Martínez","doi":"arxiv-2310.17408","DOIUrl":"https://doi.org/arxiv-2310.17408","url":null,"abstract":"The optimization of the matrix multiplication (or GEMM) has been a need\u0000during the last decades. This operation is considered the flagship of current\u0000linear algebra libraries such as BLIS, OpenBLAS, or Intel OneAPI because of its\u0000widespread use in a large variety of scientific applications. The GEMM is\u0000usually implemented following the GotoBLAS philosophy, which tiles the GEMM\u0000operands and uses a series of nested loops for performance improvement. These\u0000approaches extract the maximum computational power of the architectures through\u0000small pieces of hardware-oriented, high-performance code called micro-kernel.\u0000However, this approach forces developers to generate, with a non-negligible\u0000effort, a dedicated micro-kernel for each new hardware. In this work, we present a step-by-step procedure for generating\u0000micro-kernels with the Exo compiler that performs close to (or even better\u0000than) manually developed microkernels written with intrinsic functions or\u0000assembly language. Our solution also improves the portability of the generated\u0000code, since a hardware target is fully specified by a concise library-based\u0000description of its instructions.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"11 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0