Solving Large Rank-Deficient Linear Least-Squares Problems on Shared-Memory CPU Architectures and GPU Architectures
Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, Per-Gunnar Martinsson
arXiv:2408.05238 (arXiv - CS - Performance), published 2024-08-05
Citations: 0
Abstract
Solving very large linear systems of equations is a key computational task in
science and technology. In many cases, the coefficient matrix of the linear
system is rank-deficient, leading to systems that may be underdetermined,
inconsistent, or both. In such cases, one generally seeks the least-squares
solution, which minimizes the norm of the residual; when the coefficient matrix
has a nontrivial nullspace, this solution is further required to have the
smallest norm, which makes it unique. This work presents several new
techniques for solving least-squares problems involving coefficient matrices
that are so large that they do not fit in main memory. The implementations
include both CPU and GPU variants. All techniques rely on complete orthogonal
decompositions, which guarantee that both conditions of a least-squares solution
(minimum residual norm and minimum solution norm) are met regardless of the rank
of the matrix. Specifically, they rely on the recently proposed "randUTV"
algorithm, which is particularly effective in strongly communication-constrained
environments. A detailed precision and performance study shows that the new
methods, which operate on data stored on disk, are competitive with
state-of-the-art methods that keep all data in main memory.
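The two conditions above can be made concrete with a small example. Suppose A = U T V^T with U and V orthogonal and T upper triangular, and suppose the leading k-by-k block T11 carries the numerical rank while the trailing rows of T are negligible. Every x = V z with [T11 T12] z = c1, where c1 holds the first k entries of U^T b, then minimizes the residual, and the minimum-norm choice among them follows from a QR factorization of [T11 T12]^T (an extra orthogonal step that completes the decomposition). The NumPy sketch below applies this to a small, in-memory, blocked randUTV. It is not the authors' out-of-core CPU/GPU code: the helper names `rand_utv` and `min_norm_lstsq`, the block size `block`, the power-iteration count `q`, the rank tolerance `tol`, and the use of a QR step to remove the T12 coupling are assumptions made for illustration.

```python
import numpy as np


def rand_utv(A, block=2, q=2, rng=None):
    """Hypothetical in-memory sketch of blocked randUTV: A = U @ T @ V.T (m >= n).

    U and V are orthogonal, T is upper triangular, and diag(T) tends to track
    the singular values of A. `block` and `q` are illustrative defaults.
    """
    m, n = A.shape
    assert m >= n, "this sketch assumes a tall or square matrix"
    rng = np.random.default_rng(rng)
    T = np.array(A, dtype=float)
    U, V = np.eye(m), np.eye(n)
    for i in range(0, n, block):
        if i + block >= n:
            # Last block: a full SVD of the trailing submatrix finishes T.
            Us, _, Vst = np.linalg.svd(T[i:, i:])
            T[i:, :] = Us.T @ T[i:, :]
            T[:, i:] = T[:, i:] @ Vst.T
            U[:, i:] = U[:, i:] @ Us
            V[:, i:] = V[:, i:] @ Vst.T
            break
        A22 = T[i:, i:]
        # Random sampling plus q power iterations approximates the
        # dominant row space of the trailing matrix.
        Y = A22.T @ rng.standard_normal((m - i, block))
        for _ in range(q):
            Y = A22.T @ (A22 @ Y)
        Vhat, _ = np.linalg.qr(Y, mode="complete")
        T[:, i:] = T[:, i:] @ Vhat
        V[:, i:] = V[:, i:] @ Vhat
        # QR of the new leading block column zeroes it below the diagonal.
        Uhat, _ = np.linalg.qr(T[i:, i:i + block], mode="complete")
        T[i:, :] = Uhat.T @ T[i:, :]
        U[:, i:] = U[:, i:] @ Uhat
        # A small block-by-block SVD diagonalizes the current diagonal block.
        Us, _, Vst = np.linalg.svd(T[i:i + block, i:i + block])
        T[i:i + block, :] = Us.T @ T[i:i + block, :]
        T[:, i:i + block] = T[:, i:i + block] @ Vst.T
        U[:, i:i + block] = U[:, i:i + block] @ Us
        V[:, i:i + block] = V[:, i:i + block] @ Vst.T
    return U, T, V


def min_norm_lstsq(A, b, tol=1e-10, **utv_kwargs):
    """Hypothetical helper: minimum-norm least-squares solution via rand_utv."""
    U, T, V = rand_utv(A, **utv_kwargs)
    d = np.abs(np.diag(T))
    k = int(np.sum(d > tol * d.max()))  # numerical rank read off diag(T)
    c = U.T @ b
    # Keep the k leading rows [T11 T12]; the trailing rows are ~0 when the
    # rank is revealed, so only c[k:] is left in the residual.
    W = T[:k, :]
    # Minimum-norm solution of W z = c[:k]: with W.T = Q R, z = Q R^{-T} c[:k].
    Q, R = np.linalg.qr(W.T)
    z = Q @ np.linalg.solve(R.T, c[:k])
    return V @ z, k  # ||x|| == ||z|| because V is orthogonal
```

A quick sanity check on a small rank-deficient matrix is to compare `x` with `np.linalg.lstsq(A, b, rcond=1e-10)[0]`, which is also a minimum-norm solution. Reading the rank off |diag(T)| assumes the factorization reveals a clear gap; the sketch makes no accuracy claims for matrices without one.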
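The abstract's central claim is that the methods operate on matrices stored on disk. The paper's actual I/O scheme is not described here, so the following is only an illustration of the idea: the sampling product Y = (A^T A)^q A^T G that drives each randUTV step can be accumulated by streaming A from disk in row slabs with `numpy.memmap`, holding one slab in memory at a time and making q + 1 passes over the file. The raw row-major float64 file layout, the helper name `ooc_power_sample`, and the slab size `rows_per_block` are assumptions.

```python
import os
import tempfile

import numpy as np


def ooc_power_sample(path, shape, G, q=2, rows_per_block=1024):
    """Hypothetical helper: Y = (A.T @ A)**q @ A.T @ G for an m x n float64
    matrix A stored row-major in a raw binary file, one row slab at a time."""
    m, n = shape
    A = np.memmap(path, dtype=np.float64, mode="r", shape=(m, n))

    def slabs():
        # Copy one slab of consecutive rows into RAM per step.
        for r0 in range(0, m, rows_per_block):
            yield r0, np.array(A[r0:r0 + rows_per_block])

    # First pass: A.T @ G is the sum of slab.T @ (matching rows of G).
    Y = np.zeros((n, G.shape[1]))
    for r0, S in slabs():
        Y += S.T @ G[r0:r0 + S.shape[0]]
    # Each power iteration costs one more pass: A.T @ (A @ Y) = sum S.T @ (S @ Y).
    for _ in range(q):
        Y_next = np.zeros_like(Y)
        for _r0, S in slabs():
            Y_next += S.T @ (S @ Y)
        Y = Y_next
    return Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 7))
    G = rng.standard_normal((50, 3))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "A.bin")
        A.tofile(path)  # raw row-major float64, no header
        Y = ooc_power_sample(path, A.shape, G, q=2, rows_per_block=8)
    print(np.allclose(Y, np.linalg.matrix_power(A.T @ A, 2) @ A.T @ G))
```

Larger values of `rows_per_block` trade memory for fewer, larger reads and bigger matrix-matrix products.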