Solving Large Rank-Deficient Linear Least-Squares Problems on Shared-Memory CPU Architectures and GPU Architectures

Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, Per-Gunnar Martinsson
Source: arXiv - CS - Performance. Published: 2024-08-05. DOI: arxiv-2408.05238 (https://doi.org/arxiv-2408.05238)
Citations: 0

Abstract

Solving very large linear systems of equations is a key computational task in science and technology. In many cases, the coefficient matrix of the linear system is rank-deficient, leading to systems that may be underdetermined, inconsistent, or both. In such cases, one generally seeks to compute the least-squares solution that minimizes the residual of the problem, which can be further defined as the solution with smallest norm in cases where the coefficient matrix has a nontrivial nullspace. This work presents several new techniques for solving least-squares problems involving coefficient matrices that are so large that they do not fit in main memory. The implementations include both CPU and GPU variants. All techniques rely on complete orthogonal decompositions that guarantee that both conditions of a least-squares solution are met, regardless of the rank properties of the matrix. Specifically, they rely on the recently proposed "randUTV" algorithm, which is particularly effective in strongly communication-constrained environments. A detailed precision and performance study reveals that the new methods, which operate on data stored on disk, are competitive with state-of-the-art methods that store all data in main memory.
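
To make the two conditions in the abstract concrete (minimal residual, and minimal norm when the matrix has a nontrivial nullspace), the following is a minimal in-memory sketch of solving a rank-deficient least-squares problem through a UTV-type complete orthogonal decomposition A = U T V^T. It is not the authors' out-of-core implementation and not the blocked randUTV algorithm itself: it uses a single randomized step, assumes the exact rank is known in advance, and the function name `min_norm_lstsq_utv` and its parameters are hypothetical choices made for illustration.

```python
import numpy as np


def min_norm_lstsq_utv(A, b, rank, seed=0):
    """Minimum-norm least-squares solution of A x ~ b via a one-step randomized UTV.

    Illustrative assumption: A has exact numerical rank `rank`, known in advance.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)

    # Sample the row space of A: Y = A^T G with a Gaussian test matrix G.
    G = rng.standard_normal((m, rank))
    Y = A.T @ G                                  # shape (n, rank)

    # Orthogonal V whose first `rank` columns span the row space of A.
    V, _ = np.linalg.qr(Y, mode="complete")      # shape (n, n)

    # Unpivoted QR of A V gives A = U T V^T with T upper triangular.
    # Columns rank.. of A V are numerically zero, so only T11 matters.
    U, T = np.linalg.qr(A @ V)

    T11 = T[:rank, :rank]
    # Solving with T11 minimizes the residual over the range of A.
    y = np.linalg.solve(T11, U[:, :rank].T @ b)
    # Using only the first `rank` columns of V leaves no nullspace
    # component in x, which makes x the minimum-norm minimizer.
    return V[:, :rank] @ y
```

On a small rank-deficient test matrix, the result can be checked against `np.linalg.lstsq`, which also returns the minimum-norm least-squares solution. In a practical setting the rank is not known beforehand, which is one reason rank-revealing factorizations such as randUTV are used instead of this simplified single-step variant.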