The IBiCGStab method on bulk synchronous parallel architectures

Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications Pub Date : 2002-06-16 DOI:10.1109/HPCSA.2002.1019147

L. Yang, R. E. Shaw

Citations: 2

Abstract

This paper proposes an improved version of the BiCGStab method for solving large, sparse linear systems of equations with unsymmetric coefficient matrices. The method combines elements of numerical stability and parallel algorithm design without increasing the computational cost. The algorithm is derived so that all inner products of a single iteration step are independent, and the communication time required for the inner products can be overlapped efficiently with the computation time of the vector updates. The cost of global communication can therefore be significantly reduced. The bulk synchronous parallel (BSP) model is used to design an efficient, scalable, and portable parallel version of the proposed algorithm and to provide accurate performance predictions for a wide range of architectures, including the Cray T3D, the Parsytec, and a cluster of workstations connected by Ethernet. Based on simple and accurate cost modelling, this performance model gives useful insight into the time complexity of the method using only a few system-dependent parameters. The theoretical performance predictions are compared with some preliminary measured timing results for a numerical application from ocean flow simulation.
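
The benefit of making the inner products independent can be illustrated with the standard BSP superstep cost, w + h·g + l, where w is the local work, h the number of words communicated, g the per-word communication cost, and l the barrier synchronization cost. The sketch below is not the cost model from the paper. It assumes a simple all-to-all exchange of partial sums, and the machine parameters and the count of four inner products are hypothetical values chosen only for illustration. Under those assumptions it compares k inner products reduced in k separate supersteps against the same k reduced together in one superstep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BSPMachine:
    """BSP machine parameters (hypothetical values, not measurements)."""
    p: int      # number of processors
    g: float    # communication cost per word, in flop-time units
    l: float    # barrier synchronization cost, in flop-time units


def superstep_cost(w: float, h: float, m: BSPMachine) -> float:
    """Standard BSP cost of one superstep: w + h*g + l."""
    return w + h * m.g + m.l


def grouped_reduction_cost(k: int, m: BSPMachine) -> float:
    """Cost of reducing k partial sums in a single superstep.

    Assumption: every processor sends its k partial sums to the
    other p-1 processors, then adds the values it received.
    """
    h = k * (m.p - 1)   # words sent (and received) per processor
    w = k * (m.p - 1)   # local additions to combine the partial sums
    return superstep_cost(w, h, m)


def separate_reductions_cost(k: int, m: BSPMachine) -> float:
    """Cost of k dependent inner products, one superstep each."""
    return k * grouped_reduction_cost(1, m)


if __name__ == "__main__":
    machine = BSPMachine(p=4, g=2.0, l=100.0)  # illustrative only
    k = 4                                      # illustrative only
    separate = separate_reductions_cost(k, machine)
    grouped = grouped_reduction_cost(k, machine)
    print(f"separate: {separate}, grouped: {grouped}")
    print(f"saved: {separate - grouped} = (k - 1) * l")
```

In this simplified model the communication volume is the same in both cases, so the whole saving is the (k - 1) barrier costs that are no longer paid. Overlapping the remaining reduction with the vector updates, as the abstract describes, is a further gain that this sketch does not model.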
