{"title":"并行分布式存储结构的改进共轭梯度平方(ICGS)方法","authors":"L. Yang, R. Brent","doi":"10.1109/ICPPW.2001.951924","DOIUrl":null,"url":null,"abstract":"For the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices, we propose an improved version of the Conjugate Gradient Squared method (ICGS) method. The algorithm is derived such that all inner products, matrix-vector multiplications and vector updates of a single iteration step are independent and communication time required for inner product can be overlapped efficiently with computation time of vector updates. Therefore, the cost of global communication on parallel distributed memory computers can be significantly reduced. The resulting ICGS algorithm maintains the favorable properties of the algorithm while not increasing computational costs. Data distribution suitable for both irregularly and regularly structured matrices based on the analysis of the non-zero matrix elements is also presented. Communication scheme is supported by overlapping execution of computation and communication to reduce mailing times. The efficiency of this method is demonstrated by numerical experimental results carried out on a massively parallel distributed memory system.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"1 1","pages":"161-165"},"PeriodicalIF":0.0000,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"The improved conjugate gradient squared (ICGS) method on parallel distributed memory architectures\",\"authors\":\"L. Yang, R. Brent\",\"doi\":\"10.1109/ICPPW.2001.951924\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"For the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices, we propose an improved version of the Conjugate Gradient Squared method (ICGS) method. The algorithm is derived such that all inner products, matrix-vector multiplications and vector updates of a single iteration step are independent and communication time required for inner product can be overlapped efficiently with computation time of vector updates. Therefore, the cost of global communication on parallel distributed memory computers can be significantly reduced. The resulting ICGS algorithm maintains the favorable properties of the algorithm while not increasing computational costs. Data distribution suitable for both irregularly and regularly structured matrices based on the analysis of the non-zero matrix elements is also presented. Communication scheme is supported by overlapping execution of computation and communication to reduce mailing times. The efficiency of this method is demonstrated by numerical experimental results carried out on a massively parallel distributed memory system.\",\"PeriodicalId\":93355,\"journal\":{\"name\":\"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops\",\"volume\":\"1 1\",\"pages\":\"161-165\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-09-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPPW.2001.951924\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPPW.2001.951924","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The improved conjugate gradient squared (ICGS) method on parallel distributed memory architectures
For the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices, we propose an improved version of the Conjugate Gradient Squared method (ICGS) method. The algorithm is derived such that all inner products, matrix-vector multiplications and vector updates of a single iteration step are independent and communication time required for inner product can be overlapped efficiently with computation time of vector updates. Therefore, the cost of global communication on parallel distributed memory computers can be significantly reduced. The resulting ICGS algorithm maintains the favorable properties of the algorithm while not increasing computational costs. Data distribution suitable for both irregularly and regularly structured matrices based on the analysis of the non-zero matrix elements is also presented. Communication scheme is supported by overlapping execution of computation and communication to reduce mailing times. The efficiency of this method is demonstrated by numerical experimental results carried out on a massively parallel distributed memory system.