D. Baxter, J. Saltz, M. Schultz, S. Eisenstat, K. Crowley
{"title":"An experimental study of methods for parallel preconditioned Krylov methods","authors":"D. Baxter, J. Saltz, M. Schultz, S. Eisenstat, K. Crowley","doi":"10.1145/63047.63128","DOIUrl":null,"url":null,"abstract":"High performance multiprocessor architectures differ both in the number of processors, and in the delay costs for synchronization and communication. In order to obtain good performance on a given architecture for a given problem, adequate parallelization, good balance of load and an appropriate choice of granularity are essential.\nWe discuss the implementation of parallel version of PCGPAK for both shared memory architectures and hypercubes. Our parallel implementation is sufficiently efficient to allow us to complete the solution of our test problems on 16 processors of the Encore Multimax/320 in an amount of time that is a small multiple of that required by a single head of a Cray X/MP, despite the fact that the peak performance of the Multimax processors is not even close to the supercomputer range. We illustrate the effectiveness of our approach on a number of model problems from reservoir engineering and mathematics.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Hypercube Concurrent Computers and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/63047.63128","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 43
Abstract
High performance multiprocessor architectures differ both in the number of processors, and in the delay costs for synchronization and communication. In order to obtain good performance on a given architecture for a given problem, adequate parallelization, good balance of load and an appropriate choice of granularity are essential.
We discuss the implementation of parallel version of PCGPAK for both shared memory architectures and hypercubes. Our parallel implementation is sufficiently efficient to allow us to complete the solution of our test problems on 16 processors of the Encore Multimax/320 in an amount of time that is a small multiple of that required by a single head of a Cray X/MP, despite the fact that the peak performance of the Multimax processors is not even close to the supercomputer range. We illustrate the effectiveness of our approach on a number of model problems from reservoir engineering and mathematics.