Aygul Jamal, M. Baboulin, Amal Khabou, M. Sosonkina
{"title":"并行代数递归多层求解器pARMS的CPU/GPU混合方法","authors":"Aygul Jamal, M. Baboulin, Amal Khabou, M. Sosonkina","doi":"10.1109/SYNASC.2016.069","DOIUrl":null,"url":null,"abstract":"We illustrate how the distributed parallel Algebraic Recursive Multilevel Solver based on MPI can be adapted for heterogeneous CPU/GPU architectures. The tasks performed on the GPU are related to the preconditioning of each part of the distributed matrix (local preconditioning) which is handled in the distributed version by each MPI process. The solving step remains on the CPU. In our implementation, the local preconditioning can be based either on the randomization of the last Schur complement system in the multilevel recursive process, or on an Incomplete LU factorization from the MAGMA library. Numerical experiments show that a promising performance improvement can be obtained using either randomized multilevel recursive preconditioning or Incomplete LU preconditioning for large enough matrices. Each preconditioning method ensures a good performance for a given set of matrices.","PeriodicalId":268635,"journal":{"name":"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"A Hybrid CPU/GPU Approach for the Parallel Algebraic Recursive Multilevel Solver pARMS\",\"authors\":\"Aygul Jamal, M. Baboulin, Amal Khabou, M. Sosonkina\",\"doi\":\"10.1109/SYNASC.2016.069\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We illustrate how the distributed parallel Algebraic Recursive Multilevel Solver based on MPI can be adapted for heterogeneous CPU/GPU architectures. The tasks performed on the GPU are related to the preconditioning of each part of the distributed matrix (local preconditioning) which is handled in the distributed version by each MPI process. The solving step remains on the CPU. In our implementation, the local preconditioning can be based either on the randomization of the last Schur complement system in the multilevel recursive process, or on an Incomplete LU factorization from the MAGMA library. Numerical experiments show that a promising performance improvement can be obtained using either randomized multilevel recursive preconditioning or Incomplete LU preconditioning for large enough matrices. Each preconditioning method ensures a good performance for a given set of matrices.\",\"PeriodicalId\":268635,\"journal\":{\"name\":\"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)\",\"volume\":\"29 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SYNASC.2016.069\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SYNASC.2016.069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Hybrid CPU/GPU Approach for the Parallel Algebraic Recursive Multilevel Solver pARMS
We illustrate how the distributed parallel Algebraic Recursive Multilevel Solver based on MPI can be adapted for heterogeneous CPU/GPU architectures. The tasks performed on the GPU are related to the preconditioning of each part of the distributed matrix (local preconditioning) which is handled in the distributed version by each MPI process. The solving step remains on the CPU. In our implementation, the local preconditioning can be based either on the randomization of the last Schur complement system in the multilevel recursive process, or on an Incomplete LU factorization from the MAGMA library. Numerical experiments show that a promising performance improvement can be obtained using either randomized multilevel recursive preconditioning or Incomplete LU preconditioning for large enough matrices. Each preconditioning method ensures a good performance for a given set of matrices.