Steps towards GPU Accelerated Aggregation AMG

2012 11th International Symposium on Parallel and Distributed Computing. Pub Date: 2012-06-25. DOI: 10.1109/ISPDC.2012.19

M. Emans, M. Liebmann, B. Basara


Citations: 7

Abstract



We present an implementation of AMG with simple aggregation techniques on multiple GPUs. It supports the parallel matrix representations typically used for finite volume discretisation. We employ the ICRS sparse matrix format and an asynchronous MPI exchange mechanism, originally used on CPUs, that has been modified to suit GPU coprocessors. We show that the solution phase of the standard V-cycle AMG with simple aggregation is accelerated by a factor of up to 12. The solution phase of the more advanced Krylov-accelerated AMG runs up to 7 times faster on Nvidia Tesla C2070 GPUs than on Intel X5650 CPUs.
