Assessing the solution of one sparse triangular linear system on multi-many core platforms

2021 XLVII Latin American Computing Conference (CLEI), pp. 1-8. Pub Date: 2021-10-25. DOI: 10.1109/CLEI53233.2021.9640084

Raúl Marichal, Ernesto Dufrechu, P. Ezzatti


Citations: 0

Abstract

The solution of sparse triangular linear systems is an important building block for a large number of numerical methods used in science and engineering. It is therefore crucial to have implementations of this operation that execute efficiently on the most recent hardware platforms. In the case of GPUs, several methods have been proposed in recent years. These methods fall into two main categories: those that rely on a prior analysis of the sparse matrix to determine a better execution schedule, and those that decide the schedule dynamically. The experimental results in the literature are not conclusive in favour of either strategy. However, experimental evaluations usually focus on the use case where many systems have to be solved with the same sparse matrix, so the analysis phase needs to be performed only once and its cost is of little importance relative to the total runtime. In this work we are interested in determining which strategy is best, according to the degree of parallelism of the problem, when only one system is to be solved. The experimental evaluation, performed on NVIDIA P100 accelerators, shows that the self-scheduled routines present important advantages when the degree of parallelism of the problem allows it.
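To make the idea of an analysis phase concrete, the following is a minimal CPU sketch in Python, not the routines evaluated in the paper. It assumes a lower triangular matrix stored in CSR form (`indptr`, `indices`, `data`), and the function names `compute_levels` and `solve_by_levels` are hypothetical. The analysis groups rows into levels so that rows in the same level do not depend on each other; the number of rows per level is one common way to describe the degree of parallelism of a problem.

```python
def compute_levels(indptr, indices):
    """Analysis phase (illustrative): level of row i is one more than
    the highest level among the rows it depends on."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # off-diagonal entry: row i needs x[j] first
                level[i] = max(level[i], level[j] + 1)
    return level


def solve_by_levels(indptr, indices, data, b, level):
    """Solve L x = b, visiting rows level by level.

    Rows within one level are mutually independent, so a parallel
    implementation could process them concurrently. Here they are
    simply visited in a sequential loop.
    """
    n = len(b)
    buckets = [[] for _ in range(max(level) + 1)] if n else []
    for i, lv in enumerate(level):
        buckets[lv].append(i)
    x = [0.0] * n
    for rows in buckets:
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    acc -= data[k] * x[j]
            x[i] = acc / diag
    return x


# Assumed 4x4 example:
# [[2, 0, 0, 0],
#  [0, 4, 0, 0],
#  [1, 0, 1, 0],
#  [0, 2, 3, 5]]
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 4.0, 1.0, 1.0, 2.0, 3.0, 5.0]
b = [2.0, 8.0, 4.0, 33.0]

levels = compute_levels(indptr, indices)             # [0, 0, 1, 2]
x = solve_by_levels(indptr, indices, data, b, levels)  # [1.0, 2.0, 3.0, 4.0]
```

In this sketch the analysis costs one extra pass over the matrix. That cost is amortized when many systems share the same matrix, but it is paid in full when only one system is solved, which is the scenario the abstract focuses on. A self-scheduled routine would skip `compute_levels` and resolve the dependencies while the solve is running.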
