Assessing the solution of one sparse triangular linear system on multi-many core platforms
Raúl Marichal, Ernesto Dufrechu, P. Ezzatti
2021 XLVII Latin American Computing Conference (CLEI), pp. 1-8, 2021-10-25
DOI: 10.1109/CLEI53233.2021.9640084
The solution of sparse triangular linear systems is an important building block for a large number of numerical methods used in science and engineering. It is therefore crucial to have implementations of this operation that execute efficiently on the most recent hardware platforms. For GPUs, several methods have been proposed in recent years. These methods fall into two main categories: those that rely on a prior analysis of the sparse matrix to determine a better execution schedule, and those that decide the schedule dynamically. The experimental results in the literature are not conclusive in favour of either strategy. Moreover, experimental evaluations usually focus on the use case where many systems are solved with the same sparse matrix, so the analysis phase is performed only once and its cost is negligible relative to the total runtime. In this work we are interested in determining which strategy is best, according to the degree of parallelism of the problem, when only one system is to be solved. The experimental evaluation, performed on NVIDIA P100 accelerators, shows that the self-scheduled routines offer important advantages when the degree of parallelism of the problem allows it.
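To make the distinction between the two categories concrete, the following is a minimal sequential sketch in Python, not the authors' GPU implementation. It assumes a lower triangular matrix stored in CSR format with a nonzero diagonal, and it assumes that the analysis phase is a level-set analysis, a common choice that the abstract does not specify. The names `compute_levels`, `group_by_level` and `solve_by_levels`, and the small example matrix, are hypothetical. The number of rows in each level gives a rough idea of the degree of parallelism of the problem.

```python
from typing import List


def compute_levels(row_ptr: List[int], col_idx: List[int]) -> List[int]:
    """Analysis phase (assumed level-set form): level of each row.

    Row i depends on every row j < i with a nonzero L[i][j]; its level is
    one more than the highest level among those rows, or 0 if it has none.
    """
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # off-diagonal entry: x[i] needs x[j]
                level[i] = max(level[i], level[j] + 1)
    return level


def group_by_level(level: List[int]) -> List[List[int]]:
    """Group row indices by level; rows in one group are independent."""
    groups: List[List[int]] = [[] for _ in range(max(level) + 1)] if level else []
    for i, lev in enumerate(level):
        groups[lev].append(i)
    return groups


def solve_by_levels(row_ptr, col_idx, values, b, groups) -> List[float]:
    """Solve L x = b level by level.

    The inner loop over `rows` is the part a GPU could run in parallel;
    here it is sequential for clarity.
    """
    x = [0.0] * len(b)
    for rows in groups:
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = values[k]
                else:
                    acc -= values[k] * x[j]
            x[i] = acc / diag
    return x


# Hypothetical 4x4 example:
#     [2 0 0 0]
# L = [0 4 0 0]    b = [2, 4, 5, 6]
#     [1 0 2 0]
#     [0 1 1 1]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
values = [2.0, 4.0, 1.0, 2.0, 1.0, 1.0, 1.0]
b = [2.0, 4.0, 5.0, 6.0]

levels = compute_levels(row_ptr, col_idx)
groups = group_by_level(levels)
x = solve_by_levels(row_ptr, col_idx, values, b, groups)
print("levels:", levels)
print("rows per level:", [len(g) for g in groups])
print("x:", x)
```

In this sketch, `compute_levels` plays the role of the analysis phase. Its cost is amortized when many systems share the same matrix, but it is paid in full when only one system is solved. A self-scheduled routine would skip this step and instead have each row wait at run time until the entries of `x` it depends on are ready.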