GPU Implementation of Finite Difference Solvers

M. Giles, E. László, I. Reguly, J. Appleyard, Julien Demouth
2014 Seventh Workshop on High Performance Computational Finance, published 2014-11-16. DOI: 10.1109/WHPCF.2014.10 (https://doi.org/10.1109/WHPCF.2014.10)
Citations: 22

Abstract

This paper discusses the implementation of one-factor and three-factor PDE models on GPUs. Both explicit and implicit time-marching methods are considered, with the latter requiring the solution of multiple tridiagonal systems of equations.

Because of the small amount of data involved, one-factor models are primarily compute-limited, and a very good fraction of the peak compute capability is achieved. The key to the performance lies in the heavy use of registers and shuffle instructions for the explicit method, and a non-standard hybrid Thomas/PCR (parallel cyclic reduction) algorithm for solving the tridiagonal systems in the implicit solver.

The three-factor problems involve much more data, and hence their execution is more evenly balanced between computation and data communication to/from the main graphics memory. However, it is again possible to achieve a good fraction of the theoretical peak performance on both measures. The high performance requires particularly careful attention to coalescence in the data transfers, using local shared memory for small array transpositions, and padding to avoid shared memory bank conflicts.

Computational results include comparisons to computations on Sandy Bridge and Haswell Intel Xeon processors, using both multithreading and AVX vectorisation.
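
The tridiagonal solves required by the implicit method can be illustrated with the two textbook building blocks named in the abstract: the serial Thomas algorithm and parallel cyclic reduction (PCR). The sketch below is plain Python for readability and is not the paper's hybrid GPU kernel. The function names `thomas`, `pcr` and `make_system`, and the diagonally dominant test matrix, are assumptions made for illustration only.

```python
def thomas(a, b, c, d):
    """Serial Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].

    Assumes a[0] == 0, c[-1] == 0 and a system that needs no pivoting
    (for example, a diagonally dominant one).
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pcr(a, b, c, d):
    """Parallel cyclic reduction, written as a serial loop over rows.

    Each pass eliminates the neighbours at distance `s` from every row, so
    the coupling distance doubles per pass. On a GPU the inner loop over `i`
    would be executed by parallel threads; here it is an ordinary loop.
    """
    n = len(d)
    a, b, c, d = list(a), list(b), list(c), list(d)  # do not mutate inputs
    s = 1
    while s < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):
            lo, hi = i - s, i + s
            alpha = -a[i] / b[lo] if lo >= 0 else 0.0  # cancels x[i-s]
            gamma = -c[i] / b[hi] if hi < n else 0.0   # cancels x[i+s]
            nb[i] = b[i]
            nd[i] = d[i]
            if lo >= 0:
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        s *= 2
    # All off-diagonal couplings now fall outside the system.
    return [d[i] / b[i] for i in range(n)]


def make_system(n):
    """Hypothetical test system: tridiag(-1, 4, -1) with a known solution."""
    a = [0.0] + [-1.0] * (n - 1)
    b = [4.0] * n
    c = [-1.0] * (n - 1) + [0.0]
    x_true = [float(i + 1) for i in range(n)]
    d = []
    for i in range(n):
        left = a[i] * x_true[i - 1] if i > 0 else 0.0
        right = c[i] * x_true[i + 1] if i < n - 1 else 0.0
        d.append(left + b[i] * x_true[i] + right)
    return a, b, c, d, x_true


if __name__ == "__main__":
    a, b, c, d, x_true = make_system(8)
    err_thomas = max(abs(u - v) for u, v in zip(thomas(a, b, c, d), x_true))
    err_pcr = max(abs(u - v) for u, v in zip(pcr(a, b, c, d), x_true))
    print("max error (Thomas):", err_thomas)
    print("max error (PCR):", err_pcr)
```

Thomas performs O(n) work but is inherently sequential along one system, whereas PCR performs O(n log n) work in about log2(n) fully parallel passes. That trade-off is a plausible reason for combining the two, though the exact way the paper's hybrid splits the work is not described in the abstract.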