FPGA Acceleration for HPC Supercapacitor Simulations

Charles Prouveur, M. Haefele, Tobias Kenter, Nils Voss
Proceedings of the Platform for Advanced Scientific Computing Conference, published 2023-06-26. DOI: 10.1145/3592979.3593419 (https://doi.org/10.1145/3592979.3593419)

Abstract

In the search for more energy-efficient computing devices that could be assembled into future exascale systems, this study presents a chip-to-chip comparison between a CPU, a GPU and an FPGA, as well as a scalability study on multiple FPGAs from two of the available vendors. The application considered here has been extracted from a production code in materials science, so the different implementations are benchmarked on a production test case and not just on theoretical ones. The core algorithm is a matrix-free conjugate gradient that computes the total electrostatic energy with an Ewald summation at each iteration. This paper describes the original MPI implementation of the application, details a numerical accuracy study, and explains the methodology followed as well as the resulting FPGA implementation based on MaxCompiler. The FPGA implementation, which uses a 40-bit floating-point number representation, outperforms the CPU implementation in both computing power and energy usage, resulting in an energy efficiency more than 15 times better. Compared to the GPU of the same generation, the FPGA reaches 60% of the GPU performance while its performance per watt is still better by a factor of 2. Thanks to its low average power usage, the FPGA bests both the fully loaded CPU and the fully loaded GPU in terms of the number of conjugate gradient iterations per second and per watt. Finally, an implementation using oneAPI is also described, showcasing a new development environment for FPGAs in HPC.
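
To make the term "matrix-free conjugate gradient" concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the operator `apply_laplacian_1d` is a hypothetical toy stand-in (a 1D tridiagonal stencil) chosen only because it is symmetric positive definite and cheap to evaluate. In the application described above, each operator application would instead involve the Ewald summation. All function and variable names here are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_a: Callable[[Vector], Vector], b: Vector,
                       tol: float = 1e-10,
                       max_iter: int = 1000) -> Tuple[Vector, int]:
    """Solve A x = b for a symmetric positive definite A.

    A is never stored: only the action v -> A v is needed (matrix-free).
    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    b_norm2 = dot(b, b)
    if b_norm2 == 0.0:
        return x, 0          # trivial right-hand side
    for it in range(1, max_iter + 1):
        ap = apply_a(p)      # the only place the operator is evaluated
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new <= tol * tol * b_norm2:   # relative residual test
            return x, it
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


def apply_laplacian_1d(v: Vector) -> Vector:
    """Hypothetical toy operator: tridiagonal (-1, 2, -1) stencil."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * v[i] - left - right)
    return out


b = [1.0] * 8
x, iterations = conjugate_gradient(apply_laplacian_1d, b)
residual = [bi - ai for bi, ai in zip(b, apply_laplacian_1d(x))]
print(iterations, max(abs(r) for r in residual))
```

The solver touches the operator only through `apply_a(p)`, once per iteration. This is why the cost of a conjugate gradient iteration in such a scheme is dominated by the operator evaluation, and why iterations per second (and per watt) is a natural metric for comparing devices.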