Field-programmable gate array acceleration of the Tersoff potential in LAMMPS

IF 1.8 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Quan Deng, Qiang Liu
Journal: Engineering reports : open access
Published: 2023-05-29
DOI: 10.1002/eng2.12694
URL: https://onlinelibrary.wiley.com/doi/10.1002/eng2.12694
PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/eng2.12694
Citations: 0

Abstract


Field-programmable gate array acceleration of the Tersoff potential in LAMMPS


Molecular dynamics simulation is a common method for studying the microscopic world. Traditional general-purpose high-performance computing platforms suffer from low computational and power efficiency, which constrains the practical application of large-scale, long-duration many-body molecular dynamics simulations. To address these problems, a novel molecular dynamics accelerator for the Tersoff potential is designed for field-programmable gate array (FPGA) platforms, enabling the acceleration of LAMMPS using FPGAs. First, an on-the-fly method is proposed to build neighbor lists and reduce storage usage. Second, multilevel parallelization is implemented so that the accelerator can be flexibly deployed on FPGAs of different scales and achieve good performance. Finally, mathematical models of the accelerator are built, and a method for using the models to determine the optimal-performance parameters is proposed. Experimental results show that, when tested on the Xilinx Alveo U200, the proposed accelerator achieves a performance of 9.51 ns/day for the Tersoff simulation of a 55,296-atom system, which is a 2.00× increase in performance over the Intel I7-8700K and 1.70× over the NVIDIA Tesla K40c under the same test case. In addition, in terms of computational efficiency and power efficiency, the proposed accelerator achieves improvements of 2.00× and 7.19× compared to the Intel I7-8700K, and 4.33× and 2.11× compared to the NVIDIA Titan Xp, respectively.
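The abstract does not describe how the on-the-fly neighbor construction works in hardware, so the following is only a generic software sketch of the idea, not the authors' design. It assumes a cubic periodic box of side `box`, a cutoff no larger than `box / 2`, and a standard cell-list binning; the function names (`cell_of`, `build_cells`, `neighbors_on_the_fly`) are hypothetical. Instead of storing a neighbor list per atom, neighbors are generated when they are needed by scanning the atom's own cell and the adjacent cells.

```python
from itertools import product


def cell_of(point, width, n_cells):
    """Map a 3D point to its periodically wrapped cell index."""
    return tuple(int(c // width) % n_cells for c in point)


def build_cells(positions, box, cutoff):
    """Bin atoms into cubic cells whose width is at least `cutoff`."""
    n_cells = max(1, int(box // cutoff))  # cells per dimension
    width = box / n_cells
    cells = {}
    for idx, point in enumerate(positions):
        cells.setdefault(cell_of(point, width, n_cells), []).append(idx)
    return cells, n_cells, width


def neighbors_on_the_fly(i, positions, cells, n_cells, width, box, cutoff):
    """Yield neighbors of atom i within `cutoff` without storing a list."""
    xi = positions[i]
    home = cell_of(xi, width, n_cells)
    # A set removes repeated cells when the grid has fewer than 3 cells per side.
    keys = {
        tuple((h + s) % n_cells for h, s in zip(home, shift))
        for shift in product((-1, 0, 1), repeat=3)
    }
    for key in keys:
        for j in cells.get(key, ()):
            if j == i:
                continue
            r2 = 0.0
            for a, b in zip(xi, positions[j]):
                d = a - b
                d -= box * round(d / box)  # minimum-image convention
                r2 += d * d
            if r2 < cutoff * cutoff:
                yield j


# Usage: only the cell bins are kept in memory; neighbors are produced on demand.
demo_positions = [(0.5, 0.5, 0.5), (1.0, 0.5, 0.5), (9.8, 0.5, 0.5), (5.0, 5.0, 5.0)]
demo_cells, demo_n, demo_w = build_cells(demo_positions, box=10.0, cutoff=3.0)
print(sorted(neighbors_on_the_fly(0, demo_positions, demo_cells, demo_n, demo_w, 10.0, 3.0)))
```

The trade-off in this sketch is that neighbor distances are recomputed on every visit, in exchange for storing only the cell bins rather than one list per atom. The abstract reports reduced storage usage for the authors' method, which is consistent with this kind of trade-off, though the actual mechanism may differ.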

Source journal metrics: CiteScore 5.10; self-citation rate 0.00%; articles published 0; review time 19 weeks