Field-programmable gate array acceleration of the Tersoff potential in LAMMPS

IF 1.8 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Engineering reports : open access Pub Date : 2023-05-29 DOI:10.1002/eng2.12694

Quan Deng, Qiang Liu


Citations: 0

Abstract



Molecular dynamics simulation is a common method for studying the microscopic world. Traditional general-purpose high-performance computing platforms are hindered by low computational and power efficiency, which constrains the practical application of large-scale, long-time many-body molecular dynamics simulations. To address these problems, a novel molecular dynamics accelerator for the Tersoff potential is designed for field-programmable gate array (FPGA) platforms, enabling LAMMPS to be accelerated with FPGAs. First, an on-the-fly method is proposed to build neighbor lists and reduce storage usage. Second, multilevel parallelization is implemented so that the accelerator can be flexibly deployed on FPGAs of different scales and achieve good performance. Finally, mathematical models of the accelerator are built, and a method for using the models to determine the optimal-performance parameters is proposed. Experimental results show that, when tested on the Xilinx Alveo U200, the proposed accelerator achieves 9.51 ns/day for the Tersoff simulation of a 55,296-atom system, a 2.00$\times$ performance increase over the Intel I7-8700K and 1.70$\times$ over the NVIDIA Tesla K40c under the same test case. In terms of computational efficiency and power efficiency, the proposed accelerator achieves improvements of 2.00$\times$ and 7.19$\times$ over the Intel I7-8700K, and 4.33$\times$ and 2.11$\times$ over the NVIDIA Titan Xp, respectively.
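The abstract names an on-the-fly neighbor-list method but does not describe it. As a rough illustration of the general idea only, the Python sketch below enumerates each atom's neighbors on demand from a cell grid instead of storing a per-atom list. Cell binning is a swapped-in, textbook technique, not the authors' FPGA design. The function names, the cubic periodic box, and the requirement that the cutoff not exceed half the box length are all assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(pos, cell_size, n_cells):
    """Map a position to its periodic cell index (hypothetical helper)."""
    return tuple(math.floor(c / cell_size) % n_cells for c in pos)


def build_cells(positions, box, cutoff):
    """Bin atom indices into cubic cells at least `cutoff` wide."""
    n_cells = max(1, int(box // cutoff))
    cell_size = box / n_cells
    cells = defaultdict(list)
    for idx, pos in enumerate(positions):
        cells[cell_of(pos, cell_size, n_cells)].append(idx)
    return cells, n_cells, cell_size


def min_image_dist2(a, b, box):
    """Squared distance under the minimum-image convention.

    Assumes a cubic periodic box and cutoff <= box / 2.
    """
    d2 = 0.0
    for ca, cb in zip(a, b):
        d = ca - cb
        d -= box * round(d / box)
        d2 += d * d
    return d2


def neighbors_on_the_fly(i, positions, cells, n_cells, cell_size, box, cutoff):
    """Yield indices of atoms within `cutoff` of atom i, storing no list."""
    home = cell_of(positions[i], cell_size, n_cells)
    visited = set()  # wrapped offsets can coincide when n_cells < 3
    for offset in product((-1, 0, 1), repeat=3):
        key = tuple((c + o) % n_cells for c, o in zip(home, offset))
        if key in visited:
            continue
        visited.add(key)
        for j in cells.get(key, ()):
            if j == i:
                continue
            if min_image_dist2(positions[i], positions[j], box) < cutoff * cutoff:
                yield j
```

A force loop would call `neighbors_on_the_fly` once per atom and consume the indices immediately, so memory holds only the cell bins (one index per atom) rather than a list that grows with the number of neighbors per atom.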


Source journal

Engineering reports : open access

CiteScore

5.10

Self-citation rate

0.00%

Articles published

Review time

19 weeks