Performance Analyses of a Parallel Verlet Neighbor List Algorithm for GPU-Optimized MD Simulations

Tyson J. Lipscomb, Anqi Zou, Samuel S. Cho
Published in: 2012 ASE/IEEE International Conference on BioMedical Computing (BioMedCom), 2012-10-07
DOI: 10.1145/2382936.2382977 (https://doi.org/10.1145/2382936.2382977)
Citations: 16

Abstract

How biomolecules fold and assemble into well-defined structures that correspond to cellular functions is a fundamental problem in biophysics with direct biomedical application because some functions lead to diseases such as Alzheimer's, Parkinson's, and cancer. Molecular dynamics (MD) simulations provide a molecular-resolution physical description of the folding and assembly processes, but the computational demands of the algorithms restrict the size and the timescales one can simulate. In a recent study, we introduced a parallel neighbor list algorithm that was specifically optimized for MD simulations on GPUs. We now analyze the performance of our MD simulation code that incorporates the algorithm, and we observe that the force calculations and the evaluation of the neighbor and pair lists constitute a majority of the overall execution time. The overall speedup of the GPU-optimized MD simulations compared to the CPU-optimized version is N-dependent and ~30x for the full 70S ribosome (10,219 beads). The pair and neighbor list evaluations have performance speedups of ~25x and ~55x, respectively. We then directly compare the performance of our MD simulation code with that of the SOP model implemented in HOOMD, a leading general particle dynamics simulation package that is specifically optimized for GPUs.
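
For readers unfamiliar with the neighbor and pair lists mentioned in the abstract, the following is a minimal serial sketch of the generic Verlet list idea, not the authors' parallel GPU algorithm, whose details are not given here. It assumes a two-level scheme: a neighbor list built with an outer cutoff, and a pair list obtained by filtering the neighbor list with a shorter interaction cutoff. The function names, cutoff values, and toy coordinates are all hypothetical.

```python
import math

def build_neighbor_list(positions, r_list):
    """Return all index pairs (i, j), i < j, closer than the outer cutoff r_list.

    This is the expensive O(N^2) step; in a Verlet scheme it is
    recomputed only occasionally rather than at every time step.
    """
    pairs = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) < r_list:
                pairs.append((i, j))
    return pairs

def build_pair_list(positions, neighbor_list, r_cut):
    """Keep only the neighbor-list pairs inside the interaction cutoff r_cut.

    This step only scans the (much shorter) neighbor list, so it is
    cheap enough to repeat frequently.
    """
    return [(i, j) for (i, j) in neighbor_list
            if math.dist(positions[i], positions[j]) < r_cut]

# Hypothetical toy system: four beads on a line, arbitrary units.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
neighbor_list = build_neighbor_list(positions, r_list=3.0)
pair_list = build_pair_list(positions, neighbor_list, r_cut=2.0)
print(neighbor_list)  # [(0, 1), (0, 2), (1, 2)]
print(pair_list)      # [(0, 1), (1, 2)]
```

In this sketch the forces would be evaluated only over `pair_list`. A GPU version would typically distribute the distance checks across threads instead of looping, which is the kind of step the reported ~25x and ~55x list speedups refer to.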