Running a Single Instruction Execution Stream to a Massively Parallelized Computational Operations

Nisha Agrawal, Abhishek Das, R. Pathak, M. Modani
2021 IEEE 2nd International Conference on Technology, Engineering, Management for Societal impact using Marketing, Entrepreneurship and Talent (TEMSMET), published 2021-12-02
DOI: https://doi.org/10.1109/temsmet53515.2021.9768703

Abstract

GROMACS is used extensively for simulating biochemical molecules, and its performance has been optimized over the years on a variety of homogeneous and heterogeneous computing architectures. This paper studies the behavior of Molecular Dynamics (MD) simulations using GROMACS on the PARAM Siddhi-AI system. Application performance is analyzed on CPUs (AMD EPYC) and GPUs (NVIDIA A100). For CPU-only runs, single-node performance is slightly better with OpenMPI than with threaded MPI, and the combination of 16 MPI ranks with 8 OpenMP threads each gives the best single-node performance. The performance of multi-node CPU-only GROMACS runs improves by a factor of about 1.1x as the number of nodes increases. For single-node GROMACS-GPU runs, all the forces (bonded, non-bonded, and PME) are offloaded to the GPUs. For multi-node GROMACS-GPU runs, however, only the bonded and non-bonded forces are offloaded to the GPUs. Single-node GROMACS-GPU runs perform about 18x better than single-node CPU-only runs. Single-node GROMACS-GPU performance is also approximately 3x better than that of accelerated GROMACS runs on 5 nodes.
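The run configurations described in the abstract could be expressed as mdrun command lines along the following lines. This is a minimal sketch and not the authors' actual job scripts: the helper name mdrun_command, the input file topol.tpr, the gmx_mpi binary name, and the rank and thread counts used for the GPU cases are all assumptions made for illustration. Only the 16 ranks x 8 OpenMP threads CPU layout and the choice of which forces are offloaded come from the abstract. The -npme 1 setting for the single-node GPU case is likewise an assumption about how PME offload would be combined with multiple ranks.

```python
import shlex


def mdrun_command(ranks, omp_threads, offload=None, npme=None,
                  thread_mpi=False, tpr="topol.tpr"):
    """Build a GROMACS mdrun command line as a list of arguments.

    Hypothetical helper for illustration only; `tpr` is a placeholder input.
    """
    if thread_mpi:
        # Built-in thread-MPI: ranks are threads inside a single process.
        cmd = ["gmx", "mdrun", "-ntmpi", str(ranks)]
    else:
        # External MPI library (e.g. OpenMPI) launching an MPI-enabled build.
        cmd = ["mpirun", "-np", str(ranks), "gmx_mpi", "mdrun"]
    cmd += ["-ntomp", str(omp_threads), "-s", tpr]
    # Each entry maps an mdrun task flag (nb, bonded, pme) to "gpu" or "cpu".
    for flag, device in (offload or {}).items():
        cmd += [f"-{flag}", device]
    if npme is not None:
        # Number of ranks dedicated to PME.
        cmd += ["-npme", str(npme)]
    return cmd


# CPU-only, single node: 16 MPI ranks x 8 OpenMP threads (from the abstract).
cpu_openmpi = mdrun_command(16, 8)
cpu_thread_mpi = mdrun_command(16, 8, thread_mpi=True)

# Single-node GPU: bonded, non-bonded, and PME all offloaded.
# The 8 ranks x 16 threads layout and npme=1 are assumed values.
gpu_single = mdrun_command(
    8, 16, offload={"nb": "gpu", "bonded": "gpu", "pme": "gpu"}, npme=1)

# Multi-node GPU: only bonded and non-bonded offloaded, PME kept on the CPU.
# The rank count of 40 is an assumed value.
gpu_multi = mdrun_command(
    40, 16, offload={"nb": "gpu", "bonded": "gpu", "pme": "cpu"})

if __name__ == "__main__":
    for cmd in (cpu_openmpi, cpu_thread_mpi, gpu_single, gpu_multi):
        print(shlex.join(cmd))
```

Building the command as an argument list keeps the difference between the OpenMPI and thread-MPI launches, and between full and partial GPU offload, visible as a small change in parameters.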