Running a Single Instruction Execution Stream to a Massively Parallelized Computational Operations
Authors: Nisha Agrawal, Abhishek Das, R. Pathak, M. Modani
Published in: 2021 IEEE 2nd International Conference on Technology, Engineering, Management for Societal impact using Marketing, Entrepreneurship and Talent (TEMSMET)
Publication date: 2021-12-02
DOI: 10.1109/temsmet53515.2021.9768703
Citations: 0
Abstract
GROMACS is used extensively for simulations of biochemical molecules, and its performance has been optimized over the years on a variety of homogeneous and heterogeneous computing architectures. This paper studies the behavior of Molecular Dynamics (MD) simulations using GROMACS on the PARAM Siddhi-AI system, analyzing application performance on CPUs (AMD EPYC) and GPUs (NVIDIA A100). For CPU-only runs, single-node performance is slightly better with OpenMPI than with GROMACS's built-in thread-MPI, and the combination of 16 MPI ranks with 8 OpenMP threads per rank gives better single-node performance. The performance of multi-node CPU-only GROMACS runs increases by a factor of 1.1x as the number of nodes increases. For single-node GROMACS-GPU runs, all force computations (bonded, non-bonded, and PME, i.e. particle-mesh Ewald) are offloaded to the GPUs; for multi-node GROMACS-GPU runs, only the bonded and non-bonded forces are offloaded. Single-node GROMACS-GPU runs are ~18x faster than single-node CPU-only runs, and approximately 3x faster than accelerated GROMACS execution on 5 nodes.
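The abstract contrasts three placements of work: CPU-only hybrid MPI/OpenMP runs, single-node runs with bonded, non-bonded, and PME work on the GPU, and multi-node runs with PME kept on the CPU. As a minimal sketch, assuming GROMACS 2020 or later (where `mdrun -nb`, `-bonded`, and `-pme` accept `cpu` or `gpu`) and the conventional `gmx` / `gmx_mpi` binary names, the hypothetical Python helper `mdrun_cmd` below assembles matching command lines. It is not the authors' run script; the `mpirun` launcher, the `-deffnm` file prefix, and every rank or thread count other than the reported 16 ranks x 8 threads are illustrative assumptions.

```python
import shlex

# Interaction classes named in the abstract, keyed to the mdrun option
# that places each one on the CPU or the GPU.
TASK_FLAGS = {"nb": "-nb", "bonded": "-bonded", "pme": "-pme"}


def mdrun_cmd(ntomp, ntmpi=None, mpi_ranks=None, offload=(), deffnm="md"):
    """Build an argv list for one GROMACS mdrun run (hypothetical helper).

    ntomp:     OpenMP threads per rank (mdrun -ntomp)
    ntmpi:     thread-MPI ranks (mdrun -ntmpi), used with the plain `gmx` binary
    mpi_ranks: ranks for an external MPI launch (`mpirun -np`) of `gmx_mpi`
    offload:   subset of {"nb", "bonded", "pme"} to place on the GPU
    """
    if (ntmpi is None) == (mpi_ranks is None):
        raise ValueError("pass exactly one of ntmpi or mpi_ranks")
    unknown = set(offload) - set(TASK_FLAGS)
    if unknown:
        raise ValueError(f"unknown offload tasks: {sorted(unknown)}")

    if mpi_ranks is not None:
        # External MPI (e.g. OpenMPI): the launcher starts the ranks.
        ranks = mpi_ranks
        argv = ["mpirun", "-np", str(ranks), "gmx_mpi", "mdrun"]
    else:
        # Built-in thread-MPI: mdrun spawns its own ranks.
        ranks = ntmpi
        argv = ["gmx", "mdrun", "-ntmpi", str(ranks)]
    argv += ["-ntomp", str(ntomp), "-deffnm", deffnm]

    # State every placement explicitly instead of relying on mdrun's "auto".
    for task, flag in TASK_FLAGS.items():
        argv += [flag, "gpu" if task in offload else "cpu"]

    # With several ranks, GPU PME runs on one separate PME rank.
    if "pme" in offload and ranks > 1:
        argv += ["-npme", "1"]
    return argv


if __name__ == "__main__":
    # CPU-only, one node: 16 MPI ranks x 8 OpenMP threads via external MPI.
    print(shlex.join(mdrun_cmd(ntomp=8, mpi_ranks=16)))
    # One node, all three force classes on the GPU (counts are hypothetical).
    print(shlex.join(mdrun_cmd(ntomp=16, ntmpi=1, offload=("nb", "bonded", "pme"))))
    # Multi-node, PME on the CPU (40 ranks over 5 nodes is hypothetical).
    print(shlex.join(mdrun_cmd(ntomp=8, mpi_ranks=40, offload=("nb", "bonded"))))
```

Emitting `cpu` explicitly for every class that is not offloaded keeps the multi-node case from silently moving PME to the GPU, which mirrors the split the abstract describes.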