MapReduce-Based RESTMD: Enabling Large-Scale Sampling Tasks with Distributed HPC Systems

Praveenkumar Kondikoppa, Richard Platania, Seung-Jong Park, Tom Keyes, Jaegil Kim, Nayong Kim, Joohyun Kim, Shuju Bai
Published in: 2014 6th International Workshop on Science Gateways
Publication date: 2014-06-03
DOI: 10.1109/IWSG.2014.12 (https://doi.org/10.1109/IWSG.2014.12)
Citations: 5

Abstract

We present a novel implementation of Replica Exchange Statistical Temperature Molecular Dynamics (RESTMD), a generalized ensemble method in the replica exchange (parallel tempering) family. Our implementation uses a MapReduce (MR)-based iterative framework to launch RESTMD on high performance computing (HPC) clusters, including our test bed system, Cyber-infrastructure for Reconfigurable Optical Networks (CRON), which simulates a network-connected distributed system. Our main contributions are a new implementation of STMD plugged into the well-known CHARMM molecular dynamics package and a Hadoop-powered RESTMD implementation that scales out effectively both within a cluster and across distributed systems. To address the challenges of using Hadoop MapReduce, we examined the factors that contribute to the performance of the proposed framework through runtime analysis experiments on two biological systems of different sizes and on different types of HPC resources. The advantages of RESTMD suggest that it is effective for enhanced sampling, one of the grand challenges in areas of study ranging from chemical systems to statistical inference. Lastly, with its support for scaling across distributed computing infrastructure (DCI) and its use of Hadoop for coarse-grained task-level parallelism, MapReduce-based RESTMD is a good example of the next generation of applications that science gateway projects are increasingly expected to provide, in particular those backed by IaaS clouds.
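The iterative map/reduce pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' Hadoop or CHARMM code: it is plain Python, the "map" phase runs a toy Metropolis walk on a one-dimensional double-well potential in place of an STMD run, and the "reduce" phase uses the standard temperature-based parallel tempering acceptance rule rather than the RESTMD-specific exchange criterion. All function names, the potential, and the parameters are assumptions made for illustration only.

```python
import math
import random


def energy(x):
    # Hypothetical double-well potential standing in for a molecular system.
    return (x * x - 1.0) ** 2


def map_step(replica, n_steps, rng):
    # "Map" phase: advance one replica independently of all others.
    # A toy Metropolis walk replaces the real MD segment here.
    x, temperature = replica["x"], replica["temperature"]
    for _ in range(n_steps):
        trial = x + rng.uniform(-0.5, 0.5)
        delta = energy(trial) - energy(x)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = trial
    return {"x": x, "temperature": temperature, "energy": energy(x)}


def exchange_probability(e_i, e_j, t_i, t_j):
    # Standard parallel tempering rule with k_B = 1:
    # min(1, exp((1/T_i - 1/T_j) * (E_i - E_j))).
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def reduce_step(replicas, iteration, rng):
    # "Reduce" phase: gather all replicas and attempt swaps between
    # temperature neighbours, alternating even and odd pairs per iteration.
    ordered = sorted(replicas, key=lambda r: r["temperature"])
    for i in range(iteration % 2, len(ordered) - 1, 2):
        low, high = ordered[i], ordered[i + 1]
        p = exchange_probability(
            low["energy"], high["energy"], low["temperature"], high["temperature"]
        )
        if rng.random() < p:
            # Swap configurations; each slot keeps its own temperature.
            low["x"], high["x"] = high["x"], low["x"]
            low["energy"], high["energy"] = high["energy"], low["energy"]
    return ordered


def run(temperatures, n_iterations, n_steps, seed=0):
    # Driver: one map phase followed by one reduce phase per iteration.
    rng = random.Random(seed)
    replicas = [
        {"x": -1.0, "temperature": t, "energy": energy(-1.0)} for t in temperatures
    ]
    for iteration in range(n_iterations):
        replicas = [map_step(r, n_steps, rng) for r in replicas]
        replicas = reduce_step(replicas, iteration, rng)
    return replicas


if __name__ == "__main__":
    for r in run([0.5, 1.0, 2.0, 4.0], n_iterations=10, n_steps=50):
        print(r)
```

In this sketch the map calls are independent of one another, which is what makes the replica runs a natural fit for coarse-grained task-level parallelism; only the reduce phase needs to see all replicas at once.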