Scalable Algorithms for Molecular Dynamics Simulations on Commodity Clusters
K. Bowers, Edmond Chow, Huafeng Xu, R. Dror, M. Eastwood, Brent A. Gregersen, J. L. Klepeis, I. Kolossváry, Mark A. Moraes, Federico D. Sacerdoti, J. Salmon, Yibing Shan, D. Shaw
ACM/IEEE SC 2006 Conference (SC'06), 2006-11-11
DOI: 10.1145/1188455.1188544 (https://doi.org/10.1145/1188455.1188544)
Citations: 2234
Abstract
Although molecular dynamics (MD) simulations of biomolecular systems often run for days to months, many events of great scientific interest and pharmaceutical relevance occur on long time scales that remain beyond reach. We present several new algorithms and implementation techniques that significantly accelerate parallel MD simulations compared with current state-of-the-art codes. These include a novel parallel decomposition method and message-passing techniques that reduce communication requirements, as well as novel communication primitives that further reduce communication time. We have also developed numerical techniques that maintain high accuracy while using single-precision computation in order to exploit processor-level vector instructions. These methods are embodied in a newly developed MD code called Desmond that achieves unprecedented simulation throughput and parallel scalability on commodity clusters. Our results suggest that Desmond's parallel performance substantially surpasses that of any previously described code. For example, on a standard benchmark, Desmond's performance on a conventional Opteron cluster with 2K processors slightly exceeded the reported performance of IBM's Blue Gene/L machine with 32K processors running its Blue Matter MD code.
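The abstract does not say which numerical techniques Desmond uses to keep single-precision arithmetic accurate, so the following is only a generic illustration of the underlying problem, not the authors' method. It assumes compensated (Kahan) summation as a stand-in technique and emulates IEEE-754 single precision in pure Python; the helper names `f32`, `naive_sum_f32`, and `kahan_sum_f32` are hypothetical.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Accumulate in single precision, rounding after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total


def kahan_sum_f32(values):
    """Single-precision accumulation with Kahan compensation.

    `comp` carries the low-order bits lost by the previous addition and
    feeds them back into the next one.
    """
    total = 0.0
    comp = 0.0
    for v in values:
        y = f32(f32(v) - comp)          # apply the stored correction
        t = f32(total + y)              # low-order bits of y may be lost here
        comp = f32(f32(t - total) - y)  # recover what was lost
        total = t
    return total


if __name__ == "__main__":
    # Many small contributions added to a growing total, loosely analogous
    # to accumulating many small terms in a long-running simulation.
    values = [0.1] * 100_000
    print("naive single precision:", naive_sum_f32(values))
    print("Kahan single precision:", kahan_sum_f32(values))
    print("reference value       :", 10000.0)
```

The point of the sketch is only that rounding error in single precision can accumulate over many operations and that algorithmic care can recover much of the lost accuracy without switching to double precision.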