Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime

Chao Mei, Yanhua Sun, G. Zheng, Eric J. Bohm, L. Kalé, James C. Phillips, Christopher B. Harrison
{"title":"在具有多核优化的消息驱动运行时的千万亿次机器上启用和缩放1亿个原子的生物分子模拟","authors":"Chao Mei, Yanhua Sun, G. Zheng, Eric J. Bohm, L. Kalé, James C. Phillips, Christopher B. Harrison","doi":"10.1145/2063384.2063466","DOIUrl":null,"url":null,"abstract":"A 100-million-atom biomolecular simulation with NAMD is one of the three benchmarks for the NSF-funded sustainable petascale machine. Simulating this large molecular system on a petascale machine presents great challenges, including handling I/O, large memory footprint and getting good strong-scaling results. In this paper, we present parallel I/O techniques to enable the simulation. A new SMP model is designed to efficiently utilize ubiquitous wide multicore clusters by extending the Charm++ asynchronous message-driven runtime. We exploit node-aware techniques to optimize both the application and the underlying SMP runtime. Hierarchical load balancing is further exploited to scale NAMD to the full Jaguar PF Cray XT5 (224,076 cores) at Oak Ridge National Laboratory, both with and without PME full electrostatics, achieving 93% parallel efficiency (vs 6720 cores) at 9 ms per step for a simple cutoff calculation. Excellent scaling is also obtained on 65,536 cores of the Intrepid Blue Gene/P at Argonne National Laboratory.","PeriodicalId":358797,"journal":{"name":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"66","resultStr":"{\"title\":\"Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime\",\"authors\":\"Chao Mei, Yanhua Sun, G. Zheng, Eric J. Bohm, L. Kalé, James C. Phillips, Christopher B. Harrison\",\"doi\":\"10.1145/2063384.2063466\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A 100-million-atom biomolecular simulation with NAMD is one of the three benchmarks for the NSF-funded sustainable petascale machine. Simulating this large molecular system on a petascale machine presents great challenges, including handling I/O, large memory footprint and getting good strong-scaling results. In this paper, we present parallel I/O techniques to enable the simulation. A new SMP model is designed to efficiently utilize ubiquitous wide multicore clusters by extending the Charm++ asynchronous message-driven runtime. We exploit node-aware techniques to optimize both the application and the underlying SMP runtime. Hierarchical load balancing is further exploited to scale NAMD to the full Jaguar PF Cray XT5 (224,076 cores) at Oak Ridge National Laboratory, both with and without PME full electrostatics, achieving 93% parallel efficiency (vs 6720 cores) at 9 ms per step for a simple cutoff calculation. 
Excellent scaling is also obtained on 65,536 cores of the Intrepid Blue Gene/P at Argonne National Laboratory.\",\"PeriodicalId\":358797,\"journal\":{\"name\":\"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)\",\"volume\":\"72 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-11-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"66\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2063384.2063466\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2063384.2063466","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A 100-million-atom biomolecular simulation with NAMD is one of the three benchmarks for the NSF-funded sustainable petascale machine. Simulating this large molecular system on a petascale machine presents great challenges, including handling I/O, managing a large memory footprint, and obtaining good strong-scaling results. In this paper, we present parallel I/O techniques to enable the simulation. A new SMP model is designed to efficiently utilize ubiquitous wide multicore clusters by extending the Charm++ asynchronous message-driven runtime. We exploit node-aware techniques to optimize both the application and the underlying SMP runtime. Hierarchical load balancing is further exploited to scale NAMD to the full Jaguar PF Cray XT5 (224,076 cores) at Oak Ridge National Laboratory, both with and without PME full electrostatics, achieving 93% parallel efficiency (versus a 6720-core baseline) at 9 ms per step for a simple cutoff calculation. Excellent scaling is also obtained on 65,536 cores of the Intrepid Blue Gene/P at Argonne National Laboratory.
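The runtime the abstract credits is Charm++, in which work is decomposed into migratable objects (chares) whose entry methods execute when their messages are delivered, rather than on a fixed SPMD schedule. The following is a minimal sketch of that message-driven style under stated assumptions: the Main/Worker chare names, the worker count, and the printed output are illustrative and are not code from NAMD or from the paper's SMP runtime.

/* hello.ci -- Charm++ interface file, translated by charmc into
   hello.decl.h / hello.def.h (names here are hypothetical):

   mainmodule hello {
     readonly CProxy_Main mainProxy;
     mainchare Main {
       entry Main(CkArgMsg* m);
       entry void done();
     };
     array [1D] Worker {
       entry Worker();
       entry void compute(int step);
     };
   };
*/

// hello.C -- implementation of the chares declared above.
#include "hello.decl.h"

/* readonly, set once in Main and broadcast to all PEs */
CProxy_Main mainProxy;
static const int NUM_WORKERS = 8;  // illustrative size

class Main : public CBase_Main {
  int finished;
public:
  Main(CkArgMsg* msg) : finished(0) {
    delete msg;
    mainProxy = thisProxy;
    // ckNew creates a chare array distributed across the machine;
    // compute() is an asynchronous broadcast that enqueues messages
    // and returns immediately -- no blocking receive anywhere.
    CProxy_Worker workers = CProxy_Worker::ckNew(NUM_WORKERS);
    workers.compute(0);
  }
  // Invoked via a message by each Worker as it completes.
  void done() {
    if (++finished == NUM_WORKERS) CkExit();
  }
};

class Worker : public CBase_Worker {
public:
  Worker() {}
  Worker(CkMigrateMessage*) {}  // required for chare migration
  // Runs whenever a compute() message arrives at this chare,
  // on whichever processing element (PE) currently hosts it.
  void compute(int step) {
    CkPrintf("worker %d on PE %d, step %d\n", thisIndex, CkMyPe(), step);
    mainProxy.done();
  }
};

#include "hello.def.h"

With the Charm++ toolchain installed, such a program is built with charmc (which also processes the .ci interface file) and launched through charmrun, e.g. ./charmrun +p4 ./hello. Because invocations are asynchronous messages, the runtime is free to schedule whatever work has arrived, which is the property the paper's SMP and node-aware optimizations build on.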
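For readers checking the headline number: strong-scaling parallel efficiency relative to a smaller baseline run is conventionally defined as speedup divided by the increase in core count. Assuming that standard definition (the abstract does not state the baseline time per step), the reported figures fit together as follows:

  E(P) = (t_ref * P_ref) / (t_P * P)

so E = 0.93 with P_ref = 6720, P = 224,076, and t_P = 9 ms implies a baseline time of roughly t_ref = E * (P / P_ref) * t_P ≈ 0.93 * 33.3 * 9 ms ≈ 279 ms per step on 6720 cores.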