29-Billion Atoms Molecular Dynamics Simulation With Ab Initio Accuracy on 35 Million Cores of New Sunway Supercomputer

Xun Wang, Xiangyu Meng, Zhuoqiang Guo, Mingzhen Li, Lijun Liu, Mingfan Li, Qian Xiao, Tong Zhao, Ninghui Sun, Guangming Tan, Weile Jia
IEEE Transactions on Computers, vol. 74, no. 5, pp. 1634-1648. Published 2025-02-11. DOI: 10.1109/TC.2025.3540646 (https://ieeexplore.ieee.org/document/10880101/)
Physical phenomena such as bond breaking and phase transitions require molecular dynamics (MD) with ab initio accuracy, involving up to billions of atoms and timescales of a nanosecond or more. Previous state-of-the-art work has demonstrated that neural network molecular dynamics (NNMD), such as deep potential molecular dynamics (DeePMD), can successfully extend the temporal and spatial scales of MD with ab initio accuracy on both ARM and GPU platforms. However, the DeePMD-kit package cannot yet fully exploit the computational potential of the new Sunway supercomputer because of the machine's unique many-core architecture, memory hierarchy, and low-precision capability. In this paper, we redesign DeePMD-kit to harness the massive computing power of the new Sunway, enabling MD simulations of over ten billion atoms. We first design a large-scale parallelization scheme to exploit the massive parallelism of the new Sunway. We then devise specialized optimizations for the time-consuming operators. Finally, we design a novel mixed-precision method for the customized DeePMD-kit operators to leverage the low-precision computing power of the new Sunway. The optimized DeePMD-kit achieves 67.6× and 56.5× speedups for the water and copper systems, respectively, on the new Sunway. It can also perform a 29-billion-atom simulation of the water system on 35 million cores (i.e., 90,000 computing nodes, around 84% of the whole supercomputer) with a peak performance of 57.1 PFLOPS, which is 7.9× larger and 1.2× faster than state-of-the-art results. This paves the way for investigating more realistic scenarios, such as the mechanical properties of metals, semiconductor devices, batteries, and other materials and physical systems.
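As a rough sanity check on the headline numbers, the per-node and per-core quantities implied by the abstract can be computed directly from the figures it quotes. This is a hypothetical back-of-envelope sketch, not something taken from the paper; because "35 million cores" and "around 84%" are rounded in the abstract, the derived values are approximate.

```python
"""Back-of-envelope scale figures derived only from numbers quoted in the abstract."""

ATOMS = 29e9             # atoms in the water-system run
CORES = 35e6             # "35 million cores" (a rounded figure)
NODES = 90_000           # computing nodes used
MACHINE_FRACTION = 0.84  # "around 84% of the whole supercomputer" (also approximate)
PEAK_PFLOPS = 57.1       # reported peak performance in PFLOPS


def scale_figures() -> dict[str, float]:
    """Return per-node / per-core quantities implied by the quoted figures."""
    return {
        "cores_per_node": CORES / NODES,
        "atoms_per_core": ATOMS / CORES,
        "atoms_per_node": ATOMS / NODES,
        # Size of the full machine implied by the "84%" figure.
        "implied_total_nodes": NODES / MACHINE_FRACTION,
        # 1 PFLOPS = 1e6 GFLOPS; this averages the aggregate peak over all nodes.
        "peak_gflops_per_node": PEAK_PFLOPS * 1e6 / NODES,
    }


if __name__ == "__main__":
    for name, value in scale_figures().items():
        print(f"{name:>22}: {value:,.0f}")
```

Running it gives roughly 389 cores per node, about 829 atoms per core, about 322,000 atoms per node, a full machine of roughly 107,000 nodes, and about 634 GFLOPS per node at the reported peak. The per-node peak is a simple average of the aggregate figure and says nothing about how that throughput is split across precisions.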
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.