Accelerating Parallel First-Principles Excited-State Calculation by Low-Rank Approximation with K-Means Clustering
Qingcai Jiang, Jielan Li, Junshi Chen, Xinming Qin, Lingyun Wan, Jinlong Yang, Jie Liu, Wei Hu, Hong An
Proceedings of the 51st International Conference on Parallel Processing, 2022-08-29
DOI: 10.1145/3545008.3545092 (https://doi.org/10.1145/3545008.3545092)
Citations: 0
Abstract
First-principles time-dependent density functional theory (TDDFT) is a powerful tool for accurately describing the excited-state properties of molecules and solids in condensed matter physics, computational chemistry, and materials science. However, a perceived drawback of TDDFT calculations is their ultrahigh computational cost and large memory usage, especially with a plane-wave basis set, which limits their application to large systems containing thousands of atoms. Here, we present a massively parallel implementation of linear-response TDDFT (LR-TDDFT) and reduce its complexity by combining a low-rank approximation based on K-Means clustering with an iterative eigensolver. Furthermore, we carefully design the parallel data and task distribution schemes to suit the physical nature of each step of the computation. Several optimization methods are also employed to handle efficiently the matrix operations and data communication involved in constructing and diagonalizing the LR-TDDFT Hamiltonian. In particular, our method can reduce the cost of computation and memory by nearly two orders of magnitude compared to conventional LR-TDDFT calculations. Numerical results demonstrate that our implementation can achieve an overall speedup of 10x and scale efficiently up to 12,288 CPU cores for large systems of up to 4,096 atoms, with runtimes of dozens of seconds.
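The abstract names K-Means clustering as an ingredient of the low-rank approximation but does not describe the procedure. As a rough illustration of the clustering step only, the sketch below assumes that K-Means is used to pick a small set of representative real-space grid points, weighted by a density-like function, from which a low-rank representation could later be built. The function names, the 1D grid, and the Gaussian weights are hypothetical and are not taken from the paper.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Weighted K-Means (Lloyd iterations) on a list of coordinate tuples.

    Returns (centroids, labels). Each centroid is the weight-averaged
    position of the points currently assigned to it.
    """
    rng = random.Random(seed)
    # Start from k distinct grid points chosen at random.
    centroids = [points[i] for i in rng.sample(range(len(points)), k)]
    dim = len(points[0])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the weighted mean of its members.
        new_centroids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            total = sum(weights[i] for i in members)
            if total == 0.0:
                # Empty (or zero-weight) cluster: leave the centroid in place.
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(tuple(
                sum(weights[i] * points[i][d] for i in members) / total
                for d in range(dim)
            ))
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


def select_interpolation_points(points, centroids):
    """Snap each centroid to the index of its nearest grid point."""
    return sorted({
        min(range(len(points)), key=lambda i: sq_dist(points[i], c))
        for c in centroids
    })


# Hypothetical example: a 1D grid with a density-like weight made of two bumps.
grid = [(i / 200,) for i in range(200)]
weights = [
    math.exp(-((x - 0.3) / 0.05) ** 2) + math.exp(-((x - 0.7) / 0.05) ** 2)
    for (x,) in grid
]
centroids, labels = weighted_kmeans(grid, weights, k=8)
selected = select_interpolation_points(grid, centroids)
```

Here `selected` holds at most `k` grid indices that stand in for the full grid. Weighting the centroid update by a density-like quantity is one plausible way to bias the selection toward regions that matter physically; whether the paper uses this weighting is an assumption of the sketch.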