Renzhe Li, Jiaqi Wang, Akksay Singh, Bai Li, Zichen Song, Chuan Zhou, Lei Li
Journal: Journal of Chemical Theory and Computation (impact factor 5.7; JCR Q2, Chemistry, Physical)
DOI: 10.1021/acs.jctc.4c01176 (https://doi.org/10.1021/acs.jctc.4c01176)
Publication date: 2024-11-18
Publication type: Journal Article
Open access: no
Citations: 0
Abstract
Automatic Feature Selection for Atom-Centered Neural Network Potentials Using a Gradient Boosting Decision Algorithm.
Atom-centered neural network (ANN) potentials have shown high accuracy and computational efficiency in modeling atomic systems. A crucial step in developing reliable ANN potentials is the proper selection of atom-centered symmetry functions (ACSFs), also known as atomic features, to describe atomic environments. Inappropriate selection of ACSFs can lead to poor-quality ANN potentials. Here, we propose a gradient boosting decision tree (GBDT)-based framework for the automatic selection of optimal ACSFs. This framework takes uniformly distributed sets of ACSFs as input and evaluates their relative importance. The ACSFs with high average importance scores are selected and used to train an ANN potential. We applied this method to the Ge system, resulting in an ANN potential with root-mean-square errors (RMSE) of 10.2 meV/atom for energy and 84.8 meV/Å for force predictions, utilizing only 18 ACSFs to achieve a balance between accuracy and computational efficiency. The framework is validated using the grid searching method, demonstrating that ACSFs selected with our framework are in the optimal region. Furthermore, we also compared our method with commonly used feature selection algorithms. The results show that our algorithm outperforms the others in terms of effectiveness and accuracy. This study highlights the significance of the ACSF parameter effect on the ANN performance and presents a promising method for automatic ACSF selection, facilitating the development of machine learning potentials.
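The selection step described in the abstract (score candidate features with gradient-boosted trees, average the importance scores, keep the top-ranked ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the depth-1 trees, the bootstrap averaging, the synthetic data, and every function name below (`fit_stump`, `gbdt_importance`, `select_features`) are assumptions made purely for illustration. Real ACSF values and reference energies would replace the random inputs.

```python
import random


def fit_stump(X, residuals):
    """Return the (gain, feature, threshold, left_mean, right_mean) split
    that most reduces the squared error of the residuals, or None."""
    n, d = len(X), len(X[0])
    total = sum(residuals)
    base = total * total / n
    best = None
    for f in range(d):
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:  # cannot split between identical values
                continue
            right_sum = total - left_sum
            # Reduction in the sum of squared errors produced by this split.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - base
            if best is None or gain > best[0]:
                best = (gain, f, lo, left_sum / k, right_sum / (n - k))
    return best


def gbdt_importance(X, y, n_rounds=50, learning_rate=0.1):
    """Boost depth-1 regression trees on squared loss and return the
    normalised total gain attributed to each feature."""
    n, d = len(X), len(X[0])
    mean_y = sum(y) / n
    pred = [mean_y] * n
    importance = [0.0] * d
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, f, thr, left, right = stump
        importance[f] += gain
        for i in range(n):
            pred[i] += learning_rate * (left if X[i][f] <= thr else right)
    total = sum(importance) or 1.0
    return [v / total for v in importance]


def select_features(X, y, n_keep, n_repeats=5, seed=0):
    """Average importances over bootstrap resamples (an assumed averaging
    scheme) and return the indices of the n_keep highest-scoring features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    avg = [0.0] * d
    for _ in range(n_repeats):
        idx = [rng.randrange(n) for _ in range(n)]
        imp = gbdt_importance([X[i] for i in idx], [y[i] for i in idx])
        avg = [a + v / n_repeats for a, v in zip(avg, imp)]
    ranked = sorted(range(d), key=lambda f: avg[f], reverse=True)
    return sorted(ranked[:n_keep]), avg


# Synthetic stand-in for ACSF data: 8 candidate features, of which only
# features 1 and 5 influence the target (hypothetical, not from the paper).
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(8)] for _ in range(200)]
y = [3.0 * row[1] - 2.0 * row[5] + data_rng.gauss(0, 0.05) for row in X]

selected, scores = select_features(X, y, n_keep=2)
print("selected features:", selected)
print("average importance:", [round(s, 3) for s in scores])
```

In practice one would likely use an established GBDT library rather than hand-written stumps; the point here is only that gain-based importance concentrates on the informative features, which is what makes it usable as a ranking criterion.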
About the journal:
The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.