Feature selection for high-dimensional neural network potentials with the adaptive group lasso

IF 4.6 | Zone 2: Physics and Astrophysics | Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Science and Technology | Pub Date: 2024-05-16 | DOI: 10.1088/2632-2153/ad450e

Johannes Sandberg, Thomas Voigtmann, Emilie Devijver and Noel Jakse



Abstract

Neural network potentials are a powerful tool for atomistic simulations, accurately reproducing ab initio potential energy surfaces at a computational cost approaching that of classical force fields. A central component of such potentials is the transformation of atomic positions into a set of atomic features in the most efficient and informative way possible. In this work, a feature selection method for high-dimensional neural network potentials is introduced, based on the adaptive group lasso (AGL) approach. It is shown that an embedded method, which takes into account the interplay between features and their action in the estimator, is necessary to optimize the number of features. The method’s efficiency is tested on three monatomic systems: Lennard–Jones as a simple test case, aluminium as a system characterized by predominantly radial interactions, and boron as representative of a system with strongly directional components in its interactions. The AGL is compared with unsupervised filter methods and is found to perform consistently better in reducing the number of features needed to reproduce the reference simulation data at a level of accuracy similar to that of the starting feature set. In particular, our results show the importance of taking model predictions into account in feature selection for interatomic potentials.
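To make the idea of an embedded, group-wise penalty more concrete, the following is a minimal sketch of the generic adaptive group lasso building blocks, not the authors' implementation. It assumes (this is not stated in the abstract) that each input feature forms one group consisting of the row of first-layer weights leaving that feature, and that the penalty is applied through a proximal (block soft-thresholding) step. The function names, the exponent `gamma`, and the toy weight matrix are all hypothetical.

```python
import numpy as np

def group_norms(W):
    """L2 norm of each row of W; row j is assumed to hold all
    first-layer weights attached to input feature j."""
    return np.linalg.norm(W, axis=1)

def adaptive_weights(W_init, gamma=1.0, eps=1e-12):
    """Adaptive penalty weights 1 / ||w_j||^gamma computed from an initial fit.
    Features with small initial norms receive a larger penalty."""
    return 1.0 / (group_norms(W_init) ** gamma + eps)

def prox_group_lasso(W, thresholds):
    """Block soft-thresholding: shrink the norm of row j by thresholds[j]
    and set the whole row to zero if its norm is below the threshold."""
    norms = np.maximum(group_norms(W), 1e-12)  # avoid division by zero
    scale = np.maximum(0.0, 1.0 - thresholds / norms)
    return W * scale[:, None]

def selected_features(W, tol=0.0):
    """Indices of input features whose weight group has not been zeroed."""
    return np.flatnonzero(group_norms(W) > tol)

# Toy example (hypothetical numbers): 3 input features, 2 hidden units.
W = np.array([[3.0, 4.0],    # strong feature, norm 5
              [0.3, 0.4],    # weak feature, norm 0.5
              [0.0, 0.0]])   # already inactive
step_times_lambda = 0.5      # learning rate times penalty strength (assumed)
W_new = prox_group_lasso(W, step_times_lambda * adaptive_weights(W))
print(selected_features(W_new))  # prints [0]
```

In a full training loop, a step like `prox_group_lasso` would typically follow each gradient update of the first-layer weights, so that whole features are removed while the network is being fitted. This is what distinguishes an embedded method from a filter method, which ranks features without consulting the model.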


Source journal: Machine Learning Science and Technology (Computer Science-Artificial Intelligence)

CiteScore: 9.10

Self-citation rate: 4.40%

Review time: 5 weeks

Journal description: Machine Learning Science and Technology is a multidisciplinary open access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory as motivated by physical insights. Specifically, articles must fall into one of the following categories: advance the state of machine learning-driven applications in the sciences, or make conceptual, methodological or theoretical advances in machine learning with applications to, inspiration from, or motivation by scientific problems.