Acceleration with Interpretability: A Surrogate Model-Based Collective Variable for Enhanced Sampling

Impact factor 5.5 · Zone 1 (Chemistry) · JCR Q2 (Chemistry, Physical)
Sompriya Chatterjee and Dhiman Ray*
Journal of Chemical Theory and Computation, vol. 21, no. 4, pp. 1561–1571. Published 2025-02-04. DOI: 10.1021/acs.jctc.4c01603 (https://pubs.acs.org/doi/10.1021/acs.jctc.4c01603)
Citations: 0

Abstract

Acceleration with Interpretability: A Surrogate Model-Based Collective Variable for Enhanced Sampling

Most enhanced sampling methods facilitate the exploration of molecular free energy landscapes by applying a bias potential along a reduced dimensional collective variable (CV) space. The success of these methods depends on the ability of the CVs to follow the relevant slow modes of the system. Intuitive CVs, such as distances or contacts, often prove inadequate, particularly in biological systems involving many coupled degrees of freedom. Machine learning algorithms, especially neural networks (NN), can automate the process of CV discovery by combining a large number of molecular descriptors and often outperform intuitive CVs in sampling efficiency. However, their lack of interpretability and high cost of evaluation during trajectory propagation make NN-CVs difficult to apply to large biomolecular processes. Here, we introduce a surrogate model approach using lasso regression to express the output of a neural network as a linear combination of an automatically chosen subset of the input descriptors. We demonstrate successful applications of our surrogate model CVs in the enhanced sampling simulation of the conformational landscape of alanine dipeptide and chignolin mini-protein. In addition to providing mechanistic insights due to their explainable nature, the surrogate model CVs showed a negligible loss in efficiency and accuracy, compared to the NN-CVs, in reconstructing the underlying free energy surface. Moreover, due to their simplified functional forms, these CVs are better at extrapolating to unseen regions of the conformational space, e.g., saddle points. Surrogate model CVs are also less expensive to evaluate compared to their NN counterparts, making them suitable for enhanced sampling simulation of large and complex biomolecular processes.
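
The surrogate idea described in the abstract can be illustrated with a minimal sketch: fit a lasso model that reproduces a neural network's output from its input descriptors, and keep only the descriptors that receive nonzero weights. This is not the authors' implementation. The function `toy_nn_cv` is a hypothetical stand-in for a trained NN-CV, the six random descriptors are synthetic, and the penalty `alpha=0.05` is an arbitrary choice made for illustration. The objective minimized here is the common form (1/2n)·||y − Xw − b||² + alpha·||w||₁, solved with plain coordinate descent so the example needs only the standard library.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha, n_sweeps=200):
    """Minimize (1/2n)*||y - Xw - b||^2 + alpha*||w||_1 by coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = sum(y) / n
    residual = [y[i] - b for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_sweeps):
        for j in range(d):
            # Correlation of descriptor j with the residual, excluding its own term.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_wj
        # Fold the mean residual into the (unpenalized) intercept.
        shift = sum(residual) / n
        b += shift
        residual = [r - shift for r in residual]
    return w, b


def toy_nn_cv(x):
    """Hypothetical stand-in for a trained NN-CV: depends on descriptors 0 and 3 only."""
    return math.tanh(1.5 * x[0] - 1.0 * x[3])


# Synthetic "descriptors": 500 frames, 6 descriptors each (assumed data, not from the paper).
rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(500)]
y = [toy_nn_cv(x) for x in X]

weights, intercept = fit_lasso(X, y, alpha=0.05)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]


def surrogate_cv(x):
    """Linear surrogate CV built from the lasso-selected descriptors only."""
    return intercept + sum(weights[j] * x[j] for j in selected)


print("selected descriptors:", selected)
print("weights:", [round(weights[j], 3) for j in selected])
```

In this sketch the L1 penalty is what performs the automatic descriptor selection: weights whose correlation with the residual stays below `alpha` are set exactly to zero. The resulting `surrogate_cv` is a short linear expression that can be read directly and is cheap to evaluate at every step.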

Source journal
Journal of Chemical Theory and Computation (Chemistry / Physics: Atomic, Molecular and Chemical Physics)
CiteScore: 9.90
Self-citation rate: 16.40%
Articles published: 568
Review time: 1 month
About the journal: The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.