AMUSET-TICA: A Tensor-Based Approach for Identifying Slow Collective Variables in Biomolecular Dynamics.

IF 5.7 1区 化学 Q2 CHEMISTRY, PHYSICAL
Siqin Cao,Feliks Nüske,Bojun Liu,Micheline B Soley,Xuhui Huang
{"title":"AMUSET-TICA: A Tensor-Based Approach for Identifying Slow Collective Variables in Biomolecular Dynamics.","authors":"Siqin Cao,Feliks Nüske,Bojun Liu,Micheline B Soley,Xuhui Huang","doi":"10.1021/acs.jctc.5c00076","DOIUrl":null,"url":null,"abstract":"Elucidating collective variables (CVs) for biomolecular dynamics is crucial for understanding numerous biological processes. By leveraging the tensor-train data structure, a multilinear version of the AMUSE (Algorithm for Multiple Unknown Signals) algorithm for Koopman approximation (AMUSEt) was recently developed to identify CVs for biomolecular dynamics. To find slow CVs, AMUSEt transforms input features (e.g., pairwise atomic distances) into nonlinear basis functions (e.g., Gaussian functions) and encodes these nonlinear basis functions within a tensor-train structure via time-lagged correlation functions. Due to the need to fit these tensor-train data structures into computer memory, AMUSEt can handle only a limited number of input features. Consequently, AMUSEt relies on manually selecting and ranking features based on physical intuition to fully capture the slow dynamics. However, when applied to complex biological systems with numerous features, this selection and ranking process becomes increasingly challenging. To address this challenge, here we present AMUSET-TICA (AMUSEt-based Time-lagged Independent Component Analysis), a CV-identification method using time-structure-independent components (tICs) as the input features for AMUSEt. The key insight of AMUSET-TICA lies in its highly effective embedding of high-dimensional atomistic protein conformations, achieved by expanding orthogonal tICs into overlapping Gaussian basis functions through a tensor-product data structure. This eliminates the need for manually selecting and ranking input features for a wide range of biomolecular systems. We demonstrate that AMUSET-TICA consistently and significantly outperforms AMUSEt and tICA in identifying slow CVs for three different biomolecular systems: alanine dipeptide, the N-terminal domain of L9 (NTL9), and the FIP35 WW domain. For all these systems, the CVs generated by AMUSET-TICA accurately describe the slowest dynamical modes underlying these biological conformational changes. Furthermore, we show that AMUSET-TICA achieves performance comparable to deep-learning approaches like VAMPnets in identifying the slowest dynamical modes, while being significantly more computationally efficient in terms of CPU time. In addition, the CVs yielded by AMUSET-TICA provide insights into the folding mechanisms of NTL9 and the FIP35 WW domain, including CV3 and CV4 of the WW domain, which capture its two parallel folding pathways. We expect AMUSET-TICA can be widely applied to facilitate the investigation of biomolecular dynamics.","PeriodicalId":45,"journal":{"name":"Journal of Chemical Theory and Computation","volume":"91 1","pages":""},"PeriodicalIF":5.7000,"publicationDate":"2025-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Chemical Theory and Computation","FirstCategoryId":"92","ListUrlMain":"https://doi.org/10.1021/acs.jctc.5c00076","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, PHYSICAL","Score":null,"Total":0}
引用次数: 0

Abstract

Elucidating collective variables (CVs) for biomolecular dynamics is crucial for understanding numerous biological processes. By leveraging the tensor-train data structure, a multilinear version of the AMUSE (Algorithm for Multiple Unknown Signals) algorithm for Koopman approximation (AMUSEt) was recently developed to identify CVs for biomolecular dynamics. To find slow CVs, AMUSEt transforms input features (e.g., pairwise atomic distances) into nonlinear basis functions (e.g., Gaussian functions) and encodes these nonlinear basis functions within a tensor-train structure via time-lagged correlation functions. Due to the need to fit these tensor-train data structures into computer memory, AMUSEt can handle only a limited number of input features. Consequently, AMUSEt relies on manually selecting and ranking features based on physical intuition to fully capture the slow dynamics. However, when applied to complex biological systems with numerous features, this selection and ranking process becomes increasingly challenging. To address this challenge, here we present AMUSET-TICA (AMUSEt-based Time-lagged Independent Component Analysis), a CV-identification method using time-structure-independent components (tICs) as the input features for AMUSEt. The key insight of AMUSET-TICA lies in its highly effective embedding of high-dimensional atomistic protein conformations, achieved by expanding orthogonal tICs into overlapping Gaussian basis functions through a tensor-product data structure. This eliminates the need for manually selecting and ranking input features for a wide range of biomolecular systems. We demonstrate that AMUSET-TICA consistently and significantly outperforms AMUSEt and tICA in identifying slow CVs for three different biomolecular systems: alanine dipeptide, the N-terminal domain of L9 (NTL9), and the FIP35 WW domain. For all these systems, the CVs generated by AMUSET-TICA accurately describe the slowest dynamical modes underlying these biological conformational changes. Furthermore, we show that AMUSET-TICA achieves performance comparable to deep-learning approaches like VAMPnets in identifying the slowest dynamical modes, while being significantly more computationally efficient in terms of CPU time. In addition, the CVs yielded by AMUSET-TICA provide insights into the folding mechanisms of NTL9 and the FIP35 WW domain, including CV3 and CV4 of the WW domain, which capture its two parallel folding pathways. We expect AMUSET-TICA can be widely applied to facilitate the investigation of biomolecular dynamics.
一种基于张量的方法来识别生物分子动力学中的慢集体变量。
阐明生物分子动力学的集体变量(CVs)对于理解许多生物过程至关重要。利用张量训练数据结构,最近开发了用于库普曼近似(AMUSEt)的多线性版本的AMUSE(多未知信号算法)算法,用于识别生物分子动力学的CVs。为了找到慢cv, AMUSEt将输入特征(例如,成对原子距离)转换为非线性基函数(例如,高斯函数),并通过滞后相关函数将这些非线性基函数编码在张量序列结构中。由于需要将这些张量训练数据结构装入计算机存储器,AMUSEt只能处理有限数量的输入特征。因此,AMUSEt依赖于基于物理直觉的手动选择和排序特征来充分捕捉缓慢的动态。然而,当应用于具有众多特征的复杂生物系统时,这种选择和排序过程变得越来越具有挑战性。为了解决这一挑战,我们提出了一种基于AMUSEt的滞后独立分量分析方法(AMUSEt-based time- lag Independent Component Analysis),这是一种使用时结构无关分量(tic)作为AMUSEt输入特征的cv识别方法。AMUSET-TICA的关键在于其高维原子蛋白质构象的高效嵌入,通过张量积数据结构将正交tic扩展为重叠的高斯基函数来实现。这消除了为广泛的生物分子系统手动选择和排序输入特征的需要。我们证明,在识别三种不同生物分子系统(丙氨酸二肽、L9的n端结构域(NTL9)和FIP35 WW结构域)的慢cv方面,AMUSEt - tICA持续且显著优于AMUSEt和tICA。对于所有这些系统,由AMUSET-TICA生成的cv准确地描述了这些生物构象变化背后最慢的动态模式。此外,我们表明,在识别最慢的动态模式方面,AMUSET-TICA实现了与VAMPnets等深度学习方法相当的性能,同时在CPU时间方面具有更高的计算效率。此外,由AMUSET-TICA获得的CVs提供了对NTL9和FIP35 WW结构域的折叠机制的深入了解,包括WW结构域的CV3和CV4,它们捕获了其两个平行的折叠途径。我们期望该方法能够广泛应用于生物分子动力学的研究。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Chemical Theory and Computation
Journal of Chemical Theory and Computation 化学-物理:原子、分子和化学物理
CiteScore
9.90
自引率
16.40%
发文量
568
审稿时长
1 months
期刊介绍: The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信