Improved Data-Driven Collective Variables for Biased Sampling through Iteration on Biased Data

Impact factor 2.9 · Zone 2 (Chemistry) · JCR Q3 (Chemistry, Physical)
Subarna Sasmal, Martin McCullagh*, and Glen M. Hocky*
The Journal of Physical Chemistry B, 2025, 129 (25), 6163–6171. Published 2025-06-12. DOI: 10.1021/acs.jpcb.5c02164
Article: https://pubs.acs.org/doi/10.1021/acs.jpcb.5c02164
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12207592/pdf/

Abstract

Our ability to efficiently sample conformational transitions between two known states of a biomolecule using collective variable (CV)-based sampling depends strongly on the choice of the CV. We previously reported a data-driven approach to clustering biomolecular configurations with a probabilistic clustering model termed shapeGMM. ShapeGMM is a Gaussian mixture model in Cartesian coordinates, with means and covariances in each cluster representing the harmonic approximation to the conformational ensemble around a metastable state. We subsequently showed that linear discriminant analysis on positions (posLDA) produces good reaction coordinates to characterize the transition between two of these states, and moreover, they can be biased to produce transitions between the states using metadynamics-like approaches. However, the quality of these posLDA coordinates depends on the amount of data used to characterize the states, and here, we demonstrate the ability to systematically improve them using enhanced sampling data. Specifically, we demonstrate that improved CVs for sampling can be generated by iteratively performing biased sampling along a posLDA coordinate and then generating a new shapeGMM model from biased data from the previous iteration. The new coordinates derived from our iterative approach show a substantial improvement in being able to induce transitions between metastable states and to converge a free energy surface.
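
The core linear-algebra step behind a two-state LDA coordinate can be sketched as follows. This is only an illustration under stated assumptions, not the authors' shapeGMM or posLDA implementation: it assumes frames are already aligned and flattened into feature vectors, that each frame already carries a state label (for example from a clustering step), and that per-frame weights are available to account for the bias. The function name `weighted_lda_direction` and its arguments are hypothetical.

```python
import numpy as np


def weighted_lda_direction(X, labels, weights):
    """Unit vector proportional to S_w^{-1} (mu_1 - mu_0) for two states.

    X       : (n_frames, n_features) pre-aligned, flattened coordinates (assumed).
    labels  : (n_frames,) integer state assignment, 0 or 1 (assumed given).
    weights : (n_frames,) non-negative frame weights, e.g. from reweighting
              biased data (how they are obtained is outside this sketch).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)

    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))  # pooled within-state scatter
    means = []
    for k in (0, 1):
        mask = labels == k
        state_weight = weights[mask].sum()
        wk = weights[mask] / state_weight      # normalized weights in state k
        mu = wk @ X[mask]                      # weighted state mean
        d = X[mask] - mu
        cov_k = (d * wk[:, None]).T @ d        # weighted state covariance
        S_w += (state_weight / weights.sum()) * cov_k
        means.append(mu)

    # Small ridge term so the solve stays stable if S_w is near-singular.
    S_w += 1e-8 * np.eye(n_features)
    w = np.linalg.solve(S_w, means[1] - means[0])
    return w / np.linalg.norm(w)
```

In an iterative scheme like the one described above, one would presumably recompute the labels and weights from each new round of biased sampling and call such a function again to obtain the next coordinate. A common choice for a static bias is a weight proportional to exp(V_bias / kT), but that choice is an assumption here and is not stated in the abstract.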

Source journal: The Journal of Physical Chemistry B
CiteScore: 5.80
Self-citation rate: 9.10%
Articles published: 965
Review time: 1.6 months
Journal description: An essential criterion for acceptance of research articles in the journal is that they provide new physical insight. Please refer to the New Physical Insights virtual issue on what constitutes new physical insight. Manuscripts that are essentially reporting data or applications of data are, in general, not suitable for publication in JPC B.