Improved Data-Driven Collective Variables for Biased Sampling through Iteration on Biased Data

IF 2.9, Zone 2 (Chemistry), Q3 CHEMISTRY, PHYSICAL

The Journal of Physical Chemistry B. Pub Date: 2025-06-12. DOI: 10.1021/acs.jpcb.5c02164

Subarna Sasmal, Martin McCullagh*, and Glen M. Hocky*

Volume 129, issue 25, pages 6163–6171. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12207592/pdf/

Abstract

Our ability to efficiently sample conformational transitions between two known states of a biomolecule using collective variable (CV)-based sampling depends strongly on the choice of the CV. We previously reported a data-driven approach to clustering biomolecular configurations with a probabilistic clustering model termed shapeGMM. ShapeGMM is a Gaussian mixture model in Cartesian coordinates, with means and covariances in each cluster representing the harmonic approximation to the conformational ensemble around a metastable state. We subsequently showed that linear discriminant analysis on positions (posLDA) produces good reaction coordinates to characterize the transition between two of these states, and moreover, they can be biased to produce transitions between the states using metadynamics-like approaches. However, the quality of these posLDA coordinates depends on the amount of data used to characterize the states, and here, we demonstrate the ability to systematically improve them using enhanced sampling data. Specifically, we demonstrate that improved CVs for sampling can be generated by iteratively performing biased sampling along a posLDA coordinate and then generating a new shapeGMM model from biased data from the previous iteration. The new coordinates derived from our iterative approach show a substantial improvement in being able to induce transitions between metastable states and to converge a free energy surface.
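
The coordinate-refitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the shapeGMM package API; the function names `weighted_lda_direction` and `project` are hypothetical. The sketch assumes that frames have already been aligned and flattened into feature vectors, that each frame has been assigned to one of two states (for example by a clustering model), and that each frame from a biased simulation carries a non-negative reweighting weight. Under those assumptions, a two-state linear discriminant is the pooled within-state covariance inverse applied to the difference of the weighted state means.

```python
import numpy as np


def weighted_lda_direction(X, labels, weights):
    """Two-state LDA on weighted frames (illustrative sketch only).

    X       : (n_frames, n_features) pre-aligned, flattened coordinates
    labels  : (n_frames,) state assignment of each frame, 0 or 1
    weights : (n_frames,) non-negative frame weights (assumed to come
              from reweighting a biased trajectory)

    Returns the unit discriminant vector and the midpoint of the two
    weighted state means, which is used as the CV origin.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)

    n_features = X.shape[1]
    means = []
    pooled = np.zeros((n_features, n_features))
    for k in (0, 1):
        mask = labels == k
        state_weight = weights[mask].sum()
        w = weights[mask] / state_weight       # normalized within the state
        mu = w @ X[mask]                       # weighted state mean
        d = X[mask] - mu
        pooled += state_weight * ((d.T * w) @ d)  # weighted state covariance
        means.append(mu)
    pooled /= weights.sum()

    # Small ridge term so the solve stays stable if the covariance is
    # nearly singular (an assumption of this sketch).
    pooled += 1e-8 * np.eye(n_features)

    v = np.linalg.solve(pooled, means[1] - means[0])
    return v / np.linalg.norm(v), 0.5 * (means[0] + means[1])


def project(X, v, origin):
    """Scalar CV value of each frame: displacement from origin along v."""
    return (np.asarray(X, dtype=float) - origin) @ v
```

In an iterative scheme of the kind the abstract describes, one would bias along the current coordinate, collect the biased frames with their weights, reassign states, and call a function like this again to obtain the next coordinate. The biasing and clustering steps themselves are outside the scope of this sketch.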

Source journal

The Journal of Physical Chemistry B (Chemistry: Physical Chemistry)

CiteScore

5.80

Self-citation rate

9.10%

Articles published

965

Review time

1.6 months

Journal description: An essential criterion for acceptance of research articles in the journal is that they provide new physical insight. Please refer to the New Physical Insights virtual issue on what constitutes new physical insight. Manuscripts that are essentially reporting data or applications of data are, in general, not suitable for publication in JPC B.