Soojung Yang, Juno Nam, Johannes C B Dietschreit, Rafael Gómez-Bombarelli
Title: Learning Collective Variables with Synthetic Data Augmentation through Physics-Inspired Geodesic Interpolation
Journal: Journal of Chemical Theory and Computation
DOI: 10.1021/acs.jctc.4c00435 (https://doi.org/10.1021/acs.jctc.4c00435)
Published: 2024-08-13 (Epub 2024-07-29)
Citation count: 0
Abstract
In molecular dynamics simulations, rare events, such as protein folding, are typically studied using enhanced sampling techniques, most of which are based on the definition of a collective variable (CV) along which acceleration occurs. Obtaining an expressive CV is crucial, but often hindered by the lack of information about the particular event, e.g., the transition from unfolded to folded conformation. We propose a simulation-free data augmentation strategy using physics-inspired metrics to generate geodesic interpolations resembling protein folding transitions, thereby improving sampling efficiency without true transition state samples. This new data can be used to improve the accuracy of classifier-based methods. Alternatively, a regression-based learning scheme for CV models can be adopted by leveraging the interpolation progress parameter.
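The regression-based scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the physics-inspired geodesic is replaced here by a plain straight-line interpolation in pairwise-distance space, the CV model is a linear least-squares fit rather than a neural network, and the two "conformations" are random toy coordinates. All function names (`pairwise_distances`, `interpolate_features`, `fit_linear_cv`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def pairwise_distances(coords):
    """Flattened upper-triangle interatomic distances for an (n_atoms, 3) array."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(len(coords), k=1)
    return dist[i, j]

def interpolate_features(start, end, n_points):
    """Assumed stand-in for geodesic interpolation: a straight line in distance space.

    Returns features of shape (n_points, n_pairs) and progress labels t in [0, 1].
    """
    t = np.linspace(0.0, 1.0, n_points)
    d0, d1 = pairwise_distances(start), pairwise_distances(end)
    feats = (1.0 - t)[:, None] * d0 + t[:, None] * d1
    return feats, t

def fit_linear_cv(feats, t):
    """Least-squares linear CV so that t is approximated by feats @ w + b."""
    X = np.hstack([feats, np.ones((len(feats), 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X, t, rcond=None)
    return coef[:-1], coef[-1]

# Toy end states (random coordinates, not real protein structures).
rng = np.random.default_rng(0)
unfolded = rng.normal(scale=3.0, size=(8, 3))
folded = rng.normal(scale=1.0, size=(8, 3))

# Synthetic "transition" samples labeled by the interpolation progress parameter.
feats, t = interpolate_features(unfolded, folded, n_points=21)
w, b = fit_linear_cv(feats, t)
cv_values = feats @ w + b  # CV evaluated along the interpolated path
```

Because the toy path is linear in the features, the fitted CV reproduces the progress parameter along it almost exactly. With a genuine geodesic under a physics-inspired metric and a nonlinear model, the same labeling idea would apply, but the fit would no longer be trivial.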
About the journal:
The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.