Tutorial on Molecular Latent Space Simulators (LSSs): Spatially and Temporally Continuous Data-Driven Surrogate Dynamical Models of Molecular Systems.

IF 2.7 | Zone 2 (Chemistry) | JCR Q3 (Chemistry, Physical)
Michael S Jones, Kirill Shmilovich, Andrew L Ferguson
Journal: The Journal of Physical Chemistry A. Published: 2024-11-14. DOI: 10.1021/acs.jpca.4c05389 (https://doi.org/10.1021/acs.jpca.4c05389)

Abstract

The inherently serial nature and requirement for short integration time steps in the numerical integration of molecular dynamics (MD) calculations place strong limitations on the accessible simulation time scales and statistical uncertainties in sampling slowly relaxing dynamical modes and rare events. Molecular latent space simulators (LSSs) are a data-driven approach to learning a surrogate dynamical model of the molecular system from modest MD training trajectories that can generate synthetic trajectories at a fraction of the computational cost. The training data may comprise single long trajectories or multiple short, discontinuous trajectories collected over, for example, distributed computing resources. Provided the training data provide sufficient sampling of the relevant thermodynamic states and dynamical transitions to robustly learn the underlying microscopic propagator, an LSS furnishes a global model of the dynamics capable of producing temporally and spatially continuous molecular trajectories. Trained LSS models have produced simulation trajectories at up to 6 orders of magnitude lower cost than standard MD to enable dense sampling of molecular phase space and large reduction of the statistical errors in structural, thermodynamic, and kinetic observables. The LSS employs three deep learning architectures to solve three independent learning problems over the training data: (i) an encoding of the high-dimensional MD into a low-dimensional slow latent space using state-free reversible VAMPnets (SRVs), (ii) a propagator of the microscopic dynamics within the low-dimensional latent space using mixture density networks (MDNs), and (iii) a generative decoding of the low-dimensional latent coordinates back to the original high-dimensional molecular configuration space using conditional Wasserstein generative adversarial networks (cWGANs) or denoising diffusion probabilistic models (DDPMs). In this software tutorial, we introduce the mathematical and numerical background and theory of LSS and present example applications of a user-friendly Python software package to alanine dipeptide and a 28-residue beta-beta-alpha (BBA) protein within simple Google Colab notebooks.
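
As an illustration of step (ii), the following is a minimal sketch of how a mixture-density propagator can generate a synthetic latent trajectory: at each step the next latent coordinate is drawn from a Gaussian mixture whose weights, means, and widths depend on the current coordinate. It is not the API of the LSS package. The function `toy_mixture_params` is a hypothetical, hand-written stand-in for a trained MDN, and the one-dimensional two-well latent space and all numerical values are assumptions chosen only to keep the example small.

```python
import math
import random


def toy_mixture_params(z):
    """Hypothetical stand-in for a trained MDN (an assumption, not the LSS API).

    Maps the current latent coordinate z to the weights, means, and standard
    deviations of a two-component Gaussian mixture over the next coordinate.
    """
    # Probability of landing in the right-hand well rises smoothly with z.
    p_right = 1.0 / (1.0 + math.exp(-4.0 * z))
    weights = [1.0 - p_right, p_right]
    means = [-1.0, 1.0]   # assumed well centers
    stds = [0.3, 0.3]     # assumed well widths
    return weights, means, stds


def propagate(z0, n_steps, rng):
    """Sample z_{t+tau} from the mixture conditioned on z_t, n_steps times."""
    traj = [z0]
    for _ in range(n_steps):
        weights, means, stds = toy_mixture_params(traj[-1])
        # Pick a mixture component, then draw from that Gaussian.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        traj.append(rng.gauss(means[k], stds[k]))
    return traj


rng = random.Random(0)
latent_traj = propagate(z0=-1.0, n_steps=5000, rng=rng)
frac_right = sum(z > 0.0 for z in latent_traj) / len(latent_traj)
print(f"fraction of frames in the right-hand well: {frac_right:.2f}")
```

Each step costs only a parameter evaluation and two random draws, with no force calculation, which is the kind of saving that makes a latent propagator cheap relative to MD. In a full LSS, the parameter function would be a trained network operating on the SRV latent coordinates, and each latent frame would then be passed to the generative decoder to recover a molecular configuration.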

Source journal: The Journal of Physical Chemistry A
Category: Chemistry; Physics: Atomic, Molecular and Chemical
CiteScore: 5.20
Self-citation rate: 10.30%
Articles published: 922
Review time: 1.3 months
Journal description: The Journal of Physical Chemistry A is devoted to reporting new and original experimental and theoretical basic research of interest to physical chemists, biophysical chemists, and chemical physicists.