Sequence determinants of protein phase separation and recognition by protein phase-separated condensates through molecular dynamics and active learning

IF 4.3 · Zone 3, Materials Science · Q1 ENGINEERING, ELECTRICAL & ELECTRONIC
Arya Changiarath Sivadasan, Aayush Arya, Vasileios A. Xenidis, Jan Padeken, Lukas S. Stelzl
DOI: 10.1039/d4fd00099d (https://doi.org/10.1039/d4fd00099d)
Publication date: 2024-08-03
Publication type: Journal Article
Journal: ACS Applied Electronic Materials

Abstract

Elucidating how protein sequence determines the properties of disordered proteins and their phase-separated condensates is a great challenge in computational chemistry, biology, and biophysics. Quantitative molecular dynamics simulations and the free energy values derived from them can in principle capture how a sequence encodes the chemical and biological properties of a protein. These calculations are, however, computationally demanding, even after reducing the representation by coarse-graining; exploring the large space of potentially relevant sequences remains a formidable task. We employ an "active learning" scheme introduced by Yang et al. (bioRxiv 2022.08.05.502972) to reduce the number of labelled examples needed from simulations: a neural network-based model suggests the most useful examples for the next training cycle. Applying this Bayesian optimisation framework, we determine properties of protein sequences with coarse-grained molecular dynamics, which enables the network to establish sequence-property relationships for disordered proteins, their self-interactions, and their interactions in phase-separated condensates. We show how iterative training with second virial coefficients derived from simulations of disordered protein sequences leads to a rapid improvement in predicting peptide self-interactions. We employ this Bayesian approach to efficiently search for new sequences that bind to condensates of the disordered C-terminal domain (CTD) of RNA Polymerase II, by simulating molecular recognition of peptides by phase-separated condensates in coarse-grained molecular dynamics. By searching for protein sequences that prefer to self-interact rather than interact with another protein sequence, we are able to shape the morphology of protein condensates and design multiphasic protein condensates.
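
The active-learning cycle described in the abstract (train a surrogate on labelled sequences, let it propose the most useful candidates, label those, retrain) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the surrogate is a bootstrap ensemble of ridge regressors rather than a neural network, the sequence encoding is a plain amino-acid composition vector, and `toy_label` is a hypothetical stand-in for a label that would in practice come from a coarse-grained simulation (for example a second virial coefficient). The acquisition rule is a simple confidence-bound score for minimisation.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def featurize(seq):
    """Amino-acid composition (fractions); a deliberately simple encoding."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / len(seq)


def toy_label(seq):
    """Hypothetical stand-in for an expensive simulation-derived label.

    NOT a physical model: it only makes aromatic-rich sequences score lower
    (more 'attractive') and charge-rich sequences score higher.
    """
    aromatic = sum(seq.count(a) for a in "FWY")
    charged = sum(seq.count(a) for a in "DEKR")
    return (charged - 2.0 * aromatic) / len(seq)


def random_pool(n, length, seed=1):
    """Generate n random candidate sequences of the given length."""
    rng = np.random.default_rng(seed)
    letters = list(AMINO_ACIDS)
    return ["".join(rng.choice(letters, size=length)) for _ in range(n)]


def fit_ensemble(X, y, rng, n_models=20, ridge=1e-3):
    """Fit ridge regressors on bootstrap resamples (assumed surrogate)."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))
        Xb = np.hstack([X[idx], np.ones((len(idx), 1))])  # add bias column
        A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
        models.append(np.linalg.solve(A, Xb.T @ y[idx]))
    return np.array(models)


def predict(models, X):
    """Ensemble mean and spread; the spread serves as an uncertainty proxy."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    preds = Xb @ models.T  # shape (n_samples, n_models)
    return preds.mean(axis=1), preds.std(axis=1)


def active_learning(pool, n_init=10, n_rounds=5, batch=5, kappa=1.0, seed=0):
    """Iteratively label the candidates the surrogate considers most useful."""
    rng = np.random.default_rng(seed)
    X_pool = np.array([featurize(s) for s in pool])
    labelled = [int(i) for i in rng.choice(len(pool), size=n_init, replace=False)]
    labels = {i: toy_label(pool[i]) for i in labelled}
    for _ in range(n_rounds):
        idx = np.array(labelled)
        y = np.array([labels[i] for i in labelled])
        models = fit_ensemble(X_pool[idx], y, rng)
        mean, std = predict(models, X_pool)
        # Favour low predicted labels and high uncertainty (minimisation).
        score = -mean + kappa * std
        score[idx] = -np.inf  # never re-select already labelled sequences
        for i in np.argsort(score)[-batch:]:
            labels[int(i)] = toy_label(pool[int(i)])  # the "simulation" call
            labelled.append(int(i))
    return labelled, labels


if __name__ == "__main__":
    pool = random_pool(200, 20)
    labelled, labels = active_learning(pool)
    best = min(labels, key=labels.get)
    print(len(labelled), "sequences labelled; best candidate:", pool[best])
```

In this sketch the parameter `kappa` (a hypothetical name) trades off exploitation of low predicted values against exploration of uncertain regions; replacing `toy_label` with a call that launches a simulation and returns its result would leave the loop itself unchanged.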