Rohith Krishna, Jue Wang, Woody Ahern, Pascal Sturmfels, Preetham Venkatesh, Indrek Kalvet, Gyu Rie Lee, Felix S Morey-Burrows, Ivan Anishchenko, Ian R Humphreys, Ryan McHugh, Dionne Vafeados, Xinting Li, George A Sutherland, Andrew Hitchcock, C Neil Hunter, Minkyung Baek, Frank DiMaio, David Baker
{"title":"Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom","authors":"Rohith Krishna, Jue Wang, Woody Ahern, Pascal Sturmfels, Preetham Venkatesh, Indrek Kalvet, Gyu Rie Lee, Felix S Morey-Burrows, Ivan Anishchenko, Ian R Humphreys, Ryan McHugh, Dionne Vafeados, Xinting Li, George A Sutherland, Andrew Hitchcock, C Neil Hunter, Minkyung Baek, Frank DiMaio, David Baker","doi":"10.1101/2023.10.09.561603","DOIUrl":null,"url":null,"abstract":"Abstract Although AlphaFold2 (AF2) and RoseTTAFold (RF) have transformed structural biology by enabling high-accuracy protein structure modeling, they are unable to model covalent modifications or interactions with small molecules and other non-protein molecules that can play key roles in biological function. Here, we describe RoseTTAFold All-Atom (RFAA), a deep network capable of modeling full biological assemblies containing proteins, nucleic acids, small molecules, metals, and covalent modifications given the sequences of the polymers and the atomic bonded geometry of the small molecules and covalent modifications. Following training on structures of full biological assemblies in the Protein Data Bank (PDB), RFAA has comparable protein structure prediction accuracy to AF2, excellent performance in CAMEO for flexible backbone small molecule docking, and reasonable prediction accuracy for protein covalent modifications and assemblies of proteins with multiple nucleic acid chains and small molecules which, to our knowledge, no existing method can model simultaneously. By fine-tuning on diffusive denoising tasks, we develop RFdiffusion All-Atom (RFdiffusionAA ) , which generates binding pockets by directly building protein structures around small molecules and other non-protein molecules. Starting from random distributions of amino acid residues surrounding target small molecules, we design and experimentally validate proteins that bind the cardiac disease therapeutic digoxigenin, the enzymatic cofactor heme, and optically active bilin molecules with potential for expanding the range of wavelengths captured by photosynthesis. We anticipate that RFAA and RFdiffusionAA will be widely useful for modeling and designing complex biomolecular systems.","PeriodicalId":486943,"journal":{"name":"bioRxiv (Cold Spring Harbor Laboratory)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"bioRxiv (Cold Spring Harbor Laboratory)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2023.10.09.561603","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
Abstract Although AlphaFold2 (AF2) and RoseTTAFold (RF) have transformed structural biology by enabling high-accuracy protein structure modeling, they are unable to model covalent modifications or interactions with small molecules and other non-protein molecules that can play key roles in biological function. Here, we describe RoseTTAFold All-Atom (RFAA), a deep network capable of modeling full biological assemblies containing proteins, nucleic acids, small molecules, metals, and covalent modifications given the sequences of the polymers and the atomic bonded geometry of the small molecules and covalent modifications. Following training on structures of full biological assemblies in the Protein Data Bank (PDB), RFAA has comparable protein structure prediction accuracy to AF2, excellent performance in CAMEO for flexible backbone small molecule docking, and reasonable prediction accuracy for protein covalent modifications and assemblies of proteins with multiple nucleic acid chains and small molecules which, to our knowledge, no existing method can model simultaneously. By fine-tuning on diffusive denoising tasks, we develop RFdiffusion All-Atom (RFdiffusionAA ) , which generates binding pockets by directly building protein structures around small molecules and other non-protein molecules. Starting from random distributions of amino acid residues surrounding target small molecules, we design and experimentally validate proteins that bind the cardiac disease therapeutic digoxigenin, the enzymatic cofactor heme, and optically active bilin molecules with potential for expanding the range of wavelengths captured by photosynthesis. We anticipate that RFAA and RFdiffusionAA will be widely useful for modeling and designing complex biomolecular systems.
虽然AlphaFold2 (AF2)和RoseTTAFold (RF)通过实现高精度的蛋白质结构建模改变了结构生物学,但它们无法模拟共价修饰或与小分子和其他非蛋白质分子的相互作用,而这些共价修饰或相互作用在生物学功能中发挥关键作用。在这里,我们描述了RoseTTAFold全原子(RFAA),这是一个深度网络,能够模拟包含蛋白质、核酸、小分子、金属和共价修饰的完整生物组件,给定聚合物的序列和小分子的原子键合几何形状和共价修饰。在蛋白质数据库(Protein Data Bank, PDB)中对完整生物组装体的结构进行训练后,RFAA具有与AF2相当的蛋白质结构预测精度,在灵活骨架小分子对接的CAMEO中表现优异,对蛋白质共价修饰和多核酸链和小分子蛋白质组装的预测精度合理,据我们所知,现有方法无法同时建模。通过对扩散去噪任务进行微调,我们开发了RFdiffusion All-Atom (RFdiffusionAA),它通过直接在小分子和其他非蛋白质分子周围构建蛋白质结构来产生结合口袋。从目标小分子周围氨基酸残基的随机分布开始,我们设计并实验验证了结合心脏病治疗性地高辛、酶辅助因子血红素和光学活性胆磷脂分子的蛋白质,这些蛋白质具有扩大光合作用捕获的波长范围的潜力。我们期望RFAA和RFdiffusionAA将广泛应用于复杂生物分子系统的建模和设计。