Zero-shot transfer of protein sequence likelihood models to thermostability prediction


Nature Machine Intelligence. Pub Date: 2024-09-20. DOI: 10.1038/s42256-024-00887-7

Shawn Reeves, Subha Kalyaanamoorthy

Volume 6, issue 9, pages 1063-1076. Publication type: Journal Article. URL: https://www.nature.com/articles/s42256-024-00887-7


Abstract

Protein sequence likelihood models (PSLMs) are an emerging class of self-supervised deep learning algorithms that learn probability distributions over amino acid identities conditioned on structural or evolutionary context. Recently, PSLMs have demonstrated impressive performance in predicting the relative fitness of variant sequences without any task-specific training, but their potential to address a central goal of protein engineering, enhancing stability, remains underexplored. Here we comprehensively analyse the capacity for zero-shot transfer of eight PSLMs towards prediction of relative thermostability for variants of hundreds of heterogeneous proteins across several quantitative datasets. PSLMs are compared with popular task-specific stability models, and we show that some PSLMs have competitive performance when the appropriate statistics are considered. We highlight relative strengths and weaknesses of PSLMs and examine their complementarity with task-specific models, specifically focusing our analyses on stability-engineering applications. Our results indicate that all PSLMs can appreciably augment the predictions of existing methods by integrating insights from their disparate training objectives, suggesting a path forward in the stagnating field of computational stability prediction.

Stabilization of proteins is a key task in protein engineering; however, current methods to predict mutant stability face a number of limitations. Reeves and Kalyaanamoorthy study the performance of self-supervised protein sequence likelihood models for stability prediction and find that combining them with task-specific supervised models can lead to appreciable practical gains.
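To make the idea of zero-shot scoring concrete, the sketch below shows one common way a likelihood model's output can be turned into a variant score: the log-odds of the mutant residue against the wild-type residue at the mutated position. This is an illustrative assumption, not the scoring rule used in the paper, which the abstract does not specify. The function name `log_odds_score` and the probability values are hypothetical, and a real PSLM would supply the per-position distribution.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_score(position_probs, wild_type, mutant):
    """Score one substitution as log p(mutant) - log p(wild type).

    position_probs maps each amino acid to the model's probability at the
    mutated position. Positive scores mean the model prefers the mutant.
    This log-odds rule is an assumed, commonly used convention.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])


# Hypothetical distribution at a single position (values sum to 1).
probs = {aa: 0.02 for aa in AMINO_ACIDS}
probs["L"] = 0.50  # wild-type residue, strongly favoured by the toy model
probs["I"] = 0.14  # a conservative substitution

# Rank a few candidate substitutions of the wild-type leucine.
candidates = ["I", "A", "D"]
scores = {m: log_odds_score(probs, "L", m) for m in candidates}
ranking = sorted(candidates, key=lambda m: scores[m], reverse=True)

for m in ranking:
    print(f"L->{m}: {scores[m]:.3f}")
```

No task-specific training is involved: the ranking comes entirely from the model's learned distribution, which is what makes the transfer zero-shot. Whether such scores track measured thermostability is the empirical question the paper examines.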

