HaloClass: Salt-Tolerant Protein Classification with Protein Language Models

IF 1.9 · Zone 4 (Biology) · JCR Q4 (Biochemistry & Molecular Biology)
Kush Narang, Abhigyan Nath, William Hemstrom, Simon K. S. Chu
Journal: The Protein Journal
Published: 2024-10-21
DOI: 10.1007/s10930-024-10236-7
Article: https://link.springer.com/article/10.1007/s10930-024-10236-7
PDF: https://link.springer.com/content/pdf/10.1007/s10930-024-10236-7.pdf
Open access: no
Citations: 0

Abstract

Salt-tolerant proteins, also known as halophilic proteins, have unique adaptations that allow them to function in high-salinity environments. These proteins evolved naturally in extremophilic organisms and, more recently, have been increasingly applied as enzymes in industrial processes. Because salt-tolerant sequences are abundant while experimental structures are scarce, most computational methods for predicting stability are sequence-based only. These approaches, however, are hindered by a lack of structural understanding of these proteins. Here, we present HaloClass, an SVM classifier that leverages ESM-2 protein language model embeddings to accurately identify salt-tolerant proteins. On a newer and larger test dataset, HaloClass outperforms existing approaches when predicting the stability of never-before-seen proteins that are distal to its training set. Finally, in a mutation study that evaluated changes in salt tolerance in single- and multiple-point mutants, HaloClass outperforms existing approaches, suggesting applications in the guided design of salt-tolerant enzymes.
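
To make the pipeline described in the abstract concrete (a fixed-length vector per sequence, an SVM on top, and a score comparison between a wild type and its mutant), the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: `embed` is a hypothetical stand-in that uses amino-acid frequencies where HaloClass uses ESM-2 embeddings, the SVM is a plain linear one trained by subgradient descent (the abstract does not state a kernel or hyperparameters), the training sequences are synthetic, and the acidic-residue enrichment used to generate the positive class is only a commonly reported halophilic signature, not data from the paper.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Hypothetical stand-in for an ESM-2 embedding: 20 amino-acid frequencies."""
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    return [c / max(len(sequence), 1) for c in counts]


def decision_score(weights, bias, features):
    """Signed score of the linear SVM; positive means 'salt-tolerant' here."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit a linear SVM by stochastic subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. All hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            violated = y * decision_score(weights, bias, x) < 1
            # Shrink the weights (gradient of the L2 penalty).
            weights = [w * (1 - lr * reg) for w in weights]
            if violated:
                # Hinge-loss subgradient step for a margin violation.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def synthetic_sequence(rng, enriched, length=80):
    """Draw a random toy sequence in which the `enriched` residues are 5x more likely."""
    letter_weights = [5 if aa in enriched else 1 for aa in AMINO_ACIDS]
    return "".join(rng.choices(AMINO_ACIDS, weights=letter_weights, k=length))


# Synthetic toy data (NOT from the paper): D/E-rich sequences play the
# salt-tolerant class (+1), K/L-rich sequences play the other class (-1).
rng = random.Random(42)
sequences = [synthetic_sequence(rng, "DE") for _ in range(20)]
sequences += [synthetic_sequence(rng, "KL") for _ in range(20)]
labels = [1] * 20 + [-1] * 20

features = [embed(s) for s in sequences]
weights, bias = train_linear_svm(features, labels)

predictions = [1 if decision_score(weights, bias, f) >= 0 else -1 for f in features]
train_accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Mutation scan on a made-up wild type: replace the first two lysines with
# aspartate and compare decision scores before and after.
wild_type = "MKKLLKAGKLVKALGKKLAE"
mutant = wild_type.replace("K", "D", 2)
delta = decision_score(weights, bias, embed(mutant)) - decision_score(
    weights, bias, embed(wild_type)
)

print(f"training accuracy: {train_accuracy:.2f}")
print(f"score change for the K->D double mutant: {delta:+.3f}")
```

In a closer reproduction one would presumably replace `embed` with pooled ESM-2 representations and the hand-written trainer with an established SVM library, then evaluate on held-out sequences instead of the training set. Ranking mutants by the change in decision score is only one plausible way to use such a classifier for design and is not confirmed by the abstract.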

Source journal
The Protein Journal (Biology: Biochemistry & Molecular Biology)
CiteScore: 5.20
Self-citation rate: 0.00%
Articles published: 57
Review time: 12 months
About the journal: The Protein Journal (formerly the Journal of Protein Chemistry) publishes original research work on all aspects of proteins and peptides. These include studies concerned with covalent or three-dimensional structure determination (X-ray, NMR, cryoEM, EPR/ESR, optical methods, etc.), computational aspects of protein structure and function, protein folding and misfolding, assembly, genetics, evolution, proteomics, molecular biology, protein engineering, protein nanotechnology, protein purification and analysis and peptide synthesis, as well as the elucidation and interpretation of the molecular bases of biological activities of proteins and peptides. We accept original research papers, reviews, mini-reviews, hypotheses, opinion papers, and letters to the editor.