ProCeSa: Contrast-Enhanced Structure-Aware Network for Thermostability Prediction with Protein Language Models.

IF 5.6 2区 化学 Q1 CHEMISTRY, MEDICINAL
Feixiang Zhou, Shuo Zhang, Huifeng Zhang, Jian K Liu
{"title":"ProCeSa: Contrast-Enhanced Structure-Aware Network for Thermostability Prediction with Protein Language Models.","authors":"Feixiang Zhou, Shuo Zhang, Huifeng Zhang, Jian K Liu","doi":"10.1021/acs.jcim.4c01752","DOIUrl":null,"url":null,"abstract":"<p><p>Proteins play a fundamental role in biology, and their thermostability is essential for their proper functionality. The precise measurement of thermostability is crucial, traditionally relying on resource-intensive experiments. Recent advances in deep learning, particularly in protein language models (PLMs), have significantly accelerated the progress in protein thermostability prediction. These models utilize various biological characteristics or deep representations generated by PLMs to represent the protein sequences. However, effectively incorporating structural information, based on the PLM embeddings, while not considering atomic protein structures, remains an open and formidable challenge. Here, we propose a novel Protein Contrast-enhanced Structure-Aware (ProCeSa) model that seamlessly integrates both sequence and structural information extracted from PLMs to enhance thermostability prediction. Our model employs a contrastive learning scheme guided by the categories of amino acid residues, allowing it to discern intricate patterns within protein sequences. Rigorous experiments conducted on publicly available data sets establish the superiority of our method over state-of-the-art approaches, excelling in both classification and regression tasks. Our results demonstrate that ProCeSa addresses the complex challenge of predicting protein thermostability by utilizing PLM-derived sequence embeddings, without requiring access to atomic structural data.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6000,"publicationDate":"2025-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Chemical Information and Modeling ","FirstCategoryId":"92","ListUrlMain":"https://doi.org/10.1021/acs.jcim.4c01752","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MEDICINAL","Score":null,"Total":0}
引用次数: 0

Abstract

Proteins play a fundamental role in biology, and their thermostability is essential for their proper functionality. The precise measurement of thermostability is crucial, traditionally relying on resource-intensive experiments. Recent advances in deep learning, particularly in protein language models (PLMs), have significantly accelerated the progress in protein thermostability prediction. These models utilize various biological characteristics or deep representations generated by PLMs to represent the protein sequences. However, effectively incorporating structural information, based on the PLM embeddings, while not considering atomic protein structures, remains an open and formidable challenge. Here, we propose a novel Protein Contrast-enhanced Structure-Aware (ProCeSa) model that seamlessly integrates both sequence and structural information extracted from PLMs to enhance thermostability prediction. Our model employs a contrastive learning scheme guided by the categories of amino acid residues, allowing it to discern intricate patterns within protein sequences. Rigorous experiments conducted on publicly available data sets establish the superiority of our method over state-of-the-art approaches, excelling in both classification and regression tasks. Our results demonstrate that ProCeSa addresses the complex challenge of predicting protein thermostability by utilizing PLM-derived sequence embeddings, without requiring access to atomic structural data.

蛋白质在生物学中起着基础性作用,其热稳定性对其正常功能至关重要。热稳定性的精确测量至关重要,传统上依赖于资源密集型实验。深度学习,尤其是蛋白质语言模型(PLMs)的最新进展大大加快了蛋白质耐热性预测的进程。这些模型利用 PLM 生成的各种生物特征或深度表征来表示蛋白质序列。然而,在不考虑蛋白质原子结构的情况下,如何在 PLM 嵌入的基础上有效地整合结构信息,仍然是一个开放而艰巨的挑战。在此,我们提出了一种新颖的蛋白质对比增强结构感知(ProCeSa)模型,该模型无缝整合了从 PLMs 提取的序列和结构信息,以增强热稳定性预测。我们的模型采用了以氨基酸残基类别为指导的对比学习方案,使其能够识别蛋白质序列中错综复杂的模式。在公开数据集上进行的严格实验证明,我们的方法优于最先进的方法,在分类和回归任务中均表现出色。我们的研究结果表明,ProCeSa 通过利用 PLM 衍生的序列嵌入,解决了预测蛋白质热稳定性这一复杂难题,而无需访问原子结构数据。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
9.80
自引率
10.70%
发文量
529
审稿时长
1.4 months
期刊介绍: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信