MegSite: an accurate nucleic acid-binding residue prediction method based on multimodal protein language model.

IF 7.7 | Zone 2 (Biology) | JCR Q1: Biochemical Research Methods
Feng Hu, Wenwu Zeng, Shaoliang Peng
Briefings in Bioinformatics, volume 26, issue 5. Published 2025-08-31.
DOI: 10.1093/bib/bbaf524 (https://doi.org/10.1093/bib/bbaf524)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12496013/pdf/

Abstract

Accurate identification of nucleic acid-binding residues is crucial for understanding protein-nucleic acid interactions, which play a key role in gene expression research and the discovery of regulatory mechanisms. Despite numerous computational efforts to address this challenge, achieving high accuracy remains difficult because of the complexity of extracting meaningful information from proteins. Here, we introduce MegSite, a novel method informed by a multimodal protein language model that integrates discriminative knowledge from protein sequence, structure, and function. This work presents the first integration of ESM3 multimodal features for nucleic acid-binding site prediction. MegSite significantly outperforms existing prediction methods on multiple independent test sets. On DNA-129_Test, DNA-181_Test, RNA-117_Test, and RNA-285_Test, MegSite achieves Matthews correlation coefficient (MCC) values of 0.567, 0.444, 0.411, and 0.421, improvements of 2.72%, 7.66%, 1.22%, and 6.58%, respectively, over the second-best method. Notably, MegSite remains robust even on proteins with low structural similarity, surpassing previous structure-based methods. Furthermore, the method extends to predicted protein structures and to a newly released RNA-binding residue test set with high accuracy, highlighting its broad applicability. Comprehensive experimental results indicate that the superior performance of MegSite is attributable to its effective integration of multimodal protein knowledge.
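
For readers unfamiliar with the metric, the Matthews correlation coefficient reported above can be sketched as follows. This is a minimal illustration of the standard definition for binary per-residue labels, not code from the MegSite authors; the function name and the toy labels are assumptions made up for the example and are not data from the paper.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Standard MCC for binary labels (1 = binding residue, 0 = non-binding).

    Hypothetical helper for illustration only; not the authors' evaluation code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy per-residue labels for an 8-residue protein (invented for this example).
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
mcc = matthews_corrcoef(y_true, y_pred)
print(f"MCC = {mcc:.3f}")
```

MCC ranges from -1 to 1 and accounts for all four confusion-matrix cells, which is why it is commonly preferred over plain accuracy when binding residues are a small minority of all residues.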

Source journal
Briefings in Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.