MegSite: an accurate nucleic acid-binding residue prediction method based on multimodal protein language model
Authors: Feng Hu, Wenwu Zeng, Shaoliang Peng
Journal: Briefings in Bioinformatics, volume 26, issue 5 (Journal Article, published 2025-08-31)
Impact factor: 7.7; JCR: Q1 (Biochemical Research Methods)
DOI: https://doi.org/10.1093/bib/bbaf524
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12496013/pdf/
Citation count: 0
Abstract
Accurate identification of nucleic acid-binding residues is crucial for understanding protein-nucleic acid interactions, which play a key role in gene expression research and the discovery of regulatory mechanisms. Despite numerous computational efforts to address this challenge, achieving high accuracy remains difficult because of the complexity of extracting meaningful insights from proteins. Here, we introduce MegSite, a novel method informed by a multimodal protein language model that integrates discriminative knowledge from protein sequence, structure, and function. This work presents the first integration of ESM3 multimodal features for nucleic acid-binding site prediction. MegSite significantly outperforms existing prediction methods, as evidenced by its performance on multiple independent test sets. The Matthews correlation coefficient (MCC) values achieved by MegSite on DNA-129_Test, DNA-181_Test, RNA-117_Test, and RNA-285_Test are 0.567, 0.444, 0.411, and 0.421, respectively, representing improvements of 2.72%, 7.66%, 1.22%, and 6.58% over the second-best method on each set. Notably, MegSite demonstrates robust performance even on proteins with low structural similarity, surpassing previous structure-based methods. Furthermore, the method extends to predicted protein structures and to a newly released RNA-binding residue test set with high accuracy, highlighting its broad applicability. Comprehensive experimental results indicate that the superior performance of MegSite is attributable to its effective integration of multimodal protein knowledge.
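For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Matthews correlation coefficient can be computed for per-residue binary predictions. It is not taken from the MegSite code: the function name `mcc`, the label encoding (1 for binding, 0 for non-binding), and the toy label vectors are assumptions made purely for illustration, and the numbers are not results from the paper.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.

    Assumed encoding (illustrative): 1 = binding residue, 0 = non-binding.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical toy example: 8 residues, 3 of which truly bind.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
print(round(mcc(y_true, y_pred), 4))
```

MCC ranges from -1 to 1 and accounts for all four confusion-matrix cells, which makes it a common choice when binding residues are far rarer than non-binding ones. The abstract does not state whether the reported percentage improvements are relative or absolute differences in MCC, so no baseline values are derived here.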
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.