Gated-GPS: enhancing protein-protein interaction site prediction with scalable learning and imbalance-aware optimization.

IF 6.8, Zone 2 (Biology), JCR Q1, Biochemical Research Methods
Xin Gao, Hanqun Cao, Jinpeng Li, Jiezhong Qiu, Guangyong Chen, Pheng-Ann Heng
Briefings in Bioinformatics, vol. 26, no. 3. Published: 2025-05-01. Type: Journal Article.
DOI: 10.1093/bib/bbaf248 (https://doi.org/10.1093/bib/bbaf248)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133684/pdf/
Citations: 0

Abstract


In protein-protein interaction site (PPIS) prediction, existing machine learning models struggle with small datasets, limiting their predictive accuracy for unseen proteins. Additionally, class imbalance in protein complexes, where binding residues constitute a small fraction of all residues, hinders model performance. To address these challenges, we constructed a training dataset 9$\times$ larger than previous benchmarks by filtering the latest protein-protein complex data, improving diversity and generalization. We propose Gated-GPS, a Graph Transformer model with a novel gating mechanism designed to effectively leverage this expanded dataset. Additionally, we integrate cross-entropy loss with Tversky Loss to adjust sensitivity to positive and negative samples, mitigating class imbalance by emphasizing underrepresented binding residues. Experimental results show that Gated-GPS outperforms state-of-the-art (SOTA) models across four test sets. Notably, on the UBTest dataset, designed to evaluate generalization on unbounded proteins, our method improves MCC and AUPRC by 18.5% and 21.4%, respectively, over the previous SOTA. In a case study of snake venom toxin-protein interactions, our model accurately identified interaction sites, demonstrating its potential for therapeutic design and advancing the understanding of complex protein interactions.
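
As a rough illustration of the imbalance-aware objective described in the abstract, the sketch below combines binary cross-entropy with a Tversky loss over per-residue binding probabilities, in plain Python. The abstract does not give the exact formulation, so the additive combination, the function names, and the values of `alpha`, `beta`, and `tversky_weight` are assumptions made for illustration, not the authors' settings.

```python
import math
from typing import Sequence


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over residues (label 1 = binding residue)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def tversky_loss(probs: Sequence[float], labels: Sequence[int],
                 alpha: float = 0.3, beta: float = 0.7,
                 smooth: float = 1e-6) -> float:
    """1 - Tversky index, using soft counts of TP, FP and FN.

    alpha weights false positives and beta weights false negatives.
    The defaults are illustrative assumptions, not values from the paper.
    """
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    index = (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    return 1.0 - index


def combined_loss(probs: Sequence[float], labels: Sequence[int],
                  tversky_weight: float = 0.5,
                  alpha: float = 0.3, beta: float = 0.7) -> float:
    """Cross-entropy plus a weighted Tversky term (assumed additive form)."""
    return (binary_cross_entropy(probs, labels)
            + tversky_weight * tversky_loss(probs, labels, alpha, beta))
```

Setting `beta` above `alpha` penalizes missed binding residues (false negatives) more heavily than false positives, which is one way such a loss can emphasize the minority class.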

Source journal
Briefings in Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.