ProteinRPN: Towards Accurate Protein Function Prediction with Graph-Based Region Proposals

arXiv - QuanBio - Quantitative Methods, Pub Date: 2024-09-01, DOI: arxiv-2409.00610

Shania Mitra, Lei Huang, Manolis Kellis



Abstract

Protein function prediction is a crucial task in bioinformatics, with significant implications for understanding biological processes and disease mechanisms. While the relationship between sequence and function has been extensively explored, translating protein structure to function continues to present substantial challenges. Various models, particularly CNN-based and graph-based deep learning approaches that integrate structural and functional data, have been proposed to address these challenges. However, these methods often fall short in elucidating the functional significance of the key residues essential for protein functionality, because they predominantly adopt a retrospective perspective, which leads to suboptimal performance. Inspired by region proposal networks in computer vision, we introduce the Protein Region Proposal Network (ProteinRPN) for accurate protein function prediction. Specifically, the region proposal module of ProteinRPN identifies potential functional regions (anchors), which are then refined by a hierarchy-aware node drop pooling layer that favors nodes with defined secondary structures and spatial proximity. The representations of the predicted functional nodes are enriched using attention mechanisms and subsequently fed into a Graph Multiset Transformer, which is trained with supervised contrastive (SupCon) and InfoNCE losses on perturbed protein structures. Our model demonstrates significant improvements in predicting Gene Ontology (GO) terms and effectively localizes functional residues within protein structures. The proposed framework provides a robust, scalable solution for protein function annotation, advancing the understanding of protein structure-function relationships in computational biology.
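Two of the building blocks named in the abstract, node drop pooling and the InfoNCE loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it shows only the generic top-k form of node-drop pooling (without the hierarchy-aware or secondary-structure terms) and the standard InfoNCE objective between two views. The function names `topk_node_drop` and `info_nce`, the sigmoid gate, and the `ratio` and `temperature` defaults are all assumptions made for illustration.

```python
import numpy as np


def topk_node_drop(x, scores, ratio=0.5):
    """Generic top-k node-drop pooling (illustrative, not the paper's layer).

    x:      (n_nodes, dim) node (residue) features.
    scores: (n_nodes,) per-node scores, e.g. from a learned projection.
    Returns the kept node indices (ascending) and their gated features.
    """
    k = max(1, int(np.ceil(ratio * x.shape[0])))
    # Indices of the k highest-scoring nodes, restored to original order.
    keep = np.sort(np.argsort(-scores, kind="stable")[:k])
    # Assumed sigmoid gate: scales kept features by their score.
    gate = 1.0 / (1.0 + np.exp(-scores[keep]))
    return keep, x[keep] * gate[:, None]


def info_nce(z1, z2, temperature=0.1):
    """Standard InfoNCE loss; row i of z1 and row i of z2 are a positive pair.

    All other rows of z2 act as negatives for row i of z1.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(6, 4))                     # 6 residues, 4 features
    node_scores = np.array([0.1, 2.0, -1.0, 0.5, 3.0, -0.2])
    kept, pooled = topk_node_drop(feats, node_scores, ratio=0.5)
    print("kept nodes:", kept.tolist(), "pooled shape:", pooled.shape)

    # Two "views" of the same embeddings: the second is a small perturbation.
    view_a = rng.normal(size=(8, 16))
    view_b = view_a + 0.05 * rng.normal(size=(8, 16))
    print("InfoNCE loss:", info_nce(view_a, view_b))
```

In this sketch the perturbed view stands in for the "perturbed protein structures" mentioned in the abstract; how the paper actually perturbs structures and combines InfoNCE with the SupCon loss is not specified here.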


