ProteinRPN: Towards Accurate Protein Function Prediction with Graph-Based Region Proposals
Shania Mitra, Lei Huang, Manolis Kellis
arXiv:2409.00610 (arXiv - QuanBio - Quantitative Methods), 2024-09-01
Abstract
Protein function prediction is a crucial task in bioinformatics, with
significant implications for understanding biological processes and disease
mechanisms. While the relationship between sequence and function has been
extensively explored, translating protein structure to function continues to
present substantial challenges. Various models, particularly CNN and
graph-based deep learning approaches that integrate structural and functional
data, have been proposed to address these challenges. However, these methods
often fall short in elucidating the functional significance of key residues
essential for protein functionality, as they predominantly adopt a
retrospective perspective, leading to suboptimal performance.

Inspired by region proposal networks in computer vision, we introduce the
Protein Region Proposal Network (ProteinRPN) for accurate protein function
prediction. Specifically, the region proposal module of ProteinRPN
identifies potential functional regions (anchors), which are refined through a
hierarchy-aware node drop pooling layer that favors nodes with defined secondary
structures and spatial proximity. The representations of the predicted
functional nodes are enriched using attention mechanisms and subsequently fed
into a Graph Multiset Transformer, which is trained with supervised contrastive
(SupCon) and InfoNCE losses on perturbed protein structures. Our model
demonstrates significant improvements in predicting Gene Ontology (GO) terms
and effectively localizes functional residues within protein structures. The
proposed framework provides a robust, scalable solution for protein function
annotation, advancing the understanding of protein structure-function
relationships in computational biology.
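
Two of the ingredients named in the abstract, node drop pooling and the InfoNCE loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`topk_node_drop`, `info_nce`), the `ratio` and `temperature` parameters, and the use of cosine similarity are assumptions made for illustration. The pooling shown here is plain top-k selection on per-node scores and omits the hierarchy-aware preference for secondary structure and spatial proximity that the abstract describes.

```python
import math


def topk_node_drop(scores, ratio=0.5):
    """Plain top-k node drop pooling (hypothetical helper, not the paper's layer).

    Keeps the highest-scoring fraction of nodes and returns their indices
    in original order, so the retained residues stay in sequence order.
    """
    if not scores:
        return []
    k = max(1, math.ceil(ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def cosine(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for a single anchor embedding.

    The loss is the negative log-softmax of the positive pair's similarity
    against the similarities of all candidates (positive plus negatives).
    Assumption: similarities are cosine similarities scaled by a temperature.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy usage: four residue nodes with made-up scores, half of them retained.
kept = topk_node_drop([0.9, 0.1, 0.7, 0.3], ratio=0.5)

# Toy usage: the loss is small when the positive matches the anchor and
# large when the roles of the positive and the negative are swapped.
loss_aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_swapped = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In a contrastive setup like the one the abstract mentions, the positive would typically be an embedding of a perturbed copy of the same protein structure and the negatives embeddings of other proteins; how the paper constructs these pairs is not stated in the abstract.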