ProteinRPN: Towards Accurate Protein Function Prediction with Graph-Based Region Proposals
Shania Mitra, Lei Huang, Manolis Kellis
arXiv - QuanBio - Quantitative Methods, 2024-09-01
DOI: arxiv-2409.00610 (https://doi.org/arxiv-2409.00610)
Abstract
Protein function prediction is a crucial task in bioinformatics, with
significant implications for understanding biological processes and disease
mechanisms. While the relationship between sequence and function has been
extensively explored, translating protein structure to function continues to
present substantial challenges. Various models, particularly CNN and
graph-based deep learning approaches that integrate structural and functional
data, have been proposed to address these challenges. However, these methods
often fall short in elucidating the functional significance of key residues
essential for protein functionality, as they predominantly adopt a
retrospective perspective, leading to suboptimal performance. Inspired by
region proposal networks in computer vision, we introduce the
Protein Region Proposal Network (ProteinRPN) for accurate protein function
prediction. Specifically, the region proposal module of ProteinRPN
identifies potential functional regions (anchors), which are refined through a
hierarchy-aware node drop pooling layer that favors nodes with defined secondary
structures and spatial proximity. The representations of the predicted
functional nodes are enriched using attention mechanisms and subsequently fed
into a Graph Multiset Transformer, which is trained with supervised contrastive
(SupCon) and InfoNCE losses on perturbed protein structures. Our model
demonstrates significant improvements in predicting Gene Ontology (GO) terms,
effectively localizing functional residues within protein structures. The
proposed framework provides a robust, scalable solution for protein function
annotation, advancing the understanding of protein structure-function
relationships in computational biology.
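
The abstract names InfoNCE as one of the losses used on perturbed protein structures but gives no formula. As a rough illustration only, the sketch below shows the standard InfoNCE formulation: for each anchor embedding, the matching view is the positive and the other views in the batch act as negatives. The function names, the cosine similarity, the temperature value, and the toy embeddings are all assumptions made for this example, and nothing here is taken from the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (generic formulation, not the paper's code).

    anchors[i] and positives[i] are embeddings of two views of the same item,
    for example a protein and a perturbed copy of its structure (assumption).
    Every positives[j] with j != i serves as a negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true positive.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy 2-D embeddings, made up for illustration.
original = [[1.0, 0.0], [0.0, 1.0]]
perturbed_close = [[0.9, 0.1], [0.1, 0.9]]    # each view stays near its original
perturbed_swapped = [[0.1, 0.9], [0.9, 0.1]]  # each view resembles the other protein

loss_good = info_nce(original, perturbed_close)
loss_bad = info_nce(original, perturbed_swapped)
print(loss_good, loss_bad)
```

In this toy setup the loss is small when each perturbed view stays close to its own original and large when the views are swapped, which is the behavior a contrastive objective of this kind is meant to reward.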