Exact p-values for global network alignments via combinatorial analysis of shared GO terms

IF 2.2 | Zone 4 (Mathematics) | JCR Q2 (Biology)

Journal of Mathematical Biology | Pub Date: 2024-03-29 | DOI: 10.1007/s00285-024-02058-z

Wayne B. Hayes

Citations: 0

Abstract

Network alignment aims to uncover topologically similar regions in the protein–protein interaction (PPI) networks of two or more species under the assumption that topologically similar regions tend to perform similar functions. Although there exist a plethora of both network alignment algorithms and measures of topological similarity, currently no “gold standard” exists for evaluating how well either is able to uncover functionally similar regions. Here we propose a formal, mathematically and statistically rigorous method for evaluating the statistical significance of shared GO terms in a global, 1-to-1 alignment between two PPI networks. Given an alignment in which k aligned protein pairs share a particular GO term g, we use a combinatorial argument to precisely quantify the p-value of that alignment with respect to g compared to a random alignment. The p-value of the alignment with respect to all GO terms, including their inter-relationships, is approximated using the Empirical Brown’s Method. We note that, just as with BLAST’s p-values, this method is not designed to guide an alignment algorithm towards a solution; instead, just as with BLAST, an alignment is guided by a scoring matrix or function; the p-values herein are computed after the fact, providing independent feedback to the user on the biological quality of the alignment that was generated by optimizing the scoring function. Importantly, we demonstrate that among all GO-based measures of network alignments, ours is the only one that correlates with the precision of GO annotation predictions, paving the way for network alignment-based protein function prediction.
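The per-term calculation described in the abstract can be illustrated with a minimal sketch. It rests on a simplified null model that is an assumption made here, not the paper's own derivation: the alignment is a uniformly random injective map from the smaller network into the larger one, so the annotated proteins of the smaller network land on a uniformly random subset of the larger network's proteins, and the number of aligned pairs sharing g follows a hypergeometric distribution. The function name `go_term_p_value` and the parameters `n2`, `lam1`, and `lam2` are hypothetical. Combining the per-term p-values across dependent GO terms (the Empirical Brown's Method step) is not covered.

```python
from fractions import Fraction
from math import comb


def go_term_p_value(n2: int, lam1: int, lam2: int, k: int) -> Fraction:
    """Exact P(K >= k) for one GO term g under a uniformly random 1-to-1 alignment.

    Assumed (simplified) setting, not taken from the paper:
      n2   -- number of proteins in the larger network
      lam1 -- proteins annotated with g in the smaller network (lam1 <= n2)
      lam2 -- proteins annotated with g in the larger network (lam2 <= n2)
      k    -- observed number of aligned pairs in which both proteins carry g
    """
    if not (0 <= lam1 <= n2 and 0 <= lam2 <= n2):
        raise ValueError("annotation counts must lie between 0 and n2")

    # The lam1 annotated proteins occupy a uniformly random lam1-subset
    # of the n2 target proteins, so all C(n2, lam1) subsets are equally likely.
    total = comb(n2, lam1)

    # Count subsets that hit exactly j of the lam2 annotated targets, for j >= k.
    k_max = min(lam1, lam2)
    favourable = sum(
        comb(lam2, j) * comb(n2 - lam2, lam1 - j)
        for j in range(max(k, 0), k_max + 1)
    )
    return Fraction(favourable, total)


if __name__ == "__main__":
    # 10 target proteins, 3 annotated sources, 4 annotated targets, 2 shared pairs.
    print(go_term_p_value(n2=10, lam1=3, lam2=4, k=2))  # prints 1/3
```

Using `Fraction` keeps the result exact rather than a floating-point approximation, which matters when p-values for large networks become extremely small; convert with `float(...)` only for display.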

Source journal: Journal of Mathematical Biology (Biology)

CiteScore: 3.30

Self-citation rate: 5.30%

Articles published: 120

Review time: 6 months

About the journal: The Journal of Mathematical Biology focuses on mathematical biology, that is, work that uses mathematical approaches to gain biological understanding or explain biological phenomena. Areas of biology covered include, but are not restricted to, cell biology, physiology, development, neurobiology, genetics and population genetics, population biology, ecology, behavioural biology, evolution, epidemiology, immunology, molecular biology, biofluids, and DNA and protein structure and function. All mathematical approaches, including computational and visualization approaches, are appropriate.