Exact p-values for global network alignments via combinatorial analysis of shared GO terms
Wayne B. Hayes
DOI: https://doi.org/10.1007/s00285-024-02058-z
Published: 2024-03-29
Abstract
Network alignment aims to uncover topologically similar regions in the protein–protein interaction (PPI) networks of two or more species under the assumption that topologically similar regions tend to perform similar functions. Although there exist a plethora of both network alignment algorithms and measures of topological similarity, currently no “gold standard” exists for evaluating how well either is able to uncover functionally similar regions. Here we propose a formal, mathematically and statistically rigorous method for evaluating the statistical significance of shared GO terms in a global, 1-to-1 alignment between two PPI networks. Given an alignment in which k aligned protein pairs share a particular GO term g, we use a combinatorial argument to precisely quantify the p-value of that alignment with respect to g compared to a random alignment. The p-value of the alignment with respect to all GO terms, including their inter-relationships, is approximated using the Empirical Brown’s Method. We note that, just as with BLAST’s p-values, this method is not designed to guide an alignment algorithm towards a solution; instead, just as with BLAST, an alignment is guided by a scoring matrix or function; the p-values herein are computed after the fact, providing independent feedback to the user on the biological quality of the alignment that was generated by optimizing the scoring function. Importantly, we demonstrate that among all GO-based measures of network alignments, ours is the only one that correlates with the precision of GO annotation predictions, paving the way for network alignment-based protein function prediction.
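As a rough illustration of the per-term calculation described in the abstract, the sketch below computes an exact tail probability for a single GO term g. It is not taken from the paper. It assumes the null model is a uniformly random 1-to-1 alignment that maps every protein of the smaller network into the larger one, in which case the number of aligned pairs sharing g follows a hypergeometric distribution. The function name `shared_go_pvalue` and the parameters `n2` (size of the larger network), `lam1` and `lam2` (number of proteins annotated with g in each network) are hypothetical names chosen for this example.

```python
from fractions import Fraction
from math import comb


def shared_go_pvalue(n2: int, lam1: int, lam2: int, k: int) -> Fraction:
    """Exact P(at least k aligned pairs share GO term g) under a random alignment.

    Assumed null model (not necessarily the paper's derivation): the lam1
    annotated proteins of the smaller network land on a uniformly random
    set of lam1 distinct nodes among the n2 nodes of the larger network,
    of which lam2 are annotated with g.
    """
    if not (0 <= lam1 <= n2 and 0 <= lam2 <= n2):
        raise ValueError("annotation counts must lie between 0 and n2")
    # Every choice of lam1 image nodes out of n2 is equally likely.
    total = comb(n2, lam1)
    # Count the choices in which exactly j images are annotated with g,
    # summed over all j >= k (the upper tail).
    upper = min(lam1, lam2)
    favourable = sum(
        comb(lam2, j) * comb(n2 - lam2, lam1 - j)
        for j in range(max(k, 0), upper + 1)
    )
    return Fraction(favourable, total)


if __name__ == "__main__":
    # Toy case: 5 nodes in the larger network, 2 annotated proteins on each side.
    print(shared_go_pvalue(5, 2, 2, 1))  # 7/10
    print(shared_go_pvalue(5, 2, 2, 2))  # 1/10
```

Returning a `Fraction` keeps the result exact rather than rounded, which matters when p-values are very small. Combining such per-term values across all GO terms, as the abstract describes with the Empirical Brown's Method, is a separate step not covered by this sketch.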