Exact p-values for global network alignments via combinatorial analysis of shared GO terms

IF 2.2 · CAS Zone 4 (Mathematics) · JCR Q2 (Biology)
Wayne B. Hayes
Journal of Mathematical Biology, published 2024-03-29
DOI: 10.1007/s00285-024-02058-z
Citations: 0

Abstract

Network alignment aims to uncover topologically similar regions in the protein–protein interaction (PPI) networks of two or more species, under the assumption that topologically similar regions tend to perform similar functions. Although there exists a plethora of both network alignment algorithms and measures of topological similarity, there is currently no “gold standard” for evaluating how well either is able to uncover functionally similar regions. Here we propose a formal, mathematically and statistically rigorous method for evaluating the statistical significance of shared GO terms in a global, 1-to-1 alignment between two PPI networks. Given an alignment in which k aligned protein pairs share a particular GO term g, we use a combinatorial argument to precisely quantify the p-value of that alignment with respect to g compared to a random alignment. The p-value of the alignment with respect to all GO terms, including their inter-relationships, is approximated using the Empirical Brown’s Method. We note that, just as with BLAST’s p-values, this method is not designed to guide an alignment algorithm towards a solution. Instead, as with BLAST, an alignment is guided by a scoring matrix or function, and the p-values herein are computed after the fact, providing independent feedback to the user on the biological quality of the alignment that was generated by optimizing the scoring function. Importantly, we demonstrate that among all GO-based measures of network alignments, ours is the only one that correlates with the precision of GO annotation predictions, paving the way for network alignment-based protein function prediction.
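The per-term p-value described in the abstract can be illustrated with a short sketch. It rests on an assumption of ours, not necessarily the paper's exact model or formula: a "random alignment" is a uniformly random 1-to-1 map of every node of the smaller network G1 (n1 nodes) onto distinct nodes of G2 (n2 ≥ n1 nodes), and an aligned pair "shares" g when both proteins are annotated with g. If λ1 proteins in G1 and λ2 in G2 carry g, the λ1 annotated G1 proteins land on a uniformly random λ1-subset of G2, so the number K of shared pairs is hypergeometric, P(K = j) = C(λ2, j)·C(n2 − λ2, λ1 − j) / C(n2, λ1), and the p-value of an observed k is P(K ≥ k). The function names and parameters (n1, n2, lam1, lam2, k) are hypothetical, and a brute-force enumeration is included only to cross-check the formula on tiny inputs. Combining the per-term p-values with the Empirical Brown's Method is not shown.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def shared_term_pmf(n2: int, lam1: int, lam2: int, j: int) -> Fraction:
    """Exact P(K = j) under the assumed random-alignment model.

    The lam1 annotated nodes of G1 land on a uniformly random lam1-subset of
    G2's n2 nodes, so K is hypergeometric: pick j of G2's lam2 annotated nodes
    and lam1 - j of its n2 - lam2 unannotated nodes.
    """
    return Fraction(comb(lam2, j) * comb(n2 - lam2, lam1 - j), comb(n2, lam1))


def shared_term_pvalue(n1: int, n2: int, lam1: int, lam2: int, k: int) -> Fraction:
    """Exact P(K >= k): chance that a random alignment shares g at least k times."""
    if not (0 <= lam1 <= n1 <= n2 and 0 <= lam2 <= n2):
        raise ValueError("need 0 <= lam1 <= n1 <= n2 and 0 <= lam2 <= n2")
    upper = min(lam1, lam2)  # K can never exceed either annotation count
    # Sum the upper tail; an empty range (k > upper) correctly gives 0.
    return sum(
        (shared_term_pmf(n2, lam1, lam2, j) for j in range(max(k, 0), upper + 1)),
        Fraction(0),
    )


def brute_force_pvalue(n1: int, n2: int, lam1: int, lam2: int, k: int) -> Fraction:
    """Cross-check by enumerating every injection G1 -> G2 (tiny inputs only).

    Without loss of generality, nodes 0..lam1-1 of G1 and nodes 0..lam2-1 of
    G2 are the ones annotated with g.
    """
    hits = total = 0
    for image in permutations(range(n2), n1):  # image[u] is u's partner in G2
        shared = sum(1 for u in range(lam1) if image[u] < lam2)
        if shared >= k:
            hits += 1
        total += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    print(shared_term_pvalue(3, 4, 2, 2, 2))  # 1/6
    print(brute_force_pvalue(3, 4, 2, 2, 2))  # 1/6
```

For example, with n1 = 3, n2 = 4 and two annotated proteins on each side, observing k = 2 shared pairs gives p = 1/6, and the exhaustive enumeration agrees. Exact Fraction arithmetic avoids floating-point underflow for very small p-values, at the cost of speed on networks with tens of thousands of proteins; for large inputs a log-space or library hypergeometric survival function may be preferable.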

Source journal: Journal of Mathematical Biology
CiteScore: 3.30
Self-citation rate: 5.30%
Articles published: 120
Review time: 6 months
About the journal: The Journal of Mathematical Biology focuses on mathematical biology, i.e. work that uses mathematical approaches to gain biological understanding or explain biological phenomena. Areas of biology covered include, but are not restricted to, cell biology, physiology, development, neurobiology, genetics and population genetics, population biology, ecology, behavioural biology, evolution, epidemiology, immunology, molecular biology, biofluids, DNA and protein structure and function. All mathematical approaches, including computational and visualization approaches, are appropriate.