Estimating support for protein-protein interaction data with applications to function prediction.

Computational systems bioinformatics. Computational Systems Bioinformatics Conference. Pub Date: 2008-08-01. DOI: 10.1142/9781848162648_0007

Erliang Zeng, C. Ding, G. Narasimhan, S. Holbrook

Volume 7, pages 73-84.

Citations: 13

Abstract

Almost every cellular process requires interactions between pairs or larger complexes of proteins. High-throughput protein-protein interaction (PPI) data have been generated using techniques such as the yeast two-hybrid system and mass spectrometry, among others. Such data offer a new basis for predicting protein functions and for constructing protein-protein interaction networks, and many recent algorithms have been developed for these purposes. However, PPI data generated with high-throughput techniques contain a large number of false positives. In this paper, we propose a novel method to evaluate the support for PPI data based on Gene Ontology information. When the semantic similarity between genes is computed from Gene Ontology information using Resnik's formula, our results show that the PPI data can be modeled as a mixture model, under the assumption that true protein-protein interactions have higher support than the false positives in the data. Semantic similarity between genes thus serves as a metric of support for PPI data. Building on this metric, we also propose new function prediction approaches, which outperform their conventional counterparts. New evaluation methods are also proposed.
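To make the support metric concrete, the following is a minimal sketch of Resnik's similarity, which scores two ontology terms by the information content, -log p(t), of their most informative common ancestor. The toy ontology, the term names, the probabilities, and the gene-level rule (taking the maximum over all term pairs) are illustrative assumptions; the abstract does not specify these details.

```python
import math

# Hypothetical toy ontology: each term maps to its set of parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical p(t): fraction of genes annotated to term t or any descendant.
PROB = {
    "root": 1.0,
    "binding": 0.5,
    "catalysis": 0.5,
    "dna_binding": 0.25,
    "rna_binding": 0.125,
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(t1, t2):
    """Resnik similarity: highest information content among common ancestors."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(PROB[t]) for t in common)


def gene_similarity(terms1, terms2):
    """Assumed gene-level rule: best Resnik score over all pairs of terms."""
    return max(resnik(a, b) for a in terms1 for b in terms2)
```

Under these assumptions, an interacting pair whose annotations share only the root term receives a support of zero, while a pair sharing a specific term receives a higher score. Scores of this kind, collected over all reported interactions, are what a mixture model could then separate into a low-support and a high-support component.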
