A max-flow based approach to the identification of protein complexes using protein interaction and microarray data.

Computational systems bioinformatics. Computational Systems Bioinformatics Conference Pub Date : 2008-01-01

Jianxing Feng, Rui Jiang, Tao Jiang

{"title":"A max-flow based approach to the identification of protein complexes using protein interaction and microarray data.","authors":"Jianxing Feng, Rui Jiang, Tao Jiang","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>The emergence of high-throughput technologies leads to abundant protein-protein interaction (PPI) data and microarray gene expression profiles, and provides a great opportunity for the identification of novel protein complexes using computational methods. Although it has been demonstrated in the literature that methods using protein-protein interaction data alone can successfully predict a large number of protein complexes, the incorporation of gene expression profiles could help refine the putative complexes and hence improve the accuracy of the computational methods. By combining protein-protein interaction data and microarray gene expression profiles, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraphs, GFA first finds large (weighted) dense subgraphs in a protein-protein interaction network and then breaks each such subgraph into fragments iteratively by weighting its nodes appropriately in terms of their corresponding log fold changes in the microarray data, until the fragment subgraphs are sufficiently small. Our extensive tests on three widely used protein-protein interaction datasets and comparisons with the latest methods for protein complex identification demonstrate the superior performance of our method in terms of accuracy, efficiency, and capability in predicting novel protein complexes. Given the high specificity (or precision) that our method has achieved, we conjecture that our prediction results imply more than 200 novel protein complexes.</p>","PeriodicalId":72665,"journal":{"name":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","volume":"7 ","pages":"51-62"},"PeriodicalIF":0.0000,"publicationDate":"2008-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational systems bioinformatics. Computational Systems Bioinformatics Conference","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

The emergence of high-throughput technologies leads to abundant protein-protein interaction (PPI) data and microarray gene expression profiles, and provides a great opportunity for the identification of novel protein complexes using computational methods. Although it has been demonstrated in the literature that methods using protein-protein interaction data alone can successfully predict a large number of protein complexes, the incorporation of gene expression profiles could help refine the putative complexes and hence improve the accuracy of the computational methods. By combining protein-protein interaction data and microarray gene expression profiles, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraphs, GFA first finds large (weighted) dense subgraphs in a protein-protein interaction network and then breaks each such subgraph into fragments iteratively by weighting its nodes appropriately in terms of their corresponding log fold changes in the microarray data, until the fragment subgraphs are sufficiently small. Our extensive tests on three widely used protein-protein interaction datasets and comparisons with the latest methods for protein complex identification demonstrate the superior performance of our method in terms of accuracy, efficiency, and capability in predicting novel protein complexes. Given the high specificity (or precision) that our method has achieved, we conjecture that our prediction results imply more than 200 novel protein complexes.

本刊更多论文

利用蛋白质相互作用和微阵列数据，基于最大流量的方法来鉴定蛋白质复合物。

高通量技术的出现带来了丰富的蛋白质-蛋白质相互作用（PPI）数据和微阵列基因表达谱，并为使用计算方法鉴定新的蛋白质复合物提供了很大的机会。虽然文献已经证明，仅使用蛋白质-蛋白质相互作用数据的方法可以成功地预测大量蛋白质复合物，但基因表达谱的结合可以帮助改进假定的复合物，从而提高计算方法的准确性。通过结合蛋白质-蛋白质相互作用数据和微阵列基因表达谱，我们提出了一种新的用于蛋白质复合物识别的图碎片算法（GFA）。GFA改编自寻找（加权）最密集子图的经典最大流算法，首先在蛋白质-蛋白质相互作用网络中找到大的（加权）密集子图，然后根据微阵列数据中相应的对数折叠变化对其节点进行适当加权，迭代地将每个这样的子图分解为片段，直到片段子图足够小。我们对三种广泛使用的蛋白质-蛋白质相互作用数据集进行了广泛的测试，并与最新的蛋白质复合物鉴定方法进行了比较，证明了我们的方法在预测新型蛋白质复合物的准确性、效率和能力方面具有优越的性能。鉴于我们的方法已经达到的高特异性（或精度），我们推测我们的预测结果意味着超过200种新的蛋白质复合物。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Computational systems bioinformatics. Computational Systems Bioinformatics Conference

自引率

0.00%

发文量