Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing

Infinity. Pub Date: 2010-09-30. DOI: 10.1109/PDMC-HIBI.2010.23

A. Bustamam, K. Burrage, N. Hamilton

Infinity, vol. 58 1, pp. 116-125. Publication type: Journal Article.

Citations: 9

Abstract

Markov clustering is becoming a key algorithm within bioinformatics for determining clusters in networks. For instance, clustering protein interaction networks is helping find genes implicated in diseases such as cancer. However, with fast sequencing and other technologies generating vast amounts of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, Graphics Processing Unit (GPU) computing, which uses a massively parallel computing environment in the GPU card, is becoming a very powerful, efficient and low-cost option to achieve substantial performance gains over CPU approaches. This paper introduces a very fast Markov clustering algorithm (MCL) based on massively parallel computing on the GPU. We use the Compute Unified Device Architecture (CUDA) to allow the GPU to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of the clustering algorithm. The key to optimizing our CUDA Markov Clustering (CUDAMCL) was utilizing the ELLPACK-R sparse data format to allow effective, fine-grained, massively parallel processing that copes with the sparse nature of interaction network datasets in bioinformatics applications. CUDA also allows us to use on-chip memory on the GPU efficiently, lowering latency and thus circumventing a major issue in other parallel computing environments, such as Message Passing Interface (MPI). Here we describe the GPU algorithm and its application to several real-world problems as well as to artificial datasets. We find that the principal factor causing variation in performance of the GPU approach is the relative sparseness of networks. Comparing GPU computation times against a modern quad-core CPU on the published (relatively sparse) standard BIOGRID protein interaction networks with 5156 and 23175 nodes, speedup factors of 4 and 9 were obtained, respectively. On the Human Protein Reference Database, the speed of clustering of 19599 proteins was improved by a factor of 7 by the GPU algorithm. However, on artificially generated densely connected networks with 1600 to 4800 nodes, speedups by a factor in the range of 40 to 120 were readily obtained. As the results show, in all cases the GPU implementation is significantly faster than the original MCL running on CPU. Such approaches allow large-scale parallel computation on off-the-shelf desktop machines that was previously only possible on supercomputing architectures, and have the potential to significantly change the way bioinformaticians and biologists compute and interact with their data.
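
The two operations the abstract places at the heart of the algorithm, matrix-matrix products and Markov matrix normalization, can be illustrated with a minimal sketch of the MCL iteration. This is not the CUDAMCL implementation: it is a dense, CPU-only Python version that uses nested lists instead of the ELLPACK-R sparse format and runs no GPU code. The function names, the added self-loops, the inflation value of 2.0, and the convergence and membership thresholds are all assumptions chosen for illustration.

```python
# Dense, CPU-only sketch of the Markov clustering (MCL) iteration.
# All names and default values below are illustrative assumptions,
# not taken from the paper.


def normalize_columns(m):
    """Scale each column to sum to 1, giving a column-stochastic matrix."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (the matrix-matrix product)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise every entry to the power r, then renormalize."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    n = len(adjacency)
    # Add self-loops (a common MCL preprocessing step, assumed here).
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each row that still carries weight marks a cluster: its nonzero
    # columns are the members. Identical rows collapse into one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))  # [[0, 1, 2], [3, 4, 5]]
```

The dense product in `expand` costs on the order of n^3 operations per iteration, which suggests why a sparse storage format and parallel execution matter for networks with tens of thousands of nodes.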


