Parallel Pairwise Correlation Computation on Intel Xeon Phi Clusters

2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). Pub Date: 2016-05-05. DOI: 10.1109/SBAC-PAD.2016.26

Yongchao Liu, Tony Pan, S. Aluru


Citations: 14

Abstract

Co-expression network analysis is a key technique for identifying inter-gene interactions. It usually relies on computing all-pairs correlation (or a similar measure) between gene expression profiles across multiple samples. Pearson's correlation coefficient (PCC) is one widely used measure for constructing gene co-expression networks. However, all-pairs PCC computation is computationally demanding for large numbers of gene expression profiles, which motivates accelerating it with high-performance computing. In this paper, we present LightPCC, the first parallel and distributed all-pairs PCC computation on Intel Xeon Phi (Phi) clusters. It achieves high speed by exploiting SIMD-instruction-level and thread-level parallelism within each Phi, as well as accelerator-level parallelism among multiple Phis. To facilitate balanced workload distribution, we propose a general framework for symmetric all-pairs computation that, for the first time, builds bijective functions between job identifiers and the coordinate space. We evaluated LightPCC on a set of gene expression datasets and compared it with two CPU-based counterparts, all using double precision: a sequential C++ implementation in ALGLIB and an implementation based on a parallel general matrix-matrix multiplication routine in the Intel Math Kernel Library (MKL). With one 5110P Phi and with 16 Phis, LightPCC runs up to 20.6× and 218.2× faster than ALGLIB, respectively, and up to 6.8× and 71.4× faster than single-threaded MKL, respectively. LightPCC also demonstrated good parallel scalability with respect to the number of Phis. The source code of LightPCC is publicly available at http://lightpcc.sourceforge.net.
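The abstract does not spell out the bijective functions it mentions, so the following is only a minimal sketch of one common way to build such a mapping, not LightPCC's actual scheme. It assumes the symmetric all-pairs jobs are the upper triangle of an n × n matrix (diagonal included) numbered in row-major order. All function names (`pair_to_job`, `job_to_pair`, `job_range`, `all_pairs_pcc`) are hypothetical, and the plain-Python `pearson` stands in for the vectorized kernel a real implementation would use.

```python
import math


def num_jobs(n):
    """Number of (i, j) pairs with 0 <= i <= j < n (assumption: diagonal included)."""
    return n * (n + 1) // 2


def row_offset(i, n):
    """Job identifier of the first pair in row i, i.e. of (i, i)."""
    return i * n - i * (i - 1) // 2


def pair_to_job(i, j, n):
    """Map a coordinate (i, j) with i <= j to its job identifier."""
    return row_offset(i, n) + (j - i)


def job_to_pair(k, n):
    """Inverse mapping: recover (i, j) from job identifier k in O(1)."""
    r = num_jobs(n) - 1 - k               # position counted from the last job
    p = (math.isqrt(8 * r + 1) - 1) // 2  # largest p with p * (p + 1) / 2 <= r
    i = n - 1 - p
    j = k - row_offset(i, n) + i
    return i, j


def job_range(worker, num_workers, n):
    """Contiguous half-open range of job identifiers assigned to one worker."""
    total = num_jobs(n)
    return worker * total // num_workers, (worker + 1) * total // num_workers


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def all_pairs_pcc(profiles, num_workers):
    """Simulate the workers one after another; each handles only its own job range."""
    n = len(profiles)
    result = {}
    for worker in range(num_workers):
        lo, hi = job_range(worker, num_workers, n)
        for k in range(lo, hi):
            i, j = job_to_pair(k, n)
            result[(i, j)] = pearson(profiles[i], profiles[j])
    return result
```

Because every worker derives its pairs directly from a contiguous range of job identifiers, the job counts of any two workers differ by at most one, and no worker needs to enumerate the pairs that belong to the others.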



