Parallel Pairwise Correlation Computation on Intel Xeon Phi Clusters
Authors: Yongchao Liu, Tony Pan, S. Aluru
Published in: 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
DOI: 10.1109/SBAC-PAD.2016.26 (https://doi.org/10.1109/SBAC-PAD.2016.26)
Publication date: 2016-05-05
Citations: 14
Abstract
Co-expression network analysis is a critical technique for identifying inter-gene interactions, and usually relies on computing all-pairs correlation (or a similar measure) between gene expression profiles across multiple samples. Pearson's correlation coefficient (PCC) is a widely used measure for gene co-expression network construction. However, all-pairs PCC computation is computationally demanding for large numbers of gene expression profiles, which motivates accelerating it with high-performance computing. In this paper, we present LightPCC, the first parallel and distributed implementation of all-pairs PCC computation on Intel Xeon Phi (Phi) clusters. It achieves high speed by exploiting SIMD-instruction-level and thread-level parallelism within each Phi, as well as accelerator-level parallelism among multiple Phis. To facilitate balanced workload distribution, we propose, for the first time, a general framework for symmetric all-pairs computation that builds bijective functions between job identifiers and coordinate space. We evaluated LightPCC on a set of gene expression datasets and compared it to two CPU-based counterparts, all using double precision: a sequential C++ implementation in ALGLIB and an implementation based on a parallel general matrix-matrix multiplication routine in the Intel Math Kernel Library (MKL). The evaluation showed that, with one 5110P Phi and with 16 Phis, LightPCC runs up to 20.6× and 218.2× faster than ALGLIB, respectively, and up to 6.8× and 71.4× faster than single-threaded MKL. In addition, LightPCC demonstrated good parallel scalability with respect to the number of Phis. Source code of LightPCC is publicly available at http://lightpcc.sourceforge.net.
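To illustrate what a bijection between job identifiers and coordinate space can look like, here is a minimal Python sketch. It is not the formula used in LightPCC, which the abstract does not give. It assumes a row-major numbering of the lower triangle with the diagonal included, and the function names `pair_to_job`, `job_to_pair`, and `job_range` are hypothetical.

```python
import math


def pair_to_job(i, j):
    """Map a pair (i, j) with 0 <= j <= i to a job identifier.

    Assumed layout (not taken from the paper): row i of the lower
    triangle holds i + 1 jobs, so it starts at offset i * (i + 1) / 2.
    """
    return i * (i + 1) // 2 + j


def job_to_pair(k):
    """Inverse mapping: recover (i, j) from job identifier k >= 0.

    i is the largest integer with i * (i + 1) / 2 <= k. Using the
    integer square root avoids floating-point rounding errors.
    """
    i = (math.isqrt(8 * k + 1) - 1) // 2
    j = k - i * (i + 1) // 2
    return i, j


def job_range(n, worker, num_workers):
    """Half-open range [start, stop) of job identifiers for one worker.

    The n * (n + 1) / 2 jobs are split into contiguous chunks whose
    sizes differ by at most one, independent of the triangle's shape.
    """
    total = n * (n + 1) // 2
    start = worker * total // num_workers
    stop = (worker + 1) * total // num_workers
    return start, stop


if __name__ == "__main__":
    n, num_workers = 5, 4
    for w in range(num_workers):
        start, stop = job_range(n, w, num_workers)
        print(w, [job_to_pair(k) for k in range(start, stop)])
```

Because every worker receives a contiguous range of job identifiers and converts each one to coordinates on its own, the load stays balanced even though the rows of the triangle have different lengths.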