突破极限:海量数据多分辨率近似的混合并行实现

arXiv: Computation Pub Date : 2019-04-30 DOI:10.5065/nnt6-q689

Huang Huang, Lewis R. Blake, D. Hammerling

{"title":"突破极限:海量数据多分辨率近似的混合并行实现","authors":"Huang Huang, Lewis R. Blake, D. Hammerling","doi":"10.5065/nnt6-q689","DOIUrl":null,"url":null,"abstract":"The multi-resolution approximation (MRA) of Gaussian processes was recently proposed to conduct likelihood-based inference for massive spatial data sets. An advantage of the methodology is that it can be parallelized. We implemented the MRA in C++ for both serial and parallel versions. In the parallel implementation, we use a hybrid parallelism that employs both distributed and shared memory computing for communications between and within nodes by using the Message Passing Interface (MPI) and OpenMP, respectively. The performance of the serial code is compared between the C++ and MATLAB implementations over a small data set on a personal laptop. The C++ parallel program is further carefully studied under different configurations by applications to data sets from around a tenth of a million to 47 million observations. We show the practicality of this implementation by demonstrating that we can get quick inference for massive real-world data sets. The serial and parallel C++ code can be found at this https URL.","PeriodicalId":8446,"journal":{"name":"arXiv: Computation","volume":"68 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Pushing the Limit: A Hybrid Parallel Implementation of the Multi-resolution Approximation for Massive Data\",\"authors\":\"Huang Huang, Lewis R. Blake, D. Hammerling\",\"doi\":\"10.5065/nnt6-q689\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The multi-resolution approximation (MRA) of Gaussian processes was recently proposed to conduct likelihood-based inference for massive spatial data sets. An advantage of the methodology is that it can be parallelized. We implemented the MRA in C++ for both serial and parallel versions. In the parallel implementation, we use a hybrid parallelism that employs both distributed and shared memory computing for communications between and within nodes by using the Message Passing Interface (MPI) and OpenMP, respectively. The performance of the serial code is compared between the C++ and MATLAB implementations over a small data set on a personal laptop. The C++ parallel program is further carefully studied under different configurations by applications to data sets from around a tenth of a million to 47 million observations. We show the practicality of this implementation by demonstrating that we can get quick inference for massive real-world data sets. The serial and parallel C++ code can be found at this https URL.\",\"PeriodicalId\":8446,\"journal\":{\"name\":\"arXiv: Computation\",\"volume\":\"68 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv: Computation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5065/nnt6-q689\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv: Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5065/nnt6-q689","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

近年来提出了高斯过程的多分辨率近似(MRA)，用于对海量空间数据集进行基于似然的推理。这种方法的一个优点是它可以并行化。我们在c++中实现了串行和并行版本的MRA。在并行实现中，我们使用混合并行，分别使用消息传递接口(Message Passing Interface, MPI)和OpenMP，在节点之间和节点内部使用分布式和共享内存计算进行通信。在个人笔记本电脑上的一个小数据集上，比较了c++和MATLAB实现串行代码的性能。c++并行程序在不同的配置下被进一步仔细研究，应用程序的数据集从一百万的十分之一到四千七百万的观测值。我们通过演示我们可以对大量真实数据集进行快速推理来展示这种实现的实用性。串行和并行c++代码可以在这个https URL中找到。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Pushing the Limit: A Hybrid Parallel Implementation of the Multi-resolution Approximation for Massive Data

The multi-resolution approximation (MRA) of Gaussian processes was recently proposed to conduct likelihood-based inference for massive spatial data sets. An advantage of the methodology is that it can be parallelized. We implemented the MRA in C++ for both serial and parallel versions. In the parallel implementation, we use a hybrid parallelism that employs both distributed and shared memory computing for communications between and within nodes by using the Message Passing Interface (MPI) and OpenMP, respectively. The performance of the serial code is compared between the C++ and MATLAB implementations over a small data set on a personal laptop. The C++ parallel program is further carefully studied under different configurations by applications to data sets from around a tenth of a million to 47 million observations. We show the practicality of this implementation by demonstrating that we can get quick inference for massive real-world data sets. The serial and parallel C++ code can be found at this https URL.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

arXiv: Computation

自引率

0.00%

发文量