Multi- and many-core data mining with adaptive sparse grids

A. Heinecke, D. Pflüger
{"title":"Multi- and many-core data mining with adaptive sparse grids","authors":"A. Heinecke, D. Pflüger","doi":"10.1145/2016604.2016640","DOIUrl":null,"url":null,"abstract":"Gaining knowledge out of vast datasets is a main challenge in data-driven applications nowadays. Sparse grids provide a numerical method for both classification and regression in data mining which scales only linearly in the number of data points and is thus well-suited for huge amounts of data. Due to the recursive nature of sparse grid algorithms, they impose a challenge for the parallelization on modern hardware architectures such as accelerators. In this paper, we present the parallelization on several current task- and data-parallel platforms, covering multi-core CPUs with vector units, GPUs, and hybrid systems. Furthermore, we analyze the suitability of parallel programming languages for the implementation.\n Considering hardware, we restrict ourselves to the x86 platform with SSE and AVX vector extensions and to NVIDIA's Fermi architecture for GPUs. We consider both multi-core CPU and GPU architectures independently, as well as hybrid systems with up to 12 cores and 2 Fermi GPUs. With respect to parallel programming, we examine both the open standard OpenCL and Intel Array Building Blocks, a recently introduced high-level programming approach. As the baseline, we use the best results obtained with classically parallelized sparse grid algorithms and their OpenMP-parallelized intrinsics counterpart (SSE and AVX instructions), reporting both single and double precision measurements. The huge data sets we use are a real-life dataset stemming from astrophysics and an artificial one which exhibits challenging properties. In all settings, we achieve excellent results, obtaining speedups of more than 60 using single precision on a hybrid system.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM International Conference on Computing Frontiers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2016604.2016640","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 31

Abstract

Gaining knowledge from vast datasets is a central challenge in data-driven applications today. Sparse grids provide a numerical method for both classification and regression in data mining that scales only linearly in the number of data points and is thus well suited to huge amounts of data. Due to the recursive nature of sparse grid algorithms, they pose a challenge for parallelization on modern hardware architectures such as accelerators. In this paper, we present the parallelization on several current task- and data-parallel platforms, covering multi-core CPUs with vector units, GPUs, and hybrid systems. Furthermore, we analyze the suitability of parallel programming languages for the implementation. Regarding hardware, we restrict ourselves to the x86 platform with SSE and AVX vector extensions and to NVIDIA's Fermi architecture for GPUs. We consider multi-core CPU and GPU architectures both independently and in hybrid systems with up to 12 cores and 2 Fermi GPUs. With respect to parallel programming, we examine both the open standard OpenCL and Intel Array Building Blocks, a recently introduced high-level programming approach. As the baseline, we use the best results obtained with classically parallelized sparse grid algorithms and their OpenMP-parallelized intrinsics counterparts (SSE and AVX instructions), reporting both single- and double-precision measurements. The huge datasets we use are a real-life dataset stemming from astrophysics and an artificial one that exhibits challenging properties. In all settings we achieve excellent results, obtaining speedups of more than 60x in single precision on a hybrid system.
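The abstract notes that the recursive structure of sparse grid algorithms hinders parallelization; a standard way to expose data parallelism for such methods is to recast evaluation as a flat loop over grid points and data points. The following is a minimal C++/OpenMP sketch of such a multi-evaluation kernel, u(x) = sum_j alpha_j * phi_j(x) with d-dimensional products of 1D hat functions. It is an illustration only, not the authors' implementation: the function name eval_sparse_grid and the flat array layout are assumptions made for this sketch.

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// Sketch (assumed layout, not the authors' code): level[j*d + k] stores
// 2^l and index[j*d + k] the index i of the 1D hat function of grid
// point j in dimension k, so phi_{l,i}(x) = max(1 - |2^l * x - i|, 0).
void eval_sparse_grid(const std::vector<double>& level,
                      const std::vector<double>& index,
                      const std::vector<double>& alpha,   // hierarchical surpluses
                      const std::vector<double>& data,    // data[i*d + k] in [0,1]
                      std::vector<double>& result,
                      std::size_t num_grid, std::size_t num_data,
                      std::size_t d) {
  // Data points are independent of one another, so the outer loop is
  // embarrassingly parallel; runtime grows linearly in num_data.
  #pragma omp parallel for
  for (std::size_t i = 0; i < num_data; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < num_grid; ++j) {
      double phi = 1.0;
      for (std::size_t k = 0; k < d; ++k) {
        // 1D hat function; 2^l is pre-stored in 'level' to avoid pow().
        double t = level[j * d + k] * data[i * d + k] - index[j * d + k];
        phi *= std::fmax(1.0 - std::fabs(t), 0.0);
      }
      sum += alpha[j] * phi;
    }
    result[i] = sum;
  }
}
```

Because each data point is handled independently, the same loop structure maps naturally onto OpenCL work-items or an SSE/AVX-vectorized inner loop, which is the kind of flat, vectorizable formulation the platforms compared in the paper all require.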