Parallel Exact Inference on a CPU-GPGPU Heterogenous System

Hyeran Jeon, Yinglong Xia, V. Prasanna
{"title":"Parallel Exact Inference on a CPU-GPGPU Heterogenous System","authors":"Hyeran Jeon, Yinglong Xia, V. Prasanna","doi":"10.1109/ICPP.2010.15","DOIUrl":null,"url":null,"abstract":"Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model. To achieve scalability over hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques in junction trees to the GPGPU at run time. The scheduler merges multiple small cliques or splits large cliques dynamically so as to maximize the utilization of the GPGPU resources. We implement node level primitves on the GPGPU to process the cliques assigned by the CPU. We propose a conflict free potential table organization and an efficient data layout for coalescing memory accesses. In addition, we develop a double buffering based asynchronous data transfer between CPU and GPGPU to overlap clique processing on the GPGPU with data transfer and scheduling activities. Our implementation achieved 30X speedup compared with state-of-the-art multicore processors.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 39th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2010.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22

Abstract

Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model. Achieving scalability over hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques in junction trees to the GPGPU at run time. The scheduler merges multiple small cliques or splits large cliques dynamically so as to maximize the utilization of GPGPU resources. We implement node-level primitives on the GPGPU to process the cliques assigned by the CPU. We propose a conflict-free potential table organization and an efficient data layout for coalescing memory accesses. In addition, we develop a double-buffering-based asynchronous data transfer between the CPU and the GPGPU to overlap clique processing on the GPGPU with data transfer and scheduling activities. Our implementation achieved a 30X speedup compared with state-of-the-art multicore processors.
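To illustrate the double-buffering idea mentioned in the abstract, the sketch below alternates between two device buffers and two CUDA streams so that the host-to-device transfer of one clique's potential table can overlap with kernel processing of another. It is a minimal sketch under assumed parameters, not the authors' implementation: the kernel `process_clique`, the table size, and the clique count are hypothetical placeholders.

```cuda
// Minimal double-buffering sketch: overlap CPU->GPGPU transfer of one clique's
// potential table with processing of another. Hypothetical kernel and sizes;
// not the paper's actual implementation.
#include <cuda_runtime.h>
#include <cstdio>

#define N_CLIQUES 8
#define TABLE_SIZE (1 << 20)   // assumed number of potential-table entries per clique

__global__ void process_clique(float *table, int n) {
    // Placeholder node-level primitive: scale each potential entry.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) table[i] *= 0.5f;
}

int main() {
    float *h_tables[N_CLIQUES];   // pinned host copies of the potential tables
    float *d_buf[2];              // two device buffers (double buffering)
    cudaStream_t stream[2];

    for (int c = 0; c < N_CLIQUES; ++c) {
        cudaMallocHost(&h_tables[c], TABLE_SIZE * sizeof(float));
        for (int i = 0; i < TABLE_SIZE; ++i) h_tables[c][i] = 1.0f;
    }
    for (int b = 0; b < 2; ++b) {
        cudaMalloc(&d_buf[b], TABLE_SIZE * sizeof(float));
        cudaStreamCreate(&stream[b]);
    }

    dim3 block(256), grid((TABLE_SIZE + 255) / 256);
    for (int c = 0; c < N_CLIQUES; ++c) {
        int b = c & 1;  // alternate between the two buffers/streams
        // Copy clique c in stream b while the kernel for the previous clique
        // may still be running in the other stream.
        cudaMemcpyAsync(d_buf[b], h_tables[c], TABLE_SIZE * sizeof(float),
                        cudaMemcpyHostToDevice, stream[b]);
        process_clique<<<grid, block, 0, stream[b]>>>(d_buf[b], TABLE_SIZE);
        cudaMemcpyAsync(h_tables[c], d_buf[b], TABLE_SIZE * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[b]);
    }
    cudaDeviceSynchronize();
    printf("first entry after processing: %f\n", h_tables[0][0]);

    for (int b = 0; b < 2; ++b) { cudaFree(d_buf[b]); cudaStreamDestroy(stream[b]); }
    for (int c = 0; c < N_CLIQUES; ++c) cudaFreeHost(h_tables[c]);
    return 0;
}
```

Pinned host memory and per-stream `cudaMemcpyAsync` calls are what allow the copy engine and the compute units to work concurrently; a scheduler as described in the paper would decide, per iteration, which (merged or split) clique to enqueue next.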