Parallel Exact Inference on a CPU-GPGPU Heterogenous System

Hyeran Jeon, Yinglong Xia, V. Prasanna
{"title":"Parallel Exact Inference on a CPU-GPGPU Heterogenous System","authors":"Hyeran Jeon, Yinglong Xia, V. Prasanna","doi":"10.1109/ICPP.2010.15","DOIUrl":null,"url":null,"abstract":"Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model. To achieve scalability over hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques in junction trees to the GPGPU at run time. The scheduler merges multiple small cliques or splits large cliques dynamically so as to maximize the utilization of the GPGPU resources. We implement node level primitves on the GPGPU to process the cliques assigned by the CPU. We propose a conflict free potential table organization and an efficient data layout for coalescing memory accesses. In addition, we develop a double buffering based asynchronous data transfer between CPU and GPGPU to overlap clique processing on the GPGPU with data transfer and scheduling activities. Our implementation achieved 30X speedup compared with state-of-the-art multicore processors.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 39th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2010.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22

Abstract

Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model. Achieving scalability over hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques in junction trees to the GPGPU at run time. The scheduler merges multiple small cliques or splits large cliques dynamically so as to maximize the utilization of GPGPU resources. We implement node-level primitives on the GPGPU to process the cliques assigned by the CPU. We propose a conflict-free potential table organization and an efficient data layout for coalescing memory accesses. In addition, we develop a double-buffering-based asynchronous data transfer between the CPU and the GPGPU to overlap clique processing on the GPGPU with data transfer and scheduling activities. Our implementation achieved a 30X speedup compared with state-of-the-art multicore processors.
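To illustrate the double-buffering idea mentioned in the abstract, the sketch below alternates between two device buffers and two CUDA streams so that the host-to-device transfer of one clique's potential table can overlap with kernel processing of another. It is a minimal sketch under assumed parameters, not the authors' implementation: the kernel `process_clique`, the table size, and the clique count are hypothetical placeholders.

```cuda
// Minimal double-buffering sketch: overlap CPU->GPGPU transfer of one clique's
// potential table with processing of another. Hypothetical kernel and sizes;
// not the paper's actual implementation.
#include <cuda_runtime.h>
#include <cstdio>

#define N_CLIQUES 8
#define TABLE_SIZE (1 << 20)   // assumed number of potential-table entries per clique

__global__ void process_clique(float *table, int n) {
    // Placeholder node-level primitive: scale each potential entry.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) table[i] *= 0.5f;
}

int main() {
    float *h_tables[N_CLIQUES];   // pinned host copies of the potential tables
    float *d_buf[2];              // two device buffers (double buffering)
    cudaStream_t stream[2];

    for (int c = 0; c < N_CLIQUES; ++c) {
        cudaMallocHost(&h_tables[c], TABLE_SIZE * sizeof(float));
        for (int i = 0; i < TABLE_SIZE; ++i) h_tables[c][i] = 1.0f;
    }
    for (int b = 0; b < 2; ++b) {
        cudaMalloc(&d_buf[b], TABLE_SIZE * sizeof(float));
        cudaStreamCreate(&stream[b]);
    }

    dim3 block(256), grid((TABLE_SIZE + 255) / 256);
    for (int c = 0; c < N_CLIQUES; ++c) {
        int b = c & 1;  // alternate between the two buffers/streams
        // Copy clique c in stream b while the kernel for the previous clique
        // may still be running in the other stream.
        cudaMemcpyAsync(d_buf[b], h_tables[c], TABLE_SIZE * sizeof(float),
                        cudaMemcpyHostToDevice, stream[b]);
        process_clique<<<grid, block, 0, stream[b]>>>(d_buf[b], TABLE_SIZE);
        cudaMemcpyAsync(h_tables[c], d_buf[b], TABLE_SIZE * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[b]);
    }
    cudaDeviceSynchronize();
    printf("first entry after processing: %f\n", h_tables[0][0]);

    for (int b = 0; b < 2; ++b) { cudaFree(d_buf[b]); cudaStreamDestroy(stream[b]); }
    for (int c = 0; c < N_CLIQUES; ++c) cudaFreeHost(h_tables[c]);
    return 0;
}
```

Pinned host memory and per-stream `cudaMemcpyAsync` calls are what allow the copy engine and the compute units to work concurrently; a scheduler as described in the paper would decide, per iteration, which (merged or split) clique to enqueue next.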