HPSM: A Programming Framework for Multi-CPU and Multi-GPU Systems

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW). Pub Date: 2017-10-01. DOI: 10.1109/SBAC-PADW.2017.14

J. F. Lima, D. D. Domenico


Citations: 3

Abstract

This paper presents HPSM, a high-level C++ framework to explore multi-CPU and multi-GPU systems. HPSM provides parallel loops and reductions implemented over three parallel backends: Serial, OpenMP (with the GCC and libKOMP runtimes), and StarPU. We evaluated HPSM development effort with an AXPY program, and performance with three parallel benchmarks: N-Body, Hotspot, and a CFD solver. On a CPU-GPU system, the CPU-GPU combination attained better performance than GPUs alone in all cases. Still, our findings provide evidence that NUMA affinity at the framework level may produce different results.
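To make the idea of backend-selectable parallel loops and reductions concrete, here is a minimal C++ sketch of AXPY (y = a*x + y) written once and dispatched to either a serial or an OpenMP loop. This is not HPSM's actual API, which the abstract does not show: the names `SerialBackend`, `OpenMPBackend`, `parallel_for`, `parallel_reduce_sum`, `axpy`, and `vector_sum` are all hypothetical, chosen only for illustration. A StarPU backend is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backend tags (illustrative only, not HPSM's real types).
struct SerialBackend {};
struct OpenMPBackend {};

// Serial loop: calls body(i) for every i in [0, n).
template <typename Body>
void parallel_for(SerialBackend, std::size_t n, Body body) {
    for (std::size_t i = 0; i < n; ++i) body(i);
}

// OpenMP loop: same contract; iterations may run concurrently.
// Without OpenMP enabled (e.g. no -fopenmp), the pragma is ignored
// and the loop simply runs serially.
template <typename Body>
void parallel_for(OpenMPBackend, std::size_t n, Body body) {
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) body(i);
}

// Serial sum-reduction of body(i) over [0, n).
template <typename Body>
double parallel_reduce_sum(SerialBackend, std::size_t n, Body body) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += body(i);
    return sum;
}

// OpenMP sum-reduction: each thread accumulates a private partial sum,
// and the partial sums are combined at the end of the loop.
template <typename Body>
double parallel_reduce_sum(OpenMPBackend, std::size_t n, Body body) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i) sum += body(i);
    return sum;
}

// AXPY written once against the loop abstraction: y[i] = a * x[i] + y[i].
template <typename Backend>
void axpy(Backend backend, double a, const std::vector<double>& x,
          std::vector<double>& y) {
    assert(x.size() == y.size());
    parallel_for(backend, x.size(),
                 [&](std::size_t i) { y[i] = a * x[i] + y[i]; });
}

// Sum of all elements, expressed through the reduction abstraction.
template <typename Backend>
double vector_sum(Backend backend, const std::vector<double>& v) {
    return parallel_reduce_sum(backend, v.size(),
                               [&](std::size_t i) { return v[i]; });
}
```

Under these assumptions, calling `axpy(SerialBackend{}, 2.0, x, y)` and `axpy(OpenMPBackend{}, 2.0, x, y)` runs the same user code on different backends. Selecting the backend through a tag type keeps the choice at compile time, which is one common design for this kind of framework. Whether HPSM does the same is not stated in the abstract.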
