Heterogeneous concurrent execution of Monte Carlo photon transport on CPU, GPU and MIC

Workshop on Irregular Applications: Architectures and Algorithms | Pub Date: 2014-11-16 | DOI: 10.1109/IA3.2014.11

Noah Wolfe, Tianyu Liu, C. Carothers, X. Xu


Citations: 7

Abstract

This paper presents a new level of heterogeneous concurrent execution for Monte Carlo photon transport. ARCHER, an application that computes radiation dosimetry for CT imaging with whole-body patient phantoms, has been extended to run concurrently on any combination of CPUs, GPUs, and MICs. The goal is for ARCHER to detect and simultaneously use every available CPU, GPU, and MIC processing device. Because the Monte Carlo photon transport algorithm is irregular, a new "self service" approach to organizing heterogeneous device computing has been implemented: each device repeatedly grabs a portion of the domain and computes concurrently with the others until the entire domain has been simulated. New timing benchmarks are presented for various combinations of Intel and NVIDIA devices. A 13x speedup was observed when using an Intel Xeon X5650 CPU, an Intel Xeon Phi 5110P MIC, and an NVIDIA K40 GPU concurrently, compared with the Intel Xeon X5650 alone.
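The "self service" idea in the abstract can be illustrated with a small sketch. It is not ARCHER's implementation. It assumes the "domain" is a fixed number of photon histories, models each device as a Python thread, and uses hypothetical names (`PhotonPool`, `device_worker`, `run_self_service`) and made-up batch sizes. The point is only that no device is given a fixed share up front: each one claims another batch whenever it finishes the previous one, so faster devices naturally end up doing more of the work.

```python
import threading


class PhotonPool:
    """Shared pool of photon histories (hypothetical stand-in for the domain)."""

    def __init__(self, total_photons):
        self._remaining = total_photons
        self._lock = threading.Lock()

    def grab(self, batch_size):
        """Atomically claim up to batch_size photons; returns 0 once the pool is empty."""
        with self._lock:
            n = min(batch_size, self._remaining)
            self._remaining -= n
            return n


def device_worker(name, batch_size, pool, tallies):
    """Claim batches until the pool is exhausted, then record how many were handled."""
    done = 0
    while True:
        n = pool.grab(batch_size)
        if n == 0:
            break
        # Placeholder: a real code would launch the transport kernel
        # for these n photons on the device here.
        done += n
    tallies[name] = done  # each worker writes only its own key


def run_self_service(total_photons, devices):
    """devices maps a device name to its (assumed) preferred batch size."""
    pool = PhotonPool(total_photons)
    tallies = {}
    threads = [
        threading.Thread(target=device_worker, args=(name, size, pool, tallies))
        for name, size in devices.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tallies


if __name__ == "__main__":
    # Batch sizes are illustrative assumptions, not values from the paper.
    result = run_self_service(1_000_000, {"cpu": 1_000, "mic": 5_000, "gpu": 20_000})
    print(result, "total:", sum(result.values()))
```

How the photons are divided between devices varies from run to run because it depends on thread scheduling, but the per-device tallies always sum to the total, which is the property a self-service scheme has to guarantee.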
