Parallel 3D deterministic particle transport on Intel MIC architecture

Qinglin Wang, Zuocheng Xing, Jie Liu, X. Qiang, Chunye Gong, Jiang Jiang
{"title":"Parallel 3D deterministic particle transport on Intel MIC architecture","authors":"Qinglin Wang, Zuocheng Xing, Jie Liu, X. Qiang, Chunye Gong, Jiang Jiang","doi":"10.1109/HPCSim.2014.6903685","DOIUrl":null,"url":null,"abstract":"Single-node computation speed is essential in large-scale parallel solutions of particle transport problems. The Intel Many Integrated Core (MIC) architecture supports more than 200 hardware threads as well as 512-bit double precision float-point vector operations. In this paper, we use the native model of MIC in the parallelization of the simulation of one energy group time-independent deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The implementation adopts both hardware threads and vector units in MIC to efficiently exploit multi-level parallelism in the discrete ordinates method when keeping good data locality. Our optimized implementation is verified on target MIC and can provide up to 1.99 times speedup based on the original MPI code on Intel Xeon E5-2660 CPU when flux fixup is off. Compared with the prior on NVIDIA Tesla M2050 GPU, the speedup of up to 1.23 times is obtained. In addition, the difference between the implementations on MIC and GPU is discussed as well.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"46 1","pages":"186-192"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCSim.2014.6903685","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Single-node computation speed is essential in large-scale parallel solutions of particle transport problems. The Intel Many Integrated Core (MIC) architecture supports more than 200 hardware threads as well as 512-bit double precision float-point vector operations. In this paper, we use the native model of MIC in the parallelization of the simulation of one energy group time-independent deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The implementation adopts both hardware threads and vector units in MIC to efficiently exploit multi-level parallelism in the discrete ordinates method when keeping good data locality. Our optimized implementation is verified on target MIC and can provide up to 1.99 times speedup based on the original MPI code on Intel Xeon E5-2660 CPU when flux fixup is off. Compared with the prior on NVIDIA Tesla M2050 GPU, the speedup of up to 1.23 times is obtained. In addition, the difference between the implementations on MIC and GPU is discussed as well.
基于Intel MIC架构的并行三维确定性粒子输运
在粒子输运问题的大规模并行解中,单节点计算速度至关重要。Intel多集成核心(MIC)架构支持200多个硬件线程以及512位双精度浮点向量操作。在本文中,我们使用MIC的原生模型来并行化模拟三维笛卡尔几何中一能量群时间无关的确定性离散坐标粒子输运(Sweep3D)。该实现在保证数据局部性的前提下,同时采用了硬件线程和MIC中的矢量单元,有效地利用了离散坐标法中的多级并行性。我们的优化实现在目标MIC上进行了验证,当通量修复关闭时,基于Intel Xeon E5-2660 CPU上的原始MPI代码,可以提供高达1.99倍的加速。与之前的NVIDIA Tesla M2050 GPU相比,获得了高达1.23倍的加速。此外,还讨论了在MIC和GPU上实现的不同之处。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信