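An Investigation of Parallel Programming Techniques Applied to Monte Carlo Simulations for Post-Flight Reconstruction of Spacecraft Trajectory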

Robert A. Williams, Justin S. Green
DOI: 10.2514/6.2018-3431 (https://doi.org/10.2514/6.2018-3431)
Published in: 2018 Modeling and Simulation Technologies Conference
Publication date: 2018-06-24

Abstract

Parallelizing software to execute on multi-core central processing units (CPUs) and graphics processing units (GPUs) can be challenging. For some fields outside of Computer Science, this transition comes with new issues. For example, memory limitations can require modifications to code that was not initially developed to run on GPUs. This work applies the Open Multi-Processing (OpenMP) and Open Accelerators (OpenACC) directive-based parallelization strategies to a Monte Carlo simulation approach for trajectory reconstruction, enabling it to run on multi-core CPUs and GPUs. Large matrix operations, the most common use of GPUs, are not present in this algorithm; however, the natural parallelism of independent trajectories in Monte Carlo simulations is exploited. Benchmarking data are presented comparing execution times of the software for single-thread CPUs, multi-thread CPUs with OpenMP, and multi-thread GPUs using OpenACC. These data were collected using nodes with Intel® Xeon® E5-2670 (Sandy Bridge) CPUs enhanced with NVIDIA® Tesla® K40 GPUs on the Pleiades Supercomputer cluster at the National Aeronautics and Space Administration (NASA) Ames Research Center (ARC) and a local Intel® Xeon Phi™ node at NASA Langley Research Center (LaRC).

[…] and orientation), and integrates the inertial measurement unit (IMU) data to determine the vehicle states throughout its flight. Lugo et al. [1] developed a Monte Carlo based approach for trajectory reconstruction that incorporates the vehicle’s final state information and introduces statistics. This method decreases uncertainties in the reconstruction results, which improves model validation and post-flight analysis. However, this Monte Carlo approach requires the integration of several thousand trajectories. These calculations are time consuming when executed serially, but the execution time can be decreased by utilizing concurrent computation. This paper examines the use of parallel programming techniques on an algorithm that applies inertial navigation to trajectory reconstruction in a Monte Carlo dispersion process. The two parallel programming techniques utilized are OpenMP and OpenACC, which are used on multi-core CPUs and GPUs, respectively. Two studies are conducted to determine optimal performance: one based on thread count with OpenMP and one based on registers per thread with OpenACC. Additionally, comparisons are shown between three different compilers and three different types of hardware.

[…] or V100, will be tested in future work.
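
The "natural parallelism of independent trajectories" described above can be illustrated with a minimal sketch. This is not the authors' code: the one-dimensional Euler integration, the function names (`integrate_trajectory`, `run_monte_carlo`), the parameters, and the SplitMix64 generator are all assumptions chosen only to show how an OpenMP directive might split a loop over dispersed trajectories when each iteration owns its random stream and output slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SplitMix64: a small deterministic generator (an assumption for this
   sketch). Each trajectory gets its own state, so threads share nothing. */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform sample in [-1, 1) built from the top 53 bits. */
static double uniform_pm1(uint64_t *state) {
    return (double)(splitmix64(state) >> 11) * (2.0 / 9007199254740992.0) - 1.0;
}

/* Hypothetical stand-in for one dispersed trajectory: explicit Euler
   integration of a constant "measured" acceleration plus a random bias
   drawn once per trajectory. Returns the final position. */
static double integrate_trajectory(uint64_t seed, double accel,
                                   double bias_scale, double dt, int steps) {
    uint64_t rng = seed;
    double bias = bias_scale * uniform_pm1(&rng);
    double pos = 0.0, vel = 0.0;
    for (int k = 0; k < steps; ++k) {
        pos += vel * dt;
        vel += (accel + bias) * dt;
    }
    return pos;
}

/* Integrate n independent trajectories. Iteration i reads only its own
   seed and writes only final_pos[i], so the loop carries no dependency
   and can be divided among threads. */
void run_monte_carlo(double *final_pos, int n, double accel,
                     double bias_scale, double dt, int steps) {
    assert(final_pos != NULL && n >= 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        final_pos[i] = integrate_trajectory((uint64_t)i + 1u, accel,
                                            bias_scale, dt, steps);
    }
}
```

With GCC or Clang the directive takes effect when compiling with `-fopenmp`; without that flag the pragma is ignored and the loop runs serially with the same results, because the seeds depend only on the trajectory index. An OpenACC port would typically start from `#pragma acc parallel loop` on the same loop, although the called routine and the data movement to the device would also need annotating, which this sketch does not attempt.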