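An Investigation of Parallel Programming Techniques Applied to Monte Carlo Simulations for Post-Flight Reconstruction of Spacecraft Trajectory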

2018 Modeling and Simulation Technologies Conference. Pub Date: 2018-06-24. DOI: 10.2514/6.2018-3431

Robert A. Williams, Justin S. Green



Abstract



Parallelizing software to execute on multi-core central processing units (CPUs) and graphics processing units (GPUs) can be challenging. For some fields outside of computer science, this transition comes with new issues. For example, memory limitations can require modifications to code not initially developed to run on GPUs. This work applies the Open Multi-Processing (OpenMP) and Open Accelerators (OpenACC) directive-based parallelization strategies to a Monte Carlo simulation approach for trajectory reconstruction, enabling it to run on multi-core CPUs and GPUs. Large matrix operations, the most common use of GPUs, are not present in this algorithm; however, the natural parallelism of independent trajectories in Monte Carlo simulations is exploited. Benchmarking data are presented comparing execution times of the software for single-thread CPUs, multi-thread CPUs with OpenMP, and multi-thread GPUs using OpenACC. These data were collected using nodes with Intel® Xeon® E5-2670 (Sandy Bridge) CPUs enhanced with NVIDIA® Tesla® K40 GPUs on the Pleiades Supercomputer cluster at the National Aeronautics and Space Administration (NASA) Ames Research Center (ARC) and a local Intel® Xeon Phi™ node at NASA Langley Research Center (LaRC).

[…] and orientation), and integrates the inertial measurement unit (IMU) data to determine the vehicle states throughout its flight. Lugo et al. [1] developed a Monte Carlo-based approach for trajectory reconstruction that incorporates the vehicle's final state information and introduces statistics. This method decreases uncertainties in the reconstruction results, which improves model validation and post-flight analysis. However, this Monte Carlo approach requires the integration of several thousand trajectories. These calculations are time-consuming when executed serially, but the execution time can be decreased by using concurrent computation. This paper examines the use of parallel programming techniques on an algorithm that applies inertial navigation to trajectory reconstruction in a Monte Carlo dispersion process. The two parallel programming techniques used are OpenMP and OpenACC, which target multi-core CPUs and GPUs, respectively. Two studies are conducted to determine optimal performance based on thread count with OpenMP and registers per thread for OpenACC. Additionally, comparisons are shown between three different compilers and three different types of hardware.

[…] or V100, will be tested in future work.
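The "natural parallelism of independent trajectories" can be illustrated with a minimal C sketch. This is not the authors' code. It assumes a toy 1-D state, forward-Euler integration of a recorded acceleration history, and a uniformly dispersed initial velocity. The names `integrate_trajectory` and `run_monte_carlo`, the seeding constant, and the SplitMix64 generator are all hypothetical choices made for illustration. The only OpenMP feature used is the standard `parallel for` directive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SplitMix64 step. Each trajectory owns its own state, so no random-number
   state is shared between threads (an illustrative choice, not the paper's). */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform sample in [-1, 1) built from the top 53 bits of the generator. */
static double uniform_pm1(uint64_t *state) {
    return (double)(splitmix64(state) >> 11) * (2.0 / 9007199254740992.0) - 1.0;
}

/* Integrate one dispersed 1-D trajectory with forward Euler and return the
   final position. accel[k] stands in for the IMU acceleration at step k. */
static double integrate_trajectory(size_t index, const double *accel,
                                   size_t n_steps, double dt, double v0_sigma) {
    uint64_t rng = 0x1234ABCDULL + (uint64_t)index; /* hypothetical per-run seed */
    double pos = 0.0;
    double vel = v0_sigma * uniform_pm1(&rng);      /* dispersed initial velocity */
    for (size_t k = 0; k < n_steps; ++k) {
        vel += accel[k] * dt;
        pos += vel * dt;
    }
    return pos;
}

/* Run n_traj independent trajectories; final_pos[i] receives the end
   position of trajectory i. */
void run_monte_carlo(double *final_pos, size_t n_traj, const double *accel,
                     size_t n_steps, double dt, double v0_sigma) {
    assert(final_pos != NULL && accel != NULL && n_traj > 0);
    /* Iterations read only shared inputs and write distinct final_pos[i],
       so the loop can be divided among threads without synchronization. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n_traj; ++i) {
        final_pos[i] = integrate_trajectory((size_t)i, accel, n_steps, dt, v0_sigma);
    }
}
```

Compiled with an OpenMP flag (for example `-fopenmp` with GCC), the loop is divided among CPU threads. Without the flag the pragma is ignored and the same code runs serially, which gives a single-thread baseline. Seeding each trajectory from its index keeps results identical regardless of thread count. An OpenACC variant would replace the directive with a `parallel loop` construct plus data clauses, and, as the abstract notes, may require further code changes to fit GPU memory limits.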
