Adding GPU Acceleration to an Industrial CPU-Based Simulator, Development Strategy and Results

Day 1 Tue, October 26, 2021 Pub Date : 2021-10-19 DOI:10.2118/203936-ms

H. Cao, Rustem Zaydullin, Terrence Liao, N. Gohaud, E. Obi, G. Darche


Citations: 0

Abstract


Adding GPU Acceleration to an Industrial CPU-Based Simulator, Development Strategy and Results

Running multi-million-cell simulation problems in minutes has been a dream of reservoir engineers for decades. Today, with advances in Graphics Processing Units (GPUs), we have a real chance to make this dream a reality. Here we present our experience in the step-by-step transformation of a fully developed industrial CPU-based simulator into a fully functional GPU-based simulator. We also demonstrate the significant accelerations achieved through the use of GPU technology.

To achieve the best possible performance, we chose CUDA (the native language of NVIDIA GPUs) and offload as many computations to the GPU as possible. Our CUDA implementation covers all reservoir computations, including property calculation, linearization, and the linear solver, among others. The well and Field Management components still reside on the CPU and need only minor changes to interact with the GPU-based reservoir. Importantly, the nonlinear logic is unchanged. The GPU and CPU parts overlap, fully exploiting the asynchronous nature of GPU operations. Each reservoir computation can be run in three modes: CPU_only (the existing one), GPU_only, and CPU followed by GPU. The last is used only for result checking and debugging.

In early 2019, we prototyped two reservoir linearization operations (mass accumulation and mass flux) in CUDA; both showed a runtime speed-up of several hundred times, comparing one NVIDIA P100 GPU against one IBM POWER8NVL CPU core rated at 2.8 GHz. Encouraged by this success, we moved on to linear solver development and managed to move the entire linear solver module onto the GPU. Again, a strong speed-up of ~50 times was achieved (1 GPU vs 1 CPU). The focus in 2019 was on standard Black-Oil cases. Our implementation was tested with multiple "million-cell range" models (SPE10 and other real field cases). In early 2020, we managed to put SPE10 fully on the GPU and finished the entire 2000-day time-stepping in ~35 sec with a single P100 card. After that, our effort switched to compositional AIM (Adaptive Implicit Method), with a focus on compositional flash and on the AIM implementation for reservoir linearization and the linear solver; both show promising early results.

GPU-based reservoir simulation is a future trend for HPC. The development of a reservoir simulator is complex, multidisciplinary, and time-consuming work. Our paper demonstrates a clear strategy for adding tremendous GPU acceleration to an existing CPU-based simulator. Our approach fully utilizes the strengths of the existing CPU simulator and minimizes the GPU development effort. This paper is also the first publication targeting GPU acceleration for compositional AIM models.
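
The three run modes described in the abstract (CPU_only, GPU_only, and CPU followed by GPU for result checking) can be illustrated with a small dispatch harness. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names `RunMode`, `run_reservoir_op`, `toy_cpu`, and `toy_gpu` are hypothetical, the tolerance is arbitrary, and the "GPU" path is an ordinary Python function standing in for a CUDA kernel launch.

```python
import math
from enum import Enum
from typing import Callable, List, Sequence

# Hypothetical type for one reservoir operation: per-cell inputs -> per-cell outputs.
Kernel = Callable[[Sequence[float]], List[float]]


class RunMode(Enum):
    CPU_ONLY = "CPU_only"                # existing CPU code path
    GPU_ONLY = "GPU_only"                # offloaded code path
    CPU_THEN_GPU = "CPU followed by GPU"  # checking/debugging mode


def run_reservoir_op(mode: RunMode, cpu_impl: Kernel, gpu_impl: Kernel,
                     cell_data: Sequence[float],
                     rel_tol: float = 1e-9) -> List[float]:
    """Run one operation in the requested mode (illustrative only)."""
    if mode is RunMode.CPU_ONLY:
        return cpu_impl(cell_data)
    if mode is RunMode.GPU_ONLY:
        return gpu_impl(cell_data)

    # CPU followed by GPU: treat the CPU result as the reference and
    # compare the offloaded result against it cell by cell.
    reference = cpu_impl(cell_data)
    candidate = gpu_impl(cell_data)
    if len(reference) != len(candidate):
        raise ValueError("CPU and GPU results differ in length")
    for i, (ref, got) in enumerate(zip(reference, candidate)):
        if not math.isclose(ref, got, rel_tol=rel_tol, abs_tol=1e-12):
            raise ValueError(f"cell {i}: CPU={ref!r} GPU={got!r}")
    return candidate


def toy_cpu(values: Sequence[float]) -> List[float]:
    # Stand-in for an existing CPU routine (assumption: doubles each value).
    return [2.0 * v for v in values]


def toy_gpu(values: Sequence[float]) -> List[float]:
    # Stand-in for a CUDA kernel computing the same quantity another way.
    return [v + v for v in values]


if __name__ == "__main__":
    checked = run_reservoir_op(RunMode.CPU_THEN_GPU, toy_cpu, toy_gpu, [1.0, 2.5])
    print(checked)
```

Keeping the CPU path as the reference in the combined mode is one way to validate each offloaded operation in isolation, which fits the step-by-step transformation the abstract describes; a real harness would also need to copy device results back to the host before comparing.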
