A Massively Parallel Restriction-Smoothed Basis Multiscale Solver on Multi-Core and GPU Architectures
Author: A. Manea
DOI: https://doi.org/10.2118/203939-ms
Published in: Day 1 Tue, October 26, 2021
Publication date: 2021-10-19
Abstract
Due to its simplicity, adaptability, and applicability to various grid formats, the restriction-smoothed basis multiscale method (MsRSB) (Møyner and Lie 2016) has received wide attention and has been extended to various flow problems in porous media. Unlike standard multiscale methods, MsRSB relies on iterative smoothing to find the multiscale basis functions adaptively, which lets it adjust naturally to the complex grid orientations often encountered in real-life industrial applications. In this work, we investigate the scalability of MsRSB on state-of-the-art parallel architectures, including multi-core systems and GPUs. Although MsRSB is, like most other multiscale methods, directly amenable to parallelization, its dependence on a smoother to find the basis functions creates unique control- and data-flow patterns. These patterns require careful design and implementation in parallel environments to achieve good scalability. We extend the work on parallel multiscale methods in Manea et al. (2016) and Manea and Almani (2019) to map the MsRSB-specific kernels to shared-memory multi-core and GPU architectures. The scalability of our optimized parallel MsRSB implementation is demonstrated using highly heterogeneous 3D problems derived from the SPE10 benchmark (Christie and Blunt 2001), ranging in size from millions to tens of millions of cells. The multi-core implementation is benchmarked on a shared-memory multi-core architecture consisting of two packages of Intel's Cascade Lake Xeon® Gold 6246 CPU, while the GPU implementation is benchmarked on a massively parallel architecture consisting of Nvidia Volta V100 GPUs. We compare the multi-core implementation to the GPU implementation for both the setup and solution stages. To the best of our knowledge, this is the first parallel implementation and demonstration of the versatile MsRSB method on the GPU architecture.
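The abstract does not spell out the smoothing iteration, so the following is only a minimal serial sketch of the general idea of building basis functions by restricted smoothing, not the paper's parallel kernels. It assumes a 1D grid, a damped Jacobi smoother, support regions defined by a fixed cell overlap, and a simple row renormalization to keep the partition of unity; the published MsRSB update treats support boundaries differently. The function name `msrsb_basis_1d` and all of its parameters are hypothetical.

```python
def msrsb_basis_1d(perm, block_size, overlap, n_iter=100, omega=2.0 / 3.0):
    """Simplified restricted-smoothing sketch on a 1D grid (hypothetical helper).

    perm       : positive cell permeabilities (at least two cells)
    block_size : number of fine cells per coarse block
    overlap    : cells by which each support region extends past its block
    Returns P as a list of rows; P[i][j] is basis function j at fine cell i.
    """
    n = len(perm)
    n_blocks = (n + block_size - 1) // block_size

    # Two-point transmissibilities between neighbours (harmonic average).
    trans = [2.0 * perm[i] * perm[i + 1] / (perm[i] + perm[i + 1])
             for i in range(n - 1)]

    # Support region of block j as a half-open cell range [lo, hi).
    support = [(max(0, j * block_size - overlap),
                min(n, (j + 1) * block_size + overlap))
               for j in range(n_blocks)]

    # Initial guess: indicator function of each coarse block.
    P = [[1.0 if i // block_size == j else 0.0 for j in range(n_blocks)]
         for i in range(n)]

    for _ in range(n_iter):
        new_P = []
        for i in range(n):
            t_left = trans[i - 1] if i > 0 else 0.0
            t_right = trans[i] if i < n - 1 else 0.0
            diag = t_left + t_right  # zero-row-sum system matrix
            row = []
            for j in range(n_blocks):
                lo, hi = support[j]
                if not lo <= i < hi:
                    row.append(0.0)  # restriction: zero outside the support
                    continue
                # Row i of A applied to basis function j.
                residual = diag * P[i][j]
                if i > 0:
                    residual -= t_left * P[i - 1][j]
                if i < n - 1:
                    residual -= t_right * P[i + 1][j]
                # Damped Jacobi update.
                row.append(P[i][j] - omega * residual / diag)
            # Renormalize so the basis functions sum to one at cell i.
            total = sum(row)
            new_P.append([v / total for v in row])
        P = new_P
    return P


if __name__ == "__main__":
    # Twelve cells with a sharp permeability contrast, three coarse blocks.
    basis = msrsb_basis_1d([1.0] * 6 + [0.01] * 6, block_size=4, overlap=2)
    for i, row in enumerate(basis):
        print(i, [round(v, 3) for v in row])
```

In this sketch each fine cell's row in a sweep depends only on the previous iterate, so the loop over cells carries no dependencies within a sweep. That is the kind of independence a multi-core or GPU mapping could exploit, though the actual kernel design is not described in the abstract.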