Parallel implementation and performance of super-resolution generative adversarial network turbulence models for large-eddy simulation

IF 2.5 | CAS Tier 3, Engineering & Technology | JCR Q3, Computer Science, Interdisciplinary Applications
Ludovico Nista, Christoph D.K. Schumann, Peicho Petkov, Valentin Pavlov, Temistocle Grenga, Jonathan F. MacArt, Antonio Attili, Stoyan Markov, Heinz Pitsch
{"title":"大涡模拟超分辨率生成对抗网络湍流模型的并行实现与性能研究","authors":"Ludovico Nista ,&nbsp;Christoph D.K. Schumann ,&nbsp;Peicho Petkov ,&nbsp;Valentin Pavlov ,&nbsp;Temistocle Grenga ,&nbsp;Jonathan F. MacArt ,&nbsp;Antonio Attili ,&nbsp;Stoyan Markov ,&nbsp;Heinz Pitsch","doi":"10.1016/j.compfluid.2024.106498","DOIUrl":null,"url":null,"abstract":"<div><div>Super-resolution (SR) generative adversarial networks (GANs) are promising for turbulence closure in large-eddy simulation (LES) due to their ability to accurately reconstruct high-resolution data from low-resolution fields. Current model training and inference strategies are not sufficiently mature for large-scale, distributed calculations due to the computational demands and often unstable training of SR-GANs, which limits the exploration of improved model structures, training strategies, and loss-function definitions. Integrating SR-GANs into LES solvers for inference-coupled simulations is also necessary to assess their <em>a posteriori</em> accuracy, stability, and cost. We investigate parallelization strategies for SR-GAN training and inference-coupled LES, focusing on computational performance and reconstruction accuracy. We examine distributed data-parallel training strategies for hybrid CPU–GPU node architectures and the associated influence of low-/high-resolution subbox size, global batch size, and discriminator accuracy. Accurate predictions require training subboxes that are sufficiently large relative to the Kolmogorov length scale. Care should be placed on the coupled effect of training batch size, learning rate, number of training subboxes, and discriminator’s learning capabilities. We introduce a data-parallel SR-GAN training and inference library for heterogeneous architectures that enables exchange between the LES solver and SR-GAN inference at runtime. We investigate the predictive accuracy and computational performance of this arrangement with particular focus on the overlap (halo) size required for accurate SR reconstruction. Similarly, <em>a posteriori</em> parallel scaling for efficient inference-coupled LES is constrained by the SR subdomain size, GPU utilization, and reconstruction accuracy. Based on these findings, we establish guidelines and best practices to optimize resource utilization and parallel acceleration of SR-GAN turbulence model training and inference-coupled LES calculations while maintaining predictive accuracy.</div></div>","PeriodicalId":287,"journal":{"name":"Computers & Fluids","volume":"288 ","pages":"Article 106498"},"PeriodicalIF":2.5000,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Parallel implementation and performance of super-resolution generative adversarial network turbulence models for large-eddy simulation\",\"authors\":\"Ludovico Nista ,&nbsp;Christoph D.K. Schumann ,&nbsp;Peicho Petkov ,&nbsp;Valentin Pavlov ,&nbsp;Temistocle Grenga ,&nbsp;Jonathan F. MacArt ,&nbsp;Antonio Attili ,&nbsp;Stoyan Markov ,&nbsp;Heinz Pitsch\",\"doi\":\"10.1016/j.compfluid.2024.106498\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Super-resolution (SR) generative adversarial networks (GANs) are promising for turbulence closure in large-eddy simulation (LES) due to their ability to accurately reconstruct high-resolution data from low-resolution fields. 
Current model training and inference strategies are not sufficiently mature for large-scale, distributed calculations due to the computational demands and often unstable training of SR-GANs, which limits the exploration of improved model structures, training strategies, and loss-function definitions. Integrating SR-GANs into LES solvers for inference-coupled simulations is also necessary to assess their <em>a posteriori</em> accuracy, stability, and cost. We investigate parallelization strategies for SR-GAN training and inference-coupled LES, focusing on computational performance and reconstruction accuracy. We examine distributed data-parallel training strategies for hybrid CPU–GPU node architectures and the associated influence of low-/high-resolution subbox size, global batch size, and discriminator accuracy. Accurate predictions require training subboxes that are sufficiently large relative to the Kolmogorov length scale. Care should be placed on the coupled effect of training batch size, learning rate, number of training subboxes, and discriminator’s learning capabilities. We introduce a data-parallel SR-GAN training and inference library for heterogeneous architectures that enables exchange between the LES solver and SR-GAN inference at runtime. We investigate the predictive accuracy and computational performance of this arrangement with particular focus on the overlap (halo) size required for accurate SR reconstruction. Similarly, <em>a posteriori</em> parallel scaling for efficient inference-coupled LES is constrained by the SR subdomain size, GPU utilization, and reconstruction accuracy. Based on these findings, we establish guidelines and best practices to optimize resource utilization and parallel acceleration of SR-GAN turbulence model training and inference-coupled LES calculations while maintaining predictive accuracy.</div></div>\",\"PeriodicalId\":287,\"journal\":{\"name\":\"Computers & Fluids\",\"volume\":\"288 \",\"pages\":\"Article 106498\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2024-12-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Fluids\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0045793024003293\",\"RegionNum\":3,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Fluids","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0045793024003293","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Super-resolution (SR) generative adversarial networks (GANs) are promising for turbulence closure in large-eddy simulation (LES) due to their ability to accurately reconstruct high-resolution data from low-resolution fields. Current model training and inference strategies are not sufficiently mature for large-scale, distributed calculations due to the computational demands and often unstable training of SR-GANs, which limits the exploration of improved model structures, training strategies, and loss-function definitions. Integrating SR-GANs into LES solvers for inference-coupled simulations is also necessary to assess their a posteriori accuracy, stability, and cost. We investigate parallelization strategies for SR-GAN training and inference-coupled LES, focusing on computational performance and reconstruction accuracy. We examine distributed data-parallel training strategies for hybrid CPU–GPU node architectures and the associated influence of low-/high-resolution subbox size, global batch size, and discriminator accuracy. Accurate predictions require training subboxes that are sufficiently large relative to the Kolmogorov length scale. Care should be placed on the coupled effect of training batch size, learning rate, number of training subboxes, and discriminator’s learning capabilities. We introduce a data-parallel SR-GAN training and inference library for heterogeneous architectures that enables exchange between the LES solver and SR-GAN inference at runtime. We investigate the predictive accuracy and computational performance of this arrangement with particular focus on the overlap (halo) size required for accurate SR reconstruction. Similarly, a posteriori parallel scaling for efficient inference-coupled LES is constrained by the SR subdomain size, GPU utilization, and reconstruction accuracy. Based on these findings, we establish guidelines and best practices to optimize resource utilization and parallel acceleration of SR-GAN turbulence model training and inference-coupled LES calculations while maintaining predictive accuracy.
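
To make the overlap (halo) requirement concrete, the sketch below shows, in plain Python, how a rank-local low-resolution LES subdomain padded with a halo of neighbouring cells can be super-resolved and then cropped back to its interior. It is an illustration only, not the authors' library: the function sr_reconstruct_subdomain, the generator callable, and the upscale/halo parameters are hypothetical names, and a simple block-repeat stands in for the trained SR-GAN generator.

    # Hypothetical sketch (not the paper's implementation): halo-aware SR inference
    # on a single LES subdomain. The LR block is assumed to be already padded with
    # `halo` cells per side (e.g. gathered from neighbouring MPI ranks); after
    # super-resolution the padding, now `halo * upscale` cells wide, is cropped away.
    import numpy as np

    def sr_reconstruct_subdomain(lr_interior_with_halo, generator, upscale, halo):
        """Run SR on a halo-padded LR subdomain and return only the interior SR field."""
        sr_padded = generator(lr_interior_with_halo)      # shape grows by `upscale` per axis
        crop = halo * upscale                              # halo width expressed in SR cells
        if crop == 0:
            return sr_padded
        return sr_padded[crop:-crop, crop:-crop, crop:-crop]

    # Toy usage: a block-repeat "generator" stands in for the trained SR-GAN generator.
    r, halo = 4, 2
    fake_generator = lambda lr: np.kron(lr, np.ones((r, r, r)))
    lr_block = np.random.rand(16 + 2 * halo, 16 + 2 * halo, 16 + 2 * halo)
    sr_block = sr_reconstruct_subdomain(lr_block, fake_generator, r, halo)
    assert sr_block.shape == (16 * r, 16 * r, 16 * r)      # interior only, halo cropped away

Because the cropped width is halo * upscale cells, the cost of a larger overlap grows with the SR upscaling factor; this is the trade-off between reconstruction accuracy near subdomain boundaries and extra communication and GPU work that the abstract refers to.
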
Source journal

Computers & Fluids
Category: Physics / Computer Science, Interdisciplinary Applications
CiteScore: 5.30
Self-citation rate: 7.10%
Articles published: 242
Review time: 10.8 months
Journal description: Computers & Fluids is multidisciplinary. The term 'fluid' is interpreted in the broadest sense. Hydro- and aerodynamics, high-speed and physical gas dynamics, turbulence and flow stability, multiphase flow, rheology, tribology, and fluid-structure interaction are all of interest, provided that computational techniques play a significant role in the associated studies or design methodology.