Multiphase LBM Distributed over Multiple GPUs

2011 IEEE International Conference on Cluster Computing | Pub Date: 2011-09-26 | DOI: 10.1109/CLUSTER.2011.9

C. Rosales

Citations: 15

Abstract

A parallel distributed CUDA implementation of a Lattice Boltzmann Method for multiphase flows with large density ratios is described in this paper. Validation runs studying the terminal velocity of a rising bubble under the effect of gravity show good agreement with the expected theoretical values. The code is benchmarked against the performance of a typical CPU implementation of the same algorithm on both AMD and Intel platforms, and a single GPU is observed to perform up to 10X faster than a quad-core CPU socket, a 40X speedup with respect to a single core. The code is shown to scale well when executed on multiple GPUs, which makes the port to CUDA valuable even when compared to parallel CPU implementations.
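The abstract does not say how the lattice is partitioned across GPUs. A common pattern for distributed LBM codes is slab decomposition with ghost (halo) layers that are refreshed after each collision step, so that streaming can read neighbouring nodes owned by another device. The Python sketch below illustrates that pattern under explicit assumptions: it uses a single-phase D2Q9 BGK model, not the paper's large-density-ratio multiphase scheme; plain Python lists stand in for GPU memory; and a list copy in the hypothetical `exchange_halos` helper stands in for the real inter-GPU transfer (for example an MPI message or a peer-to-peer copy). All function names are illustrative, not taken from the paper. The `__main__` block checks that the decomposed run reproduces a single-domain run.

```python
import math

# D2Q9 lattice: discrete velocities c_i and weights w_i
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations at one node."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide_node(fs, tau):
    """Single-relaxation-time (BGK) collision; returns the post-collision populations."""
    rho = sum(fs)
    ux = sum(f * cx for f, (cx, _) in zip(fs, C)) / rho
    uy = sum(f * cy for f, (_, cy) in zip(fs, C)) / rho
    feq = equilibrium(rho, ux, uy)
    return [f - (f - fe) / tau for f, fe in zip(fs, feq)]


def initial_field(nx, ny):
    """Small density wave in x plus a shear wave in y, started at equilibrium."""
    return [[equilibrium(1.0 + 0.01 * math.sin(2 * math.pi * x / nx),
                         0.02 * math.sin(2 * math.pi * y / ny), 0.0)
             for y in range(ny)] for x in range(nx)]


def step_single(grid, tau):
    """Reference step on one periodic domain: collide, then pull-stream."""
    nx, ny = len(grid), len(grid[0])
    post = [[collide_node(f, tau) for f in col] for col in grid]
    return [[[post[(x - cx) % nx][(y - cy) % ny][i] for i, (cx, cy) in enumerate(C)]
             for y in range(ny)] for x in range(nx)]


def empty_column(ny):
    return [[0.0] * 9 for _ in range(ny)]


def split(grid, parts):
    """Cut x into equal slabs (one per hypothetical GPU), each with a ghost column per side."""
    nx, ny = len(grid), len(grid[0])
    assert nx % parts == 0, "sketch assumes nx is divisible by the number of slabs"
    width = nx // parts
    return [[empty_column(ny)]
            + [[list(f) for f in grid[x]] for x in range(p * width, (p + 1) * width)]
            + [empty_column(ny)]
            for p in range(parts)]


def exchange_halos(slabs):
    """Stand-in for inter-GPU transfers: fill ghosts from the neighbours (periodic ring)."""
    parts = len(slabs)
    for p, slab in enumerate(slabs):
        left, right = slabs[(p - 1) % parts], slabs[(p + 1) % parts]
        slab[0] = [list(f) for f in left[-2]]   # left neighbour's last interior column
        slab[-1] = [list(f) for f in right[1]]  # right neighbour's first interior column


def step_decomposed(slabs, tau):
    """Same step as step_single, but each slab only touches its own columns plus ghosts."""
    ny = len(slabs[0][0])
    # 1) Local collision on interior columns.
    post = [[empty_column(ny)]
            + [[collide_node(f, tau) for f in slab[x]] for x in range(1, len(slab) - 1)]
            + [empty_column(ny)]
            for slab in slabs]
    # 2) Refresh ghost columns with post-collision data from neighbouring slabs.
    exchange_halos(post)
    # 3) Pull-stream interior nodes; x - cx always lands in [0, width + 1].
    return [[empty_column(ny)]
            + [[[slab[x - cx][(y - cy) % ny][i] for i, (cx, cy) in enumerate(C)]
                for y in range(ny)] for x in range(1, len(slab) - 1)]
            + [empty_column(ny)]
            for slab in post]


def gather(slabs):
    """Reassemble the global grid from the interior columns of each slab."""
    return [col for slab in slabs for col in slab[1:-1]]


if __name__ == "__main__":
    nx, ny, tau, steps = 16, 8, 0.8, 20
    reference = initial_field(nx, ny)
    slabs = split(reference, parts=4)
    for _ in range(steps):
        reference = step_single(reference, tau)
        slabs = step_decomposed(slabs, tau)
    diff = max(abs(a - b) for ca, cb in zip(reference, gather(slabs))
               for na, nb in zip(ca, cb) for a, b in zip(na, nb))
    print("max |single - decomposed| =", diff)
```

Because collision is purely local and streaming only reads nearest neighbours, one ghost column per side is enough, and the decomposed result matches the single-domain run. In a GPU version only the populations that actually cross a slab boundary (three of the nine D2Q9 directions per side) would need to be transferred; this sketch copies all nine for simplicity.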
