{"title":"优化通信在多gpu晶格玻尔兹曼模拟","authors":"E. Calore, D. Marchi, S. Schifano, R. Tripiccione","doi":"10.1109/HPCSim.2015.7237021","DOIUrl":null,"url":null,"abstract":"An increasingly large number of scientific applications run on large clusters based on GPU systems. In most cases the large scale parallelism of the applications uses MPI, widely recognized as the de-facto standard for building parallel applications, while several programming languages are used to express the parallelism available in the application and map it onto the parallel resources available on GPUs. Regular grids and stencil codes are used in a subset of these applications, often corresponding to computational “Grand Challenges”. One such class of applications are Lattice Boltzmann Methods (LB) used in computational fluid dynamics. The regular structure of LB algorithms makes them suitable for processor architectures with a large degree of parallelism like GPUs. Scalability of these applications on large clusters requires a careful design of processor-to-processor data communications, exploiting all possibilities to overlap communication and computation. This paper looks at these issues, considering as a use case a state-of-the-art two-dimensional LB model, that accurately reproduces the thermo-hydrodynamics of a 2D-fluid obeying the equation-of-state of a perfect gas. We study in details the interplay between data organization and data layout, data-communication options and overlapping of communication and computation. We derive partial models of some performance features and compare with experimental results for production-grade codes that we run on a large cluster of GPUs.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"49 17","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Optimizing communications in multi-GPU Lattice Boltzmann simulations\",\"authors\":\"E. Calore, D. Marchi, S. Schifano, R. Tripiccione\",\"doi\":\"10.1109/HPCSim.2015.7237021\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"An increasingly large number of scientific applications run on large clusters based on GPU systems. In most cases the large scale parallelism of the applications uses MPI, widely recognized as the de-facto standard for building parallel applications, while several programming languages are used to express the parallelism available in the application and map it onto the parallel resources available on GPUs. Regular grids and stencil codes are used in a subset of these applications, often corresponding to computational “Grand Challenges”. One such class of applications are Lattice Boltzmann Methods (LB) used in computational fluid dynamics. The regular structure of LB algorithms makes them suitable for processor architectures with a large degree of parallelism like GPUs. Scalability of these applications on large clusters requires a careful design of processor-to-processor data communications, exploiting all possibilities to overlap communication and computation. This paper looks at these issues, considering as a use case a state-of-the-art two-dimensional LB model, that accurately reproduces the thermo-hydrodynamics of a 2D-fluid obeying the equation-of-state of a perfect gas. 
We study in details the interplay between data organization and data layout, data-communication options and overlapping of communication and computation. We derive partial models of some performance features and compare with experimental results for production-grade codes that we run on a large cluster of GPUs.\",\"PeriodicalId\":134009,\"journal\":{\"name\":\"2015 International Conference on High Performance Computing & Simulation (HPCS)\",\"volume\":\"49 17\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-07-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Conference on High Performance Computing & Simulation (HPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCSim.2015.7237021\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCSim.2015.7237021","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Optimizing communications in multi-GPU Lattice Boltzmann simulations
A growing number of scientific applications run on large clusters of GPU-based systems. In most cases, the large-scale parallelism of these applications is handled with MPI, widely recognized as the de facto standard for building parallel applications, while several programming languages are used to express the parallelism available within the application and map it onto the parallel resources of GPUs. Regular grids and stencil codes are used in a subset of these applications, often corresponding to computational "Grand Challenges". One such class of applications is Lattice Boltzmann (LB) methods, used in computational fluid dynamics. The regular structure of LB algorithms makes them suitable for processor architectures with a high degree of parallelism, such as GPUs. Scalability of these applications on large clusters requires a careful design of processor-to-processor data communications, exploiting every opportunity to overlap communication with computation. This paper looks at these issues, considering as a use case a state-of-the-art two-dimensional LB model that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equation of state of a perfect gas. We study in detail the interplay between data organization and data layout, data-communication options, and the overlapping of communication and computation. We derive partial models of some performance features and compare them with experimental results from production-grade codes run on a large cluster of GPUs.
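To make the central idea concrete, the sketch below shows one common way to overlap halo-exchange communication with bulk computation in a multi-GPU stencil code of this kind: update the border columns first on a dedicated CUDA stream, launch the bulk update concurrently on another stream, and run the MPI exchange on the host while the bulk kernel is still executing. This is a minimal illustration, not the authors' actual implementation: the 1D domain decomposition along x, the placeholder `update` kernel (which stands in for the real LB propagate/collide steps), and the assumption of a CUDA-aware MPI that accepts device pointers are all choices made here for brevity.

```cuda
// Sketch: overlapping halo exchange with bulk computation on multiple GPUs.
// Assumes CUDA-aware MPI (device pointers passed to MPI calls) and a 1D
// periodic domain decomposition along x. The kernel body is a placeholder.
#include <mpi.h>
#include <cuda_runtime.h>

#define NX 1024   // local lattice columns (excluding halos)
#define NY 2048   // lattice sites per column

// Placeholder update: a real LB code would propagate and collide here.
__global__ void update(double *f, int x_first, int x_last) {
    int x = x_first + blockIdx.x;
    int y = threadIdx.x + blockIdx.y * blockDim.x;
    if (x <= x_last && y < NY)
        f[x * NY + y] += 1.0;   // stand-in for the LB kernel
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left  = (rank - 1 + size) % size;   // periodic neighbours
    int right = (rank + 1) % size;

    double *f;                              // columns 0 and NX+1 are halos
    cudaMalloc(&f, (NX + 2) * NY * sizeof(double));
    cudaMemset(f, 0, (NX + 2) * NY * sizeof(double));

    cudaStream_t bulk_s, halo_s;            // two streams enable the overlap
    cudaStreamCreate(&bulk_s);
    cudaStreamCreate(&halo_s);

    dim3 threads(256);
    dim3 bulk_grid(NX - 2, (NY + 255) / 256);
    dim3 edge_grid(1, (NY + 255) / 256);

    for (int step = 0; step < 100; step++) {
        // 1. Update the two border columns first, on their own stream.
        update<<<edge_grid, threads, 0, halo_s>>>(f, 1, 1);
        update<<<edge_grid, threads, 0, halo_s>>>(f, NX, NX);
        // 2. Launch the bulk update concurrently on the other stream.
        update<<<bulk_grid, threads, 0, bulk_s>>>(f, 2, NX - 1);
        // 3. Once the borders are ready, exchange them with the neighbours
        //    while the bulk kernel is still running on the GPU.
        cudaStreamSynchronize(halo_s);
        MPI_Request req[4];
        MPI_Irecv(f,                 NY, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(f + (NX + 1) * NY, NY, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(f + NX * NY,       NY, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(f + NY,            NY, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[3]);
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        cudaStreamSynchronize(bulk_s);      // join before the next step
    }

    cudaFree(f);
    cudaStreamDestroy(bulk_s);
    cudaStreamDestroy(halo_s);
    MPI_Finalize();
    return 0;
}
```

The overlap pays off only when the bulk update takes at least as long as the halo exchange; for small local lattices the communication cost dominates and scalability degrades, which is precisely the trade-off the paper models and measures.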