Sara Ayubian, Shadi G. Alawneh, M. Richard, Jan Thijssen
2017 International Conference on High Performance Computing & Simulation (HPCS), published 2017-07-01. DOI: 10.1109/HPCS.2017.27 (https://doi.org/10.1109/HPCS.2017.27)
Implementation and Performance of a GPU-Based Monte-Carlo Framework for Determining Design Ice Load
Modern Graphics Processing Units (GPUs), with their massive number of threads and many-core architecture, support both graphics and general-purpose computing. NVIDIA's Compute Unified Device Architecture (CUDA) exposes this parallelism and makes the computing power of GPUs available to general applications. The present study demonstrates a high performance computing (HPC) framework for a Monte-Carlo simulation that determines design sea ice loads, implemented on both GPU and CPU. Results show a speedup of up to 130 times for four Tesla K80 GPUs over an optimized CPU OpenMP implementation, and a speedup of up to 8 times for four Tesla K80 GPUs over a single Tesla K80 GPU implementation. Across the different implementations, the elapsed time was reduced from about 2.5 hours to 0.7 seconds.
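To illustrate why a Monte-Carlo design-load calculation maps well onto many parallel threads, here is a minimal CPU-only sketch in Python. It is not the authors' framework: the abstract does not describe the ice load model, so the lognormal impact loads, the fixed number of impacts per year, the 1% exceedance level, and the names `annual_max_loads` and `design_load` are all hypothetical choices made for illustration. The point is only that each chunk of simulated years uses its own random stream and shares no state with the others, which is the property a GPU or OpenMP implementation would exploit.

```python
import random


def annual_max_loads(n_years, seed, impacts_per_year=10):
    """Simulate the largest load in each of n_years (hypothetical toy model).

    Each year sees a fixed number of impacts whose loads are drawn from a
    lognormal distribution. Both choices are assumptions for illustration,
    not the load model used in the paper.
    """
    rng = random.Random(seed)  # private random stream for this chunk
    return [
        max(rng.lognormvariate(0.0, 0.5) for _ in range(impacts_per_year))
        for _ in range(n_years)
    ]


def design_load(n_years, n_chunks, exceedance=0.01, base_seed=0):
    """Estimate the load exceeded with the given annual probability.

    The simulated years are split into independent chunks, one seed per
    chunk. The chunks share no state, so they could run concurrently on
    separate threads or devices; here they simply run one after another.
    """
    per_chunk = n_years // n_chunks
    maxima = []
    for chunk in range(n_chunks):
        maxima.extend(annual_max_loads(per_chunk, seed=base_seed + chunk))
    maxima.sort()
    # Empirical quantile: the sorted annual maximum at the (1 - exceedance) level.
    index = min(len(maxima) - 1, int((1.0 - exceedance) * len(maxima)))
    return maxima[index]


if __name__ == "__main__":
    print(design_load(n_years=100_000, n_chunks=4))
```

In this sketch the only step that needs all results together is the final sort, so the per-chunk simulation dominates the runtime as the number of simulated years grows. Seeding chunks with consecutive integers is a simplification; a production code would use a generator designed for parallel streams.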