GPU-accelerated cloud computing services and performance evaluation

IF 3.5, Zone 2 (Computer Science), JCR Q2: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Simulation Modelling Practice and Theory, Pub Date: 2025-07-11, DOI: 10.1016/j.simpat.2025.103181

Zakery Collins, Gennaro De Luca, Yinong Chen

Volume 144, Article 103181. Full text: https://www.sciencedirect.com/science/article/pii/S1569190X25001169

Citations: 0

Abstract



This paper explores the feasibility of replacing traditional CPU-based cloud computing with Graphics Processing Unit (GPU)-accelerated services. Using NVIDIA's CUDA GPU-accelerated C/C++ and Python libraries, we benchmark the performance of GPU computing against multithreaded CPU computing across several domains, including machine learning and large-scale image processing. A novel contribution of this work is an intelligent autoscaling system that maximizes single-GPU resource utilization before scaling to additional GPUs, improving efficiency in cloud-based deployments. Our simulation experiments demonstrate significant performance gains for GPU-accelerated computing and highlight the impact of optimized resource allocation in cloud environments. For example, in a machine learning experiment using a dataset with 8,790 entries, execution on a GeForce RTX 3060 Ti GPU is 3.42 times faster than on a 16-thread CPU. Compared with the same 16-thread CPU, a Tesla K80 GPU is 4.17 times faster. Furthermore, we provide an analysis of GPU performance optimization strategies, including memory management, concurrency techniques, and workload distribution methodologies, offering insights into the long-term scalability and cost-effectiveness of GPU-accelerated cloud infrastructure.
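The abstract does not say how the autoscaling system is implemented. Purely as an illustration of the stated idea (use up a single GPU's capacity before adding another), the Python sketch below applies a simple first-fit placement rule. The class names `Gpu` and `FillFirstScaler`, the abstract "capacity" units, and the first-fit rule itself are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """Hypothetical GPU with a fixed capacity in abstract work units."""
    capacity: float
    load: float = 0.0

    def free(self) -> float:
        # Remaining capacity on this device.
        return self.capacity - self.load


@dataclass
class FillFirstScaler:
    """Assumed policy: reuse existing GPUs first, add a GPU only when none has room."""
    gpu_capacity: float = 1.0
    gpus: list = field(default_factory=list)

    def place(self, demand: float) -> int:
        """Place a job and return the index of the GPU that received it."""
        if demand > self.gpu_capacity:
            raise ValueError("job does not fit on a single GPU")
        # Try already-provisioned GPUs in order (first fit).
        for index, gpu in enumerate(self.gpus):
            if gpu.free() >= demand:
                gpu.load += demand
                return index
        # No existing GPU can hold the job: scale out by one device.
        self.gpus.append(Gpu(capacity=self.gpu_capacity, load=demand))
        return len(self.gpus) - 1


# Four jobs whose demands are fractions of one GPU's capacity.
scaler = FillFirstScaler(gpu_capacity=1.0)
placements = [scaler.place(d) for d in (0.5, 0.25, 0.5, 0.25)]
print(placements, len(scaler.gpus))
```

In this sketch a second GPU is provisioned only when the third job cannot fit on the first one, and the fourth job goes back to the first GPU to fill its remaining capacity. A real autoscaler would also have to measure utilization, release idle devices, and handle jobs finishing, none of which is modelled here.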


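Source journal

Simulation Modelling Practice and Theory (Engineering & Technology, Computer Science: Interdisciplinary Applications)

CiteScore: 9.80

Self-citation rate: 4.80%

Articles published: 142

Review time: 21 days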

About the journal: The journal Simulation Modelling Practice and Theory provides a forum for original, high-quality papers dealing with any aspect of systems simulation and modelling. The journal aims to be a reference and a powerful tool for all those professionally active and/or interested in the methods and applications of simulation. Submitted papers will be peer reviewed and must significantly contribute to modelling and simulation in general or use modelling and simulation in application areas. Paper submissions are solicited on:
• theoretical aspects of modelling and simulation, including formal modelling, model-checking, random number generators, sensitivity analysis, variance reduction techniques, experimental design, meta-modelling, methods and algorithms for validation and verification, selection and comparison procedures, etc.;
• methodology and application of modelling and simulation in any area, including computer systems, networks, real-time and embedded systems, mobile and intelligent agents, manufacturing and transportation systems, management, engineering, biomedical engineering, economics, ecology and environment, education, transaction handling, etc.;
• simulation languages and environments, including those specific to distributed computing, grid computing, high performance computers or computer networks, etc.;
• distributed and real-time simulation, simulation interoperability;
• tools for high performance computing simulation, including dedicated architectures and parallel computing.