Validating the performance of GPU ports using differential performance models
Alexander Geiß, Téodora Hovi, Alexandru Calotoiu, Felix Wolf
Future Generation Computer Systems, Volume 174, Article 108018. Published 2025-07-18.
DOI: 10.1016/j.future.2025.108018
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25003139
Abstract
Offloading computation to the GPU is crucial to leverage many of today’s supercomputers. We expect the GPU port of an application to outperform the pure CPU implementation, but is this always true? Simple benchmarking only allows us to take a limited number of samples from a vast space of execution configurations and can, therefore, deliver only a fragmented answer. To answer the question systematically, even for individual application kernels, we propose a semi-automatic toolchain based on differential performance modeling and intuitive visualizations. We combine empirical performance models based on unified CPU–GPU profiles with hardware characteristics to derive differential performance models that can be easily compared across device types. In four case studies, we demonstrate how our toolchain pinpoints scaling issues in GPU ports, guides performance improvements, and identifies execution configurations with superior performance.
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.