Álvaro Sáiz, P. Prieto, Pablo Abad Fidalgo, J. Gregorio, Valentin Puente
{"title":"Top-Down Performance Profiling on NVIDIA's GPUs","authors":"Álvaro Sáiz, P. Prieto, Pablo Abad Fidalgo, J. Gregorio, Valentin Puente","doi":"10.1109/ipdps53621.2022.00026","DOIUrl":null,"url":null,"abstract":"The rise of data-intensive algorithms, such as Machine Learning ones, has meant a strong diversification of Graphics Processing Units (GPU) in fields with intensive Data-Level Parallelism. This trend, known as general-purpose computing on GPU (GP-GPU), makes the execution process on a GPU (seemingly simple in its architecture) far from trivial when targeting performance for many dissimilar applications. A proof of this is the existence of many profiling tools that help programmers to understand how to maximize hardware utilization. In contrast, this paper proposes a profiling tool focused on microarchitecture analysis under large sets of dissimilar applications. Therefore, the tool has a double objective. On the one hand, to check the suitability of a GPU for diverse sets of application kernels. On the other hand, to identify possible bottlenecks in a given GPU microarchitecture, facilitating the improvement of subsequent designs. For this purpose, using Top-Down methodology proposed by Intel for their CPUs as inspiration, we have defined a hierarchical organization for the execution pipeline of the GPU. The proposal makes use of the available hardware performance counters to identify how each component contributes to performance losses. We demonstrate the feasibility of the proposed methodology, analyzing how different modern NVIDIA architectures behave running relevant benchmarks, assessing in which microarchitecture component performance losses are the most significant.","PeriodicalId":321801,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ipdps53621.2022.00026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
The rise of data-intensive algorithms, such as Machine Learning ones, has meant a strong diversification of Graphics Processing Units (GPU) in fields with intensive Data-Level Parallelism. This trend, known as general-purpose computing on GPU (GP-GPU), makes the execution process on a GPU (seemingly simple in its architecture) far from trivial when targeting performance for many dissimilar applications. A proof of this is the existence of many profiling tools that help programmers to understand how to maximize hardware utilization. In contrast, this paper proposes a profiling tool focused on microarchitecture analysis under large sets of dissimilar applications. Therefore, the tool has a double objective. On the one hand, to check the suitability of a GPU for diverse sets of application kernels. On the other hand, to identify possible bottlenecks in a given GPU microarchitecture, facilitating the improvement of subsequent designs. For this purpose, using Top-Down methodology proposed by Intel for their CPUs as inspiration, we have defined a hierarchical organization for the execution pipeline of the GPU. The proposal makes use of the available hardware performance counters to identify how each component contributes to performance losses. We demonstrate the feasibility of the proposed methodology, analyzing how different modern NVIDIA architectures behave running relevant benchmarks, assessing in which microarchitecture component performance losses are the most significant.