Bringing Auto-tuning to HIP: Analysis of Tuning Impact and Difficulty on AMD and Nvidia GPUs
Milo Lurati, Stijn Heldens, Alessio Sclocco, Ben van Werkhoven
arXiv - CS - Performance, published 2024-07-16, arxiv-2407.11488
Abstract
Many studies have focused on developing and improving auto-tuning algorithms
for Nvidia Graphics Processing Units (GPUs), but the effectiveness and
efficiency of these approaches on AMD devices have hardly been studied. This
paper aims to address this gap by introducing an auto-tuner for AMD's HIP. We
do so by extending Kernel Tuner, an open-source Python library for auto-tuning
GPU programs. We analyze the performance impact and tuning difficulty for four
highly tunable benchmark kernels on four different GPUs: two from Nvidia and
two from AMD. Our results demonstrate that auto-tuning has a significantly
higher impact on performance on AMD than on Nvidia (10x vs 2x).
Additionally, we show that applications tuned for Nvidia do not perform
optimally on AMD, underscoring the importance of auto-tuning specifically for
AMD to achieve high performance on these GPUs.
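
To make the notion of auto-tuning concrete, the following is a minimal sketch of brute-force tuning in plain Python. It is not Kernel Tuner's implementation or API: the function names, the parameter names (`block_size_x`, `block_size_y`, `tile_size`), and the cost model in `toy_benchmark` are hypothetical stand-ins for compiling and timing a real GPU kernel. The ratio of median to best time is likewise only one assumed way to express tuning impact, and the numbers it produces are synthetic, not results from the paper.

```python
import itertools
import statistics


def search_space(tune_params):
    """Yield every configuration as a dict (Cartesian product of all values)."""
    names = list(tune_params)
    for values in itertools.product(*(tune_params[n] for n in names)):
        yield dict(zip(names, values))


def tune(benchmark, tune_params):
    """Brute-force tuning: time every configuration, return results sorted by time."""
    results = [(benchmark(cfg), cfg) for cfg in search_space(tune_params)]
    return sorted(results, key=lambda r: r[0])


def toy_benchmark(cfg):
    """Hypothetical cost model standing in for a real kernel timing (not measured data)."""
    threads = cfg["block_size_x"] * cfg["block_size_y"]
    penalty = abs(threads - 256) / 256  # assumption: 256 threads per block is ideal
    return (1.0 + penalty) / cfg["tile_size"]


# Hypothetical tunable parameters and their allowed values.
tune_params = {
    "block_size_x": [16, 32, 64, 128],
    "block_size_y": [1, 2, 4, 8],
    "tile_size": [1, 2, 4],
}

results = tune(toy_benchmark, tune_params)
best_time, best_cfg = results[0]

# Assumed impact metric: how much faster the best configuration is than the median one.
median_time = statistics.median(t for t, _ in results)
impact = median_time / best_time
print(best_cfg, round(impact, 2))
```

In a real setting, `toy_benchmark` would be replaced by code that compiles the kernel with the given parameters and measures its run time on the target GPU. Because the best configuration depends on the device, the search would have to be repeated for each GPU.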