Mehmed Mujić, Irvin Ćatić, Samra Behić, Amila Hadžibajramović, N. Nosovic, Tarik Hrnjić
Title: Accelerating Sorting on GPUs: A Scalable CUDA Quicksort Revision
Published in: 2023 22nd International Symposium INFOTEH-JAHORINA (INFOTEH), 2023-03-15
DOI: https://doi.org/10.1109/INFOTEH57020.2023.10094180
Citations: 1
Abstract
This article describes and evaluates an upgraded version of CUDA-Quicksort, an iterative implementation of the quicksort algorithm suited to highly parallel multicore graphics processors. Three key changes that improve performance are proposed. The main goal, which was achieved, was an implementation that scales better with the size of the data set and with the number of cores on modern GPU architectures. The proposed changes also significantly reduce execution time. Execution times were measured on an NVIDIA graphics card, taking into account different possible distributions of the input data.
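To make the phrase "iterative implementation" concrete, the following is a minimal CPU-side sketch in Python, not the authors' CUDA code and not any of the three changes the paper proposes (the abstract does not describe them). It replaces recursion with an explicit list of pending subranges, which is roughly the kind of work list a GPU version could hand out to thread blocks. The names `partition`, `iterative_quicksort`, and `make_inputs`, the three-way partitioning scheme, and the four input distributions are all assumptions chosen for illustration.

```python
import random


def partition(data, lo, hi):
    """Three-way partition of data[lo..hi] around the middle element's value.

    Returns (lt, gt) so that data[lo:lt] < pivot, data[lt:gt + 1] == pivot,
    and data[gt + 1:hi + 1] > pivot.
    """
    pivot = data[(lo + hi) // 2]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if data[i] < pivot:
            data[lt], data[i] = data[i], data[lt]
            lt += 1
            i += 1
        elif data[i] > pivot:
            data[i], data[gt] = data[gt], data[i]
            gt -= 1
        else:
            i += 1
    return lt, gt


def iterative_quicksort(values):
    """Return a sorted copy of values using an explicit stack, no recursion."""
    data = list(values)
    pending = [(0, len(data) - 1)]  # subranges that still need sorting
    while pending:
        lo, hi = pending.pop()
        if lo >= hi:
            continue  # zero or one element: already sorted
        lt, gt = partition(data, lo, hi)
        pending.append((lo, lt - 1))   # elements smaller than the pivot
        pending.append((gt + 1, hi))   # elements larger than the pivot
    return data


def make_inputs(n, seed=0):
    """Hypothetical input distributions for exercising the sort."""
    rng = random.Random(seed)
    return {
        "uniform": [rng.randrange(n) for _ in range(n)],
        "sorted": list(range(n)),
        "reversed": list(range(n, 0, -1)),
        "few_distinct": [rng.randrange(4) for _ in range(n)],
    }


if __name__ == "__main__":
    for name, xs in make_inputs(1000).items():
        assert iterative_quicksort(xs) == sorted(xs), name
```

The pending subranges are independent of one another, which is what makes an iterative formulation a natural fit for parallel hardware. Checking several input distributions matters because quicksort's running time depends on how evenly each pivot splits its range.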