VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors
Timothy Hayes, Oscar Palomar, O. Unsal, A. Cristal, M. Valero
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 26-38, published 2015-03-09. DOI: 10.1109/HPCA.2015.7056019
Citations: 15
Abstract
Sorting is a widely studied problem in computer science and an elementary building block in many of its subfields. There are several known techniques to vectorise and accelerate a handful of sorting algorithms by using single instruction-multiple data (SIMD) instructions. It is expected that the widths and capabilities of SIMD support will improve dramatically in future microprocessor generations and it is not yet clear whether or not these sorting algorithms will be suitable or optimal when executed on them. This work extrapolates the level of SIMD support in future microprocessors and evaluates these algorithms using a simulation framework. The scalability, strengths and weaknesses of each algorithm are experimentally derived. We then propose VSR sort, our own novel vectorised non-comparative sorting algorithm based on radix sort. To facilitate the execution of this algorithm we define two new SIMD instructions and propose a complementary hardware structure for their execution. Our results show that VSR sort has maximum speedups between 14.9x and 20.6x over a scalar baseline and an average speedup of 3.4x over the next-best vectorised sorting algorithm.
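VSR sort is described as a vectorised, non-comparative sort based on radix sort. The abstract does not specify the two new SIMD instructions or the hardware structure, so the following is only a scalar Python sketch of a least-significant-digit (LSD) radix sort. It processes keys in fixed-width chunks to mimic SIMD lanes. It illustrates the problem any vectorised radix sort must solve: several lanes in one vector can target the same bucket during the scatter step. The vector length `VLEN`, the 8-bit digit size, and the helper `prior_instances` are hypothetical choices made for illustration. They are not the paper's instructions or parameters.

```python
from typing import List

# Hypothetical parameters chosen for illustration only (not from the paper).
VLEN = 8                  # emulated SIMD vector length (lanes per chunk)
RADIX_BITS = 8            # bits per digit
RADIX = 1 << RADIX_BITS   # number of buckets per pass
MASK = RADIX - 1


def prior_instances(digits: List[int]) -> List[int]:
    """For each lane, count how many earlier lanes in the same vector hold
    the same digit (hypothetical helper, computed here with scalar loops)."""
    return [sum(1 for j in range(i) if digits[j] == digits[i])
            for i in range(len(digits))]


def radix_pass(keys: List[int], shift: int) -> List[int]:
    """One stable counting-sort pass on the digit at bit offset `shift`."""
    # 1. Histogram of digit values.
    counts = [0] * RADIX
    for k in keys:
        counts[(k >> shift) & MASK] += 1

    # 2. Exclusive prefix sum: first output slot of each bucket.
    offsets = [0] * RADIX
    total = 0
    for d in range(RADIX):
        offsets[d] = total
        total += counts[d]

    # 3. Scatter keys one emulated vector (VLEN keys) at a time.
    out = [0] * len(keys)
    for base in range(0, len(keys), VLEN):
        chunk = keys[base:base + VLEN]
        digits = [(k >> shift) & MASK for k in chunk]
        prior = prior_instances(digits)
        # Each lane computes its destination independently. Lanes sharing a
        # bucket are separated by their prior-instance count, which also
        # preserves input order (stability).
        dest = [offsets[d] + p for d, p in zip(digits, prior)]
        for k, pos in zip(chunk, dest):
            out[pos] = k
        # Only the last lane holding each digit updates that bucket's offset.
        for i, d in enumerate(digits):
            if d not in digits[i + 1:]:
                offsets[d] = dest[i] + 1
    return out


def lsd_radix_sort(keys: List[int], key_bits: int = 32) -> List[int]:
    """Sort non-negative integers below 2**key_bits, least significant digit first."""
    for shift in range(0, key_bits, RADIX_BITS):
        keys = radix_pass(keys, shift)
    return keys


if __name__ == "__main__":
    print(lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

Running the script prints `[2, 24, 45, 66, 75, 90, 170, 802]`. In a real SIMD implementation, the prior-instance counts and the "last lane per digit" test would have to be computed across all lanes at once. This sketch only shows what those values must be. It does not show how the paper's proposed instructions or hardware structure produce them.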