VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors

2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2015-03-09 DOI:10.1109/HPCA.2015.7056019

Timothy Hayes, Oscar Palomar, O. Unsal, A. Cristal, M. Valero

{"title":"VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors","authors":"Timothy Hayes, Oscar Palomar, O. Unsal, A. Cristal, M. Valero","doi":"10.1109/HPCA.2015.7056019","DOIUrl":null,"url":null,"abstract":"Sorting is a widely studied problem in computer science and an elementary building block in many of its subfields. There are several known techniques to vectorise and accelerate a handful of sorting algorithms by using single instruction-multiple data (SIMD) instructions. It is expected that the widths and capabilities of SIMD support will improve dramatically in future microprocessor generations and it is not yet clear whether or not these sorting algorithms will be suitable or optimal when executed on them. This work extrapolates the level of SIMD support in future microprocessors and evaluates these algorithms using a simulation framework. The scalability, strengths and weaknesses of each algorithm are experimentally derived. We then propose VSR sort, our own novel vectorised non-comparative sorting algorithm based on radix sort. To facilitate the execution of this algorithm we define two new SIMD instructions and propose a complementary hardware structure for their execution. Our results show that VSR sort has maximum speedups between 14.9x and 20.6x over a scalar baseline and an average speedup of 3.4x over the next-best vectorised sorting algorithm.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"14 1","pages":"26-38"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2015.7056019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 15

Abstract

Sorting is a widely studied problem in computer science and an elementary building block in many of its subfields. There are several known techniques to vectorise and accelerate a handful of sorting algorithms by using single instruction-multiple data (SIMD) instructions. It is expected that the widths and capabilities of SIMD support will improve dramatically in future microprocessor generations and it is not yet clear whether or not these sorting algorithms will be suitable or optimal when executed on them. This work extrapolates the level of SIMD support in future microprocessors and evaluates these algorithms using a simulation framework. The scalability, strengths and weaknesses of each algorithm are experimentally derived. We then propose VSR sort, our own novel vectorised non-comparative sorting algorithm based on radix sort. To facilitate the execution of this algorithm we define two new SIMD instructions and propose a complementary hardware structure for their execution. Our results show that VSR sort has maximum speedups between 14.9x and 20.6x over a scalar baseline and an average speedup of 3.4x over the next-best vectorised sorting algorithm.

查看原文本刊更多论文

VSR排序:一种新的矢量化排序算法和未来微处理器的架构扩展

排序是计算机科学中一个被广泛研究的问题，也是其许多子领域的基本组成部分。有几种已知的技术可以通过使用单指令多数据(SIMD)指令来矢量化和加速一些排序算法。预计SIMD支持的宽度和能力将在未来的微处理器世代中得到显著改善，目前尚不清楚这些排序算法在它们上执行时是否合适或最佳。这项工作推断了未来微处理器中SIMD支持的水平，并使用模拟框架评估了这些算法。实验推导了各算法的可扩展性、优缺点。然后，我们提出了VSR排序，我们自己的新颖的基于基数排序的矢量非比较排序算法。为了方便该算法的执行，我们定义了两个新的SIMD指令，并为它们的执行提出了一个互补的硬件结构。我们的结果表明，VSR排序在标量基线上的最大加速在14.9到20.6倍之间，在次优矢量化排序算法上的平均加速为3.4倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)

自引率

0.00%

发文量