{"title":"Generating Permutations Using Hash Tables","authors":"Oded Green, Corey J. Nolet, Joe Eaton","doi":"10.1109/HPEC55821.2022.9926387","DOIUrl":null,"url":null,"abstract":"Given a set of $N$ distinct values, the operation of shuffling those elements and creating a random order is used in a wide range range of applications, including (but not limited to) statistical analysis, machine learning, games, and bootstrapping. The operation of shuffling elements is equivalent to generating a random permutation and applying the permutation. For example, the random permutation of an input allows splitting into two or more subsets without bias. This operation is repeated for machine learning applications when both a train and test data set are needed. In this paper we describe a new method for creating random permutations that is scalable, efficient, and simple. We show that the operation of generating a random permutation shares traits with building a hashtable. Our method uses a fairly new hash table, called HashGraph, to generate the permutation. HashGraph's unique data-structure ensures easy generation and retrieval of the permutation. HashGraph is one of the fastest known hash-tables for the GPU and also outperforms many leading CPU hash-tables. We show the performance of our new permutation generation scheme using both Python and CUDA versions of HashGraph. Our CUDA implementation is roughly 10% faster than our Python implementation. In contrast to the shuffle operation in NVIDIA's Thrust and cuPy frameworks, our new permutation generation algorithm is 2.6 x and 1.73 x faster, respectively and up to 150 x faster than numPy.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"69 6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC55821.2022.9926387","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Given a set of $N$ distinct values, the operation of shuffling those elements and creating a random order is used in a wide range range of applications, including (but not limited to) statistical analysis, machine learning, games, and bootstrapping. The operation of shuffling elements is equivalent to generating a random permutation and applying the permutation. For example, the random permutation of an input allows splitting into two or more subsets without bias. This operation is repeated for machine learning applications when both a train and test data set are needed. In this paper we describe a new method for creating random permutations that is scalable, efficient, and simple. We show that the operation of generating a random permutation shares traits with building a hashtable. Our method uses a fairly new hash table, called HashGraph, to generate the permutation. HashGraph's unique data-structure ensures easy generation and retrieval of the permutation. HashGraph is one of the fastest known hash-tables for the GPU and also outperforms many leading CPU hash-tables. We show the performance of our new permutation generation scheme using both Python and CUDA versions of HashGraph. Our CUDA implementation is roughly 10% faster than our Python implementation. In contrast to the shuffle operation in NVIDIA's Thrust and cuPy frameworks, our new permutation generation algorithm is 2.6 x and 1.73 x faster, respectively and up to 150 x faster than numPy.