{"title":"Enhancing the Scalability and Memory Usage of Hashsieve on Multi-core CPUs","authors":"Artur Mariano, C. Bischof","doi":"10.1109/PDP.2016.31","DOIUrl":null,"url":null,"abstract":"The Shortest Vector Problem (SVP) is a key problem in lattice-based cryptography and cryptanalysis. While the cryptography community has accumulated a vast knowledge of SVP-solvers from a theoretical standpoint, the practical performance of these algorithms is commonly not well understood. This gap in knowledge poses many challenges to cryptographers, who are oftentimes confronted with algorithms that perform worse in practice then expected from theory. This is a problem because the asymptotic complexity of the best algorithms plays a key role in the construction of cryptosystems, but only practically appealing, validated algorithms are accounted for in this process. Thus, if one cannot extract the full potential of theoretically strong algorithms in practice, efficient algorithms might be ruled out and wrong assumptions are made when constructing cryptosystems. In this paper, we take a step forward to fill this gap, by providing a computational analysis of HashSieve, the most practical sieving SVP-solver to date, and showing how its performance can be enhanced in practice. To this end, we revisit the parallel generation of random numbers, memory allocation and memory access patterns. Employing scalable random sampling, object memory pools, scalable memory allocators and aggressive memory prefetching, we were able to improve the best current implementation of HashSieve by factors of 3x and 4x, depending on the lattice dimension, and set new records for the HashSieve algorithm, thereby shrinking the gap between its theoretical complexity and its performance in practice.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDP.2016.31","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 13
Abstract
The Shortest Vector Problem (SVP) is a key problem in lattice-based cryptography and cryptanalysis. While the cryptography community has accumulated vast knowledge of SVP solvers from a theoretical standpoint, the practical performance of these algorithms is commonly not well understood. This gap in knowledge poses many challenges to cryptographers, who are often confronted with algorithms that perform worse in practice than theory predicts. This is a problem because the asymptotic complexity of the best algorithms plays a key role in the construction of cryptosystems, yet only practically appealing, validated algorithms are accounted for in this process. Thus, if one cannot extract the full potential of theoretically strong algorithms in practice, efficient algorithms might be ruled out and wrong assumptions made when constructing cryptosystems. In this paper, we take a step towards filling this gap by providing a computational analysis of HashSieve, the most practical sieving SVP solver to date, and showing how its performance can be enhanced in practice. To this end, we revisit the parallel generation of random numbers, memory allocation, and memory access patterns. Employing scalable random sampling, object memory pools, scalable memory allocators, and aggressive memory prefetching, we improve the best current implementation of HashSieve by factors of 3x to 4x, depending on the lattice dimension, and set new records for the HashSieve algorithm, thereby shrinking the gap between its theoretical complexity and its performance in practice.
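To make two of the abstract's techniques concrete, below is a minimal C++ sketch (not the authors' code; all names such as VecEntry, VecPool, and probe_bucket are hypothetical) of an object memory pool that recycles vector entries instead of allocating them individually, and of software prefetching while scanning a hash bucket, which is the kind of aggressive prefetching that can hide the latency of the essentially random memory accesses a sieving algorithm performs.

```cpp
// Illustrative sketch only, assuming a GCC/Clang C++ toolchain.
#include <cstddef>
#include <cstdint>
#include <vector>

struct VecEntry {                 // candidate lattice vector (hypothetical layout)
    std::vector<int64_t> coords;
    double norm = 0.0;
};

// Simple object pool: entries are allocated in large slabs and recycled,
// avoiding per-vector malloc/free traffic on the hot path.
class VecPool {
    std::vector<VecEntry*> free_list_;
    std::vector<std::vector<VecEntry>> slabs_;
    std::size_t slab_size_;
public:
    explicit VecPool(std::size_t slab_size = 4096) : slab_size_(slab_size) {}
    VecEntry* acquire() {
        if (free_list_.empty()) {
            slabs_.emplace_back(slab_size_);
            for (auto& e : slabs_.back()) free_list_.push_back(&e);
        }
        VecEntry* e = free_list_.back();
        free_list_.pop_back();
        return e;
    }
    void release(VecEntry* e) { free_list_.push_back(e); }
};

// Scan a hash bucket, prefetching the entry a few iterations ahead so the
// irregular memory accesses overlap with computation.
double probe_bucket(const std::vector<const VecEntry*>& bucket) {
    constexpr std::size_t kAhead = 4;   // prefetch distance (tuning parameter)
    double best = 1e300;
    for (std::size_t i = 0; i < bucket.size(); ++i) {
        if (i + kAhead < bucket.size())
            __builtin_prefetch(bucket[i + kAhead], 0 /*read*/, 1 /*low temporal locality*/);
        if (bucket[i]->norm < best) best = bucket[i]->norm;
    }
    return best;
}

int main() {
    VecPool pool;
    VecEntry* e = pool.acquire();
    e->coords = {1, -2, 3};
    e->norm = 14.0;
    std::vector<const VecEntry*> bucket = {e};
    double best = probe_bucket(bucket);
    pool.release(e);
    return best < 1e300 ? 0 : 1;
}
```

In a multi-threaded setting one would typically give each thread its own pool (or back it with a scalable allocator such as TBB's) to avoid contention; the prefetch distance is a tuning knob that depends on the per-entry work and the memory latency of the target machine.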