SIMD Instruction Set Extensions for Keccak with Applications to SHA-3, Keyak and Ketje

Hemendra K. Rawat, P. Schaumont
{"title":"SIMD Instruction Set Extensions for Keccak with Applications to SHA-3, Keyak and Ketje","authors":"Hemendra K. Rawat, P. Schaumont","doi":"10.1145/2948618.2948622","DOIUrl":null,"url":null,"abstract":"Recent processor architectures such as Intel Westmere (and later) and ARMv8 include instruction-level support for the Advanced Encryption Standard (AES), for the Secure Hashing Standard (SHA-1, SHA2) and for carry-less multiplication. These crypto-instruction sets provide specialized hardware processing at the top of the memory hierarchy, and provide significant performance improvements over general-purpose software for common cryptographic operations. We propose a crypto-instruction set for the Keccak cryptographic sponge and for the Keccak duplex construction. Our design is integrated on a 128 bit SIMD interface, applicable to the ARM NEON and Intel AVX (128 bit) architecture. The proposed instruction set is optimized for flexibility and supports multiple variants of the Keccak-f[b] permutation, for b equal to 200, 400, 800, or 1600 bit. We investigate the performance of the design using the GEM5 micro-architecture simulator. Compared to the latest hand-optimized results, we demonstrate a performance improvement of 2 times (over NEON programming) to 6 times (over Assembly programming). For example, an optimized NEON implementation of SHA3-512 computes a hash at 48.1 instructions per byte, while our design uses 21.9 instructions per byte. The NEON optimized version of the Lake Keyak AEAD uses 13.4 instructions per byte, while our design uses 7.7 instructions per byte. We provide comprehensive performance evaluation for multiple configurations of the Keccak-f[b] permutation in multiple applications (Hash, Encryption, AEAD). We also analyze the hardware cost of the proposed instructions in gate-equivalent of 90nm standard cells, and show that the proposed instructions only require 4658 GE, a fraction of the cost of a full ARM Cortex-A9.","PeriodicalId":141766,"journal":{"name":"Hardware and Architectural Support for Security and Privacy","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Hardware and Architectural Support for Security and Privacy","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2948618.2948622","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Recent processor architectures such as Intel Westmere (and later) and ARMv8 include instruction-level support for the Advanced Encryption Standard (AES), for the Secure Hashing Standard (SHA-1, SHA2) and for carry-less multiplication. These crypto-instruction sets provide specialized hardware processing at the top of the memory hierarchy, and provide significant performance improvements over general-purpose software for common cryptographic operations. We propose a crypto-instruction set for the Keccak cryptographic sponge and for the Keccak duplex construction. Our design is integrated on a 128 bit SIMD interface, applicable to the ARM NEON and Intel AVX (128 bit) architecture. The proposed instruction set is optimized for flexibility and supports multiple variants of the Keccak-f[b] permutation, for b equal to 200, 400, 800, or 1600 bit. We investigate the performance of the design using the GEM5 micro-architecture simulator. Compared to the latest hand-optimized results, we demonstrate a performance improvement of 2 times (over NEON programming) to 6 times (over Assembly programming). For example, an optimized NEON implementation of SHA3-512 computes a hash at 48.1 instructions per byte, while our design uses 21.9 instructions per byte. The NEON optimized version of the Lake Keyak AEAD uses 13.4 instructions per byte, while our design uses 7.7 instructions per byte. We provide comprehensive performance evaluation for multiple configurations of the Keccak-f[b] permutation in multiple applications (Hash, Encryption, AEAD). We also analyze the hardware cost of the proposed instructions in gate-equivalent of 90nm standard cells, and show that the proposed instructions only require 4658 GE, a fraction of the cost of a full ARM Cortex-A9.
SIMD指令集扩展为Keccak与应用程序到SHA-3, Keyak和Ketje
最近的处理器架构,如Intel Westmere(及以后的版本)和ARMv8,包括对高级加密标准(AES)、安全散列标准(SHA-1、SHA2)和无进位乘法的指令级支持。这些加密指令集在内存层次结构的顶层提供了专门的硬件处理,并且相对于通用的加密操作软件提供了显著的性能改进。我们提出了一种用于Keccak密码海绵和Keccak双工结构的密码指令集。我们的设计集成在一个128位SIMD接口上,适用于ARM NEON和Intel AVX(128位)架构。所提出的指令集针对灵活性进行了优化,并支持Keccak-f[b]排列的多种变体,其中b等于200,400,800或1600位。我们使用GEM5微架构模拟器来研究该设计的性能。与最新的手工优化结果相比,我们证明了性能提高了2倍(比NEON编程)到6倍(比汇编编程)。例如,SHA3-512的优化NEON实现以每字节48.1条指令计算哈希,而我们的设计使用每字节21.9条指令。Lake Keyak AEAD的NEON优化版本每字节使用13.4条指令,而我们的设计每字节使用7.7条指令。我们对多种应用(哈希、加密、AEAD)中Keccak-f[b]排列的多种配置进行了全面的性能评估。我们还分析了在栅极等效的90nm标准单元中所提出的指令的硬件成本,并表明所提出的指令仅需要4658 GE,是完整ARM Cortex-A9成本的一小部分。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信