Olivier Bronchain, M. Azouaoui, Mohamed ElGhamrawy, Joost Renes, Tobias Schneider
{"title":"Exploiting Small-Norm Polynomial Multiplication with Physical Attacks: Application to CRYSTALS-Dilithium","authors":"Olivier Bronchain, M. Azouaoui, Mohamed ElGhamrawy, Joost Renes, Tobias Schneider","doi":"10.46586/tches.v2024.i2.359-383","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.359-383","url":null,"abstract":"We present a set of physical profiled attacks against CRYSTALS-Dilithium that accumulate noisy knowledge on secret keys over multiple signatures, finally leading to a full key recovery attack. The methodology is composed of two steps. The first step consists of observing or inserting a bias in the posterior distribution of sensitive variables. The second step is an information processing phase which is based on belief propagation and effectively exploits that bias. The proposed concrete attacks rely on side-channel information, induced faults or possibly a combination of the two. Interestingly, the adversary benefits most from this previous knowledge when targeting the released signatures, however, the latter are not strictly necessary. We show that the combination of a physical attack with the binary knowledge of acceptance or rejection of a signature also leads to exploitable information on the secret key. Finally, we demonstrate that this approach is also effective against shuffled implementations of CRYSTALS-Dilithium.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":"47 6","pages":"1545"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140249766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Darius Mercadier, Viet Sang Nguyen, Matthieu Rivain, A. Udovenko
{"title":"OBSCURE: Versatile Software Obfuscation from a Lightweight Secure Element","authors":"Darius Mercadier, Viet Sang Nguyen, Matthieu Rivain, A. Udovenko","doi":"10.46586/tches.v2024.i2.588-629","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.588-629","url":null,"abstract":"Software obfuscation is a powerful tool to protect the intellectual property or secret keys inside programs. Strong software obfuscation is crucial in the context of untrusted execution environments (e.g., subject to malware infection) or to face potentially malicious users trying to reverse-engineer a sensitive program. Unfortunately, the state-of-the-art of pure software-based obfuscation (including white-box cryptography) is either insecure or infeasible in practice.This work introduces OBSCURE, a versatile framework for practical and cryptographically strong software obfuscation relying on a simple stateless secure element (to be embedded, for example, in a protected hardware chip or a token). Based on the foundational result by Goyal et al. from TCC 2010, our scheme enjoys provable security guarantees, and further focuses on practical aspects, such as efficient execution of the obfuscated programs, while maintaining simplicity of the secure element. In particular, we propose a new rectangular universalization technique, which is also of independent interest. We provide an implementation of OBSCURE taking as input a program source code written in a subset of the C programming language. This ensures usability and a broad range of applications of our framework. We benchmark the obfuscation on simple software programs as well as on cryptographic primitives, hence highlighting the possible use cases of the framework as an alternative to pure software-based white-box implementations.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":"272 1","pages":"77"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140250022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"eLIMInate: a Leakage-focused ISE for Masked Implementation","authors":"Hao Cheng, D. Page","doi":"10.46586/tches.v2024.i2.329-358","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.329-358","url":null,"abstract":"Even given a state-of-the-art masking scheme, masked software implementation of some cryptography functionality can pose significant challenges stemming, e.g., from simultaneous requirements for efficiency and security. In this paper we design an Instruction Set Extension (ISE) to address a specific element of said challenge, namely the elimination of leakage stemming from architectural and microarchitectural overwriting. Conceptually, the ISE allows a leakage-focused behavioural hint to be communicated from software to the micro-architecture: using it informs how computation is realised when applied to masking-specific data, which then offers an opportunity to eliminate associated leakage. We develop prototype, latencyand area-optimised implementations of the ISE design based on the RISC-V Ibex core. Using them, we demonstrate that use of the ISE can close the gap between assumptions about and actual behaviour of a device and thereby deliver an improved security guarantee.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":"57 3","pages":"966"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140248471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nibbling MAYO: Optimized Implementations for AVX2 and Cortex-M4","authors":"Ward Beullens, Fabio Campos, Sof´ıa Celi, Basil Hess, Matthias J. Kannwischer","doi":"10.46586/tches.v2024.i2.252-275","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.252-275","url":null,"abstract":"MAYO is a popular high-calorie condiment as well as an auspicious candidate in the ongoing NIST competition for additional post-quantum signature schemes achieving competitive signature and public key sizes. In this work, we present high-speed implementations of MAYO using the AVX2 and Armv7E-M instruction sets targeting recent x86 platforms and the Arm Cortex-M4. Moreover, the main contribution of our work is showing that MAYO can be even faster when switching from a bitsliced representation of keys to a nibble-sliced representation. While the bitsliced representation was primarily motivated by faster arithmetic on microcontrollers, we show that it is not necessary for achieving high performance on Cortex-M4. On Cortex-M4, we instead propose to implement the large matrix multiplications of MAYO using the Method of the Four Russians (M4R), which allows us to achieve better performance than when using the bitsliced approach. This results in up to 21% faster signing. For AVX2, the change in representation allows us to implement the arithmetic much faster using shuffle instructions. Signing takes up to 3.2x fewer cycles and key generation and verification enjoy similar speedups. This shows that MAYO is competitive with lattice-based signature schemes on x86 CPUs, and a factor of 2-6 slower than lattice-based signature schemes on Cortex-M4 (which can still be considered competitive).","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":"27 2","pages":"1683"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140250018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Deshpande, James Howe, Jakub Szefer, Dongze Yue
{"title":"SDitH in Hardware","authors":"S. Deshpande, James Howe, Jakub Szefer, Dongze Yue","doi":"10.46586/tches.v2024.i2.215-251","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.215-251","url":null,"abstract":"This work presents the first hardware realisation of the Syndrome-Decodingin-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH’s hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for MPCitH constructions, after Picnic. This work presents optimised designs to achieve the best area efficiency, which we evaluate using the Time-Area Product (TAP) metric. This work also proposes a novel hardware architecture by dividing the signature generation algorithm into two phases, namely offline and online phases for optimising the overall clock cycle count. The hardware designs for key generation, signature generation, and signature verification are parameterised for all SDitH parameters, including the NIST security levels, both syndrome decoding base fields (GF256 and GF251), and thus conforms to the SDitH specifications. The hardware design further supports secret share splitting, and the hypercube optimisation which can be applied in this and multiple other NIST PQC candidates. The results of this work result in a hardware design with a drastic reducing in clock cycles compared to the optimised AVX2 software implementation, in the range of 2-4x for most operations. Our key generation outperforms software drastically, giving a 11-17x reduction in runtime, despite the significantly faster clock speed. On Artix 7 FPGAs we can perform key generation in 55.1 Kcycles, signature generation in 6.7 Mcycles, and signature verification in 8.6 Mcycles for NIST L1 parameters, which increase for GF251, and for L3 and L5 parameters.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":"98 4","pages":"69"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140249566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiangxue Liu, Cankun Zhao, Shuohang Peng, Bohan Yang, Hang Zhao, Xiangdong Han, Min Zhu, Shaojun Wei, Leibo Liu
{"title":"A Low-Latency High-Order Arithmetic to Boolean Masking Conversion","authors":"Jiangxue Liu, Cankun Zhao, Shuohang Peng, Bohan Yang, Hang Zhao, Xiangdong Han, Min Zhu, Shaojun Wei, Leibo Liu","doi":"10.46586/tches.v2024.i2.630-653","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.630-653","url":null,"abstract":"Masking, an effective countermeasure against side-channel attacks, is commonly applied in modern cryptographic implementations. Considering cryptographic algorithms that utilize both Boolean and arithmetic masking, the conversion algorithm between arithmetic masking and Boolean masking is required. Conventional high-order arithmetic masking to Boolean masking conversion algorithms based on Boolean circuits suffer from performance overhead, especially in terms of hardware implementation. In this work, we analyze high latency for the conversion and propose an improved high-order A2B conversion algorithm. For the conversion of 16-bit variables, the hardware latency can be reduced by 47% in the best scenario. For the case study of second-order 32-bit conversion, the implementation results show that the improved scheme reduces the clock cycle latency by 42% in hardware and achieves a 30% speed performance improvement in software. Theoretically, a security proof of arbitrary order is provided for the proposed high-order A2B conversion. Experimental validations are performed to verify the second-order DPA resistance of second-order implementation. The Test Vector Leakage Assessment does not observe side-channel leakage for hardware and software implementations.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":"51 4","pages":"45"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140249878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tian Zhou, Fangyu Zheng, Guang Fan, Lipeng Wan, Wenxu Tang, Yixuan Song, Yi Bian, Jingqiang Lin
{"title":"ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches","authors":"Tian Zhou, Fangyu Zheng, Guang Fan, Lipeng Wan, Wenxu Tang, Yixuan Song, Yi Bian, Jingqiang Lin","doi":"10.46586/tches.v2024.i2.25-63","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.25-63","url":null,"abstract":"The remarkable performance capabilities of AI accelerators offer promising opportunities for accelerating cryptographic algorithms, particularly in the context of lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain at a rudimentary level of implementation, overlooking the intricate internal mechanisms of these devices. Consequently, a significant number of computational resources is underutilized.In this paper, we present a comprehensive exploration of NVIDIA Tensor Cores and introduce a novel framework tailored specifically for Kyber. Firstly, we propose two innovative approaches that efficiently break down Kyber’s NTT into iterative matrix multiplications, resulting in approximately a 75% reduction in costs compared to the state-of-the-art scanning-based methods. Secondly, by reversing the internal mechanisms, we precisely manipulate the internal resources of Tensor Cores using assembly-level code instead of inefficient standard interfaces, eliminating memory accesses and redundant function calls. Finally, building upon our highly optimized NTT, we provide a complete implementation for all parameter sets of Kyber. Our implementation surpasses the state-of-the-art Tensor Core based work, achieving remarkable speed-ups of 1.93x, 1.65x, 1.22x and 3.55x for polyvec_ntt, KeyGen, Enc and Dec in Kyber-1024, respectively. Even when considering execution latency, our throughput-oriented full Kyber implementation maintains an acceptable execution latency. For instance, the execution latency ranges from 1.02 to 5.68 milliseconds for Kyber-1024 on R3080 when achieving the peak throughput.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":"32 6","pages":"95"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140250342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gora Adj, Stefano Barbero, Emanuele Bellini, Andre Esser, Luis Rivera-Zamarripa, Carlo Sanna, Javier A. Verbel, Floyd Zweydinger
{"title":"MiRitH: Efficient Post-Quantum Signatures from MinRank in the Head","authors":"Gora Adj, Stefano Barbero, Emanuele Bellini, Andre Esser, Luis Rivera-Zamarripa, Carlo Sanna, Javier A. Verbel, Floyd Zweydinger","doi":"10.46586/tches.v2024.i2.304-328","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.304-328","url":null,"abstract":"Since 2016’s NIST call for standardization of post-quantum cryptographic primitives, developing efficient post-quantum secure digital signature schemes has become a highly active area of research. The difficulty in constructing such schemes is evidenced by NIST reopening the call in 2022 for digital signature schemes, because of missing diversity in existing proposals. In this work, we introduce the new postquantum digital signature scheme MiRitH. As direct successor of a scheme recently developed by Adj, Rivera-Zamarripa and Verbel (Africacrypt ’23), it is based on the hardness of the MinRank problem and follows the MPC-in-the-Head paradigm. We revisit the initial proposal, incorporate design-level improvements and provide more efficient parameter sets. We also provide the missing justification for the quantum security of all parameter sets following NIST metrics. In this context we design a novel Grover-amplified quantum search algorithm for solving the MinRank problem that outperforms a naive quantum brute-force search for the solution.MiRitH obtains signatures of size 5.7 kB for NIST category I security and therefore competes for the smallest signatures among any post-quantum signature following the MPCitH paradigm.At the same time MiRitH offers competitive signing and verification timings compared to the state of the art. To substantiate those claims we provide extensive implementations. This includes a reference implementation as well as optimized constant-time implementations for Intel processors (AVX2), and for the ARM (NEON) architecture. The speedup of our optimized AVX2 implementation relies mostly on a redesign of the finite field arithmetic, improving over existing implementations as well as an improved memory management.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":"5 3","pages":"1666"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140251025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compact Circuits for Efficient Mobius Transform","authors":"S. Banik, F. Regazzoni","doi":"10.46586/tches.v2024.i2.481-521","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.481-521","url":null,"abstract":"The Möbius transform is a linear circuit used to compute the evaluations of a Boolean function over all points on its input domain. The operation is very useful in finding the solution of a system of polynomial equations over GF(2) for obvious reasons. However the operation, although linear, needs exponential number of logic operations (around n · 2n−1 bit xors) for an n-variable Boolean function. As such, the only known hardware circuit to efficiently compute the Möbius Transform requires silicon area that is exponential in n. For Boolean functions whose algebraic degree is bound by some parameter d, recursive definitions of the Möbius Transform exist that requires only O(nd+1) space in software. However converting the mathematical definition of this space-efficient algorithm into a hardware architecture is a non-trivial task, primarily because the recursion calls notionally lead to a depth-first search in a transition graph that requires context switches at each recursion call for which straightforward mapping to hardware is difficult. In this paper we look to overcome these very challenges in an engineering sense. We propose a space efficient sequential hardware circuit for the Möbius Transform that requires only polynomial circuit area (i.e. O(nd+1)) provided the algebraic degree of the Boolean function is limited to d. We show how this circuit can be used as a component to efficiently solve polynomial equations of degree at most d by using fast exhaustive search. We propose three different circuit architectures for this, each of which uses the Möbius Transform circuit as a core component. We show that asymptotically, all the solutions of a system of m polynomials in n unknowns and algebraic degree d over GF(2) can be found using a circuit of silicon area proportional to m · nd+1 and circuit depth proportional to 2 · log2(n − d).In the second part of the paper we introduce a fourth hardware solver that additionally aims to achieve energy efficiency. The main idea is to reduce the solution space to a small enough value by parallel application of Möbius Transform circuits over the first few equations of the system. This is done so that one can check individually whether the vectors of this reduced solution space satisfy each of the remaining equations of the system using lower power consumption. The new circuit has area also bound by m · nd+1 and has circuit depth proportional to d · log2 n. We also show that further optimizations with respect to energy consumption may be obtained by using depth-bound Möbius circuits that exponentially decrease run time at the cost of additional logic area and depth.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":"20 4","pages":"948"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140248326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Ravi, Thales Paiva, Dirmanto Jap, Jan-Pieter D'Anvers, S. Bhasin
{"title":"Defeating Low-Cost Countermeasures against Side-Channel Attacks in Lattice-based Encryption - A Case Study on Crystals-Kyber","authors":"P. Ravi, Thales Paiva, Dirmanto Jap, Jan-Pieter D'Anvers, S. Bhasin","doi":"10.46586/tches.v2024.i2.795-818","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.795-818","url":null,"abstract":"In an effort to circumvent the high cost of standard countermeasures against side-channel attacks in post-quantum cryptography, some works have developed low-cost detection-based countermeasures. These countermeasures try to detect maliciously generated input ciphertexts and react to them by discarding the ciphertext or secret key. In this work, we take a look at two previously proposed low-cost countermeasures: the ciphertext sanity check and the decapsulation failure check, and demonstrate successful attacks on these schemes. We show that the first countermeasure can be broken with little to no overhead, while the second countermeasure requires a more elaborate attack strategy that relies on valid chosen ciphertexts. Thus, in this work, we propose the first chosen-ciphertext based side-channel attack that only relies on valid ciphertexts for key recovery. As part of this attack, a third contribution of our paper is an improved solver that retrieves the secret key from linear inequalities constructed using side-channel leakage from the decryption procedure. Our solver is an improvement over the state-of-the-art Belief Propagation solvers by Pessl and Prokop, and later Delvaux. Our method is simpler, easier to understand and has lower computational complexity, while needing less than half the inequalities compared to previous methods.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":"99 2","pages":"1627"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140250555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}