F. Bernard, A. Garay, Patrick Haddad, Nathalie Bochard, Viktor Fischer
{"title":"Low Cost and Precise Jitter Measurement Method for TRNG Entropy Assessment","authors":"F. Bernard, A. Garay, Patrick Haddad, Nathalie Bochard, Viktor Fischer","doi":"10.46586/tches.v2024.i1.207-228","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.207-228","url":null,"abstract":"Random number generators and specifically true random number generators (TRNGs) are essential in cryptography. TRNGs implemented in logic devices usually exploit the time instability of clock signals generated in freely running oscillators as source of randomness. To assess the performance and quality of oscillator-based TRNGs, accurate measurement of clock jitter originating from thermal noise is of paramount importance. We propose a novel jitter measurement method, in which the required jitter accumulation time can be reduced to around 100 reference clock periods. Reduction of the jitter accumulation time reduces the impact of the flicker noise on the measured jitter and increases the precision of the estimated contribution of thermal noise. In addition, the method can be easily embedded in logic devices. The fact that the jitter measurement can be placed in the same device as the TRNG is important since it can be used as a basis for efficient embedded statistical tests. In contrast to other methods, we propose a thorough theoretical analysis of the measurement error. This makes it possible to tune the parameters of the method to guarantee a relative error smaller than 12% even in the worst cases.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"73 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138604629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faster Bootstrapping via Modulus Raising and Composite NTT","authors":"Zhihao Li, Ying Liu, Xianhui Lu, Ruida Wang, Benqiang Wei, Chunling Chen, Kunpeng Wang","doi":"10.46586/tches.v2024.i1.563-591","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.563-591","url":null,"abstract":"FHEW-like schemes utilize exact gadget decomposition to reduce error growth and ensure that the bootstrapping incurs only polynomial error growth. However, the exact gadget decomposition method requires higher computation complexity and larger memory storage. In this paper, we improve the efficiency of the FHEWlike schemes by utilizing the composite NTT that performs the Number Theoretic Transform (NTT) with a composite modulus. Specifically, based on the composite NTT, we integrate modulus raising and gadget decomposition in the external product, which reduces the number of NTTs required in the blind rotation from 2(dg + 1)n to 2(⌈dg⌉/2 + 1)n. Furthermore, we develop a modulus packing technique that uses the Chinese Remainder Theorem (CRT) and composite NTT to bootstrap multiple LWE ciphertexts through one blind rotation process.We implement the bootstrapping algorithms and evaluate the performance on various benchmark computations using both binary and ternary secret keys. Our results of the single bootstrapping process indicate that the proposed approach achieves speedups of up to 1.7 x, and reduces the size of the blind rotation key by 50% under specific parameters. Finally, we instantiate two ciphertexts in the packing procedure, and the experimental results show that our technique is around 1.5 x faster than the two bootstrapping processes under the 127-bit security level.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"27 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138601639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vedad Hadžić, Gaëtan Cassiers, R. Primas, Stefan Mangar, Roderick Bloem
{"title":"Quantile: Quantifying Information Leakage","authors":"Vedad Hadžić, Gaëtan Cassiers, R. Primas, Stefan Mangar, Roderick Bloem","doi":"10.46586/tches.v2024.i1.433-456","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.433-456","url":null,"abstract":"The masking countermeasure is very effective against side-channel attacks such as differential power analysis. However, the design of masked circuits is a challenging problem since one has to ensure security while minimizing performance overheads. The security of masking is often studied in the t-probing model, and multiple formal verification tools can verify this notion. However, these tools generally cannot verify large masked computations due to computational complexity.We introduce a new verification tool named Quantile, which performs randomized simulations of the masked circuit in order to bound the mutual information between the leakage and the secret variables. Our approach ensures good scalability with the circuit size and results in proven statistical security bounds. Further, our bounds are quantitative and, therefore, more nuanced than t-probing security claims: by bounding the amount of information contained in the lower-order leakage, Quantile can evaluate the security provided by masking even when they are not 1-probing secure, i.e., when they are classically considered as insecure. As an example, we apply Quantile to masked circuits of Prince and AES, where randomness is aggressively reused.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"42 17","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138602086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Georg Land, Adrian Marotzke, Jan Richter-Brockmann, T. Güneysu
{"title":"Gadget-based Masking of Streamlined NTRU Prime Decapsulation in Hardware","authors":"Georg Land, Adrian Marotzke, Jan Richter-Brockmann, T. Güneysu","doi":"10.46586/tches.v2024.i1.1-26","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.1-26","url":null,"abstract":"Streamlined NTRU Prime is a lattice-based Key Encapsulation Mechanism (KEM) that is, together with X25519, the default algorithm in OpenSSH 9. Based on lattice assumptions, it is assumed to be secure also against attackers with access to< large-scale quantum computers. While Post-Quantum Cryptography (PQC) schemes have been subject to extensive research in recent years, challenges remain with respect to protection mechanisms against attackers that have additional side-channel information, such as the power consumption of a device processing secret data. As a countermeasure to such attacks, masking has been shown to be a promising and effective approach. For public-key schemes, including any recent PQC schemes, usually, a mixture of Boolean and arithmetic techniques is applied on an algorithmic level. Our generic hardware implementation of Streamlined NTRU Prime decapsulation, however, follows an idea that until now was assumed to be solely applicable efficiently to symmetric cryptography: gadget-based masking. The hardware design is transformed into a secure implementation by replacing each gate with a composable secure gadget that operates on uniform random shares of secret values. In our work, we show the feasibility of applying this approach also to PQC schemes and present the first Public-Key Cryptography (PKC) – pre- and post-quantum – implementation masked with the gadget-based approach considering several trade-offs and design choices. By the nature of gadget-based masking, the implementation can be instantiated at arbitrary masking order. We synthesize our implementation both for Artix-7 Field-Programmable Gate Arrays (FPGAs) and 45nm Application-Specific Integrated Circuits (ASICs), yielding practically feasible results regarding the area, randomness requirement, and latency. We verify the side-channel security of our implementation using formal verification on the one hand, and practically using Test Vector Leakage Assessment (TVLA) on the other. Finally, we also analyze the applicability of our concept to Kyber and Dilithium, which will be standardized by the National Institute of Standards and Technology (NIST).","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"31 15","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138602480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aminudeen Abdulrahman, Hanno Becker, Matthias J. Kannwischer, Fabien Klein
{"title":"Fast and Clean: Auditable high-performance assembly via constraint solving","authors":"Aminudeen Abdulrahman, Hanno Becker, Matthias J. Kannwischer, Fabien Klein","doi":"10.46586/tches.v2024.i1.87-132","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.87-132","url":null,"abstract":"Handwritten assembly is a widely used tool in the development of highperformance cryptography: By providing full control over instruction selection, instruction scheduling, and register allocation, highest performance can be unlocked. On the flip side, developing handwritten assembly is not only time-consuming, but the artifacts produced also tend to be difficult to review and maintain – threatening their suitability for use in practice.In this work, we present SLOTHY (Super (Lazy) Optimization of Tricky Handwritten assemblY), a framework for the automated superoptimization of assembly with respect to instruction scheduling, register allocation, and loop optimization (software pipelining): With SLOTHY, the developer controls and focuses on algorithm and instruction selection, providing a readable “base” implementation in assembly, while SLOTHY automatically finds optimal and traceable instruction scheduling and register allocation strategies with respect to a model of the target (micro)architecture.We demonstrate the flexibility of SLOTHY by instantiating it with models of the Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72microarchitectures, implementing the Armv8.1-M+Helium and AArch64+Neon architectures. We use the resulting tools to optimize three workloads: First, for Cortex-M55 and Cortex-M85, a radix-4 complex Fast Fourier Transform (FFT) in fixed-point and floating-point arithmetic, fundamental in Digital Signal Processing. Second, on Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72, the instances of the Number Theoretic Transform (NTT) underlying CRYSTALS-Kyber and CRYSTALS-Dilithium, two recently announced winners of the NIST Post-Quantum Cryptography standardization project. Third, for Cortex-A55, the scalar multiplication for the elliptic curve key exchange X25519. The SLOTHY-optimized code matches or beats the performance of prior art in all cases, while maintaining compactness and readability.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"14 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138602870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CalyPSO: An Enhanced Search Optimization based Framework to Model Delay-based PUFs","authors":"Nimish Mishra, Kuheli Pratihar, Satota Mandal, Anirban Chakraborty, Ulrich Rührmair, Debdeep Mukhopadhyay","doi":"10.46586/tches.v2024.i1.501-526","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.501-526","url":null,"abstract":"Delay-based Physically Unclonable Functions (PUFs) are a popular choice for “keyless” cryptography in low-power devices. However, they have been subjected to modeling attacks using Machine Learning (ML) approaches, leading to improved PUF designs that resist ML-based attacks. On the contrary, evolutionary search (ES) based modeling approaches have garnered little attention compared to their ML counterparts due to their limited success. In this work, we revisit the problem of modeling delaybased PUFs using ES algorithms and identify drawbacks in present state-of-the-art genetic algorithms (GA) when applied to PUFs. This leads to the design of a new ES-based algorithm called CalyPSO, inspired by Particle Swarm Optimization (PSO) techniques, which is fundamentally different from classic genetic algorithm design rationale. This allows CalyPSO to avoid the pitfalls of textbook GA and mount successful modeling attacks on a variety of delay-based PUFs, including k-XOR APUF variants. Empirically, we show attacks for the parameter choices of k as high as 20, for which there are no reported ML or ES-based attacks without exploiting additional information like reliability or power/timing side-channels. We further show that CalyPSO can invade PUF designs like interpose-PUFs (i-PUFs) and (previously unattacked) LP-PUFs, which attempt to enhance ML robustness by obfuscating the input challenge. Furthermore, we evolve CalyPSO to CalyPSO++ by observing that the PUF compositions do not alter the input challenge dimensions, allowing the attacker to investigate cross-architecture modeling. This allows us to model a k-XOR APUF using a (k − 1)-XOR APUF as well as perform cross-architectural modeling of BRPUF and i-PUF using k-XOR APUF variants. CalyPSO++ provides the first modeling attack on 4 LP-PUF by reducing it to a 4-XOR APUF. Finally, we demonstrate the potency of CalyPSO and CalyPSO++ by successfully modeling various PUF architectures on noisy simulations as well as real-world hardware implementations.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"60 16","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138605107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Barbara Gigerl, Franz Klug, Stefan Mangard, Florian Mendel, R. Primas
{"title":"Smooth Passage with the Guards: Second-Order Hardware Masking of the AES with Low Randomness and Low Latency","authors":"Barbara Gigerl, Franz Klug, Stefan Mangard, Florian Mendel, R. Primas","doi":"10.46586/tches.v2024.i1.309-335","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.309-335","url":null,"abstract":"Cryptographic devices in hostile environments can be vulnerable to physical attacks such as power analysis. Masking is a popular countermeasure against such attacks, which works by splitting every sensitive variable into d+1 randomized shares. The implementation cost of the masking countermeasure in hardware increases significantly with the masking order d, and protecting designs often results in a large overhead. One of the main drivers of the cost is the required amount of fresh randomness for masking the non-linear parts of a cipher. In the case of AES, first-order designs have been built without the need for any fresh randomness, but state-of-the-art higher-order designs still require a significant number of random bits per encryption. Attempts to reduce the randomness however often result in a considerable latency overhead, which is not favorable in practice. This raises the need for AES designs offering a decent performance tradeoff, which are efficient both in terms of required randomness and latency.In this work, we present a second-order AES design with the minimal number of three shares, requiring only 3 200 random bits per encryption at a latency of 5 cycles per round. Our design represents a significant improvement compared to state-of-the-art designs that require more randomness and/or have a higher latency. The core of the design is an optimized 5-cycle AES S-box which needs 78 bits of fresh randomness. We use this S-box to construct a round-based AES design, for which we present a concept for sharing randomness across the S-boxes based on the changing of the guards (COTG) technique. We assess the security of our design in the probing model using a formal verification tool. Furthermore, we evaluate the practical side-channel resistance on an FPGA.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"9 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138603102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"1LUTSensor: Detecting FPGA Voltage Fluctuations using LookUp Tables","authors":"Darshana Jayasinghe, B. Udugama, Sri Parameswaran","doi":"10.46586/tches.v2024.i1.51-86","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.51-86","url":null,"abstract":"Remote Power Analysis (RPA) attacks use transient voltage fluctuation side channels detected via delay sensors/ on-chip voltage sensors to reveal secret keys from cryptographic circuits. The state-of-the-art research proposed five on-chip voltage sensors for Field Programmable Gate Arrays (FPGAs). This paper proposes a novel on-chip voltage sensor, 1LUTSensor, which uses FPGA LookUp Table (LUT) structure to deduce voltage fluctuations. 1LUTSensor uses LUT multiplexers to create a run-time adjustable delay line to detect voltage fluctuations and uses dedicated paths which are fabricated signal connections and cannot be changed in the FPGA LUT to form the delay line. 1LUTSensor uses only a single LUT and a single flip-flop for the delay line to sense voltage fluctuations and uses a single tapped delay element for calibration. The output of the 1LUTSensor is a single bit. Compared to the state-of-the-art on-chip sensors, 1LUTSensor proposed in this paper is the smallest and fastest on-chip voltage sensor proposed thus far. 1LUTSensor is at least 3x smaller than the smallest on-chip sensor proposed in the literature. Compared to the state-of-the-art, the proposed 1LUTSensor can be operated at 600MHz. 1LUTSensor is evaluated using RPA attacks, and a complete secret key of an AES circuit can be extracted within 100,000 traces.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"78 14","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138604644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Carlet, A. Daif, Sylvain Guilley, Cédric Tavernier
{"title":"Quasi-linear masking against SCA and FIA, with cost amortization","authors":"C. Carlet, A. Daif, Sylvain Guilley, Cédric Tavernier","doi":"10.46586/tches.v2024.i1.398-432","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.398-432","url":null,"abstract":"The implementation of cryptographic algorithms must be protected against physical attacks. Side-channel and fault injection analyses are two prominent such implementation-level attacks. Protections against either do exist. Against sidechannel attacks, they are characterized by SNI security orders: the higher the order, the more difficult the attack.In this paper, we leverage fast discrete Fourier transform to reduce the complexity of high-order masking. The security paradigm is that of code-based masking. Coding theory is amenable both to mask material at a prescribed order, by mixing the information, and to detect and/or correct errors purposely injected by an attacker. For the first time, we show that quasi-linear masking (pioneered by Goudarzi, Joux and Rivain at ASIACRYPT 2018) can be achieved alongside with cost amortisation. This technique consists in masking several symbols/bytes with the same masking material, therefore improving the efficiency of the masking. We provide a security proof, leveraging both coding and probing security arguments. Regarding fault detection, our masking is capable of detecting up to d faults, where 2d + 1 is the length of the code, at any place of the algorithm, including within gadgets. In addition to the theory, that makes use of the Frobenius Additive Fast Fourier Transform, we show performance results, in a C language implementation, which confirms in practice that the complexity is quasi-linear in the code length.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"5 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138603865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PMFault: Faulting and Bricking Server CPUs through Management Interfaces","authors":"Zitai Chen, David Oswald","doi":"10.46586/tches.v2023.i2.1-23","DOIUrl":"https://doi.org/10.46586/tches.v2023.i2.1-23","url":null,"abstract":"Apart from the actual CPU, modern server motherboards contain other auxiliary components, for example voltage regulators for power management. Those are connected to the CPU and the separate Baseboard Management Controller (BMC) via the I2C-based PMBus. In this paper, using the case study of the widely used Supermicro X11SSL motherboard, we show how remotely exploitable software weaknesses in the BMC (or other processors with PMBus access) can be used to access the PMBus and then perform hardware-based fault injection attacks on the main CPU. The underlying weaknesses include insecure firmware encryption and signing mechanisms, a lack of authentication for the firmware upgrade process and the IPMI KCS control interface, as well as the motherboard design (with the PMBus connected to the BMC and SMBus by default). First, we show that undervolting through the PMBus allows breaking the integrity guarantees of SGX enclaves, bypassing Intel’s countermeasures against previous undervolting attacks like Plundervolt/V0ltPwn. Second, we experimentally show that overvolting outside the specified range has the potential of permanently damaging Intel Xeon CPUs, rendering the server inoperable. We assess the impact of our findings on other server motherboards made by Supermicro and ASRock. Our attacks, dubbed PMFault, can be carried out by a privileged software adversary and do not require physical access to the server motherboard or knowledge of the BMC login credentials. We responsibly disclosed the issues reported in this paper to Supermicro and discuss possible countermeasures at different levels. To the best of our knowledge, the 12th generation of Supermicro motherboards, which was designed before we reported PMFault to Supermicro, is not vulnerable.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"644 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135036111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}