IACR Transactions on Cryptographic Hardware and Embedded Systems最新文献_第3页

Revisiting Keccak and Dilithium Implementations on ARMv7-M 重新审视 ARMv7-M 上的 Keccak 和 Dilithium 实现

IACR Transactions on Cryptographic Hardware and Embedded Systems Pub Date : 2024-03-12 DOI: 10.46586/tches.v2024.i2.1-24

Junhao Huang, Alexandre Adomnicăi, Jipeng Zhang, Wangchen Dai, Yao Liu, Ray C. C. Cheung, Ç. Koç, Donglong Chen

{"title":"Revisiting Keccak and Dilithium Implementations on ARMv7-M","authors":"Junhao Huang, Alexandre Adomnicăi, Jipeng Zhang, Wangchen Dai, Yao Liu, Ray C. C. Cheung, Ç. Koç, Donglong Chen","doi":"10.46586/tches.v2024.i2.1-24","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.1-24","url":null,"abstract":"Keccak is widely used in lattice-based cryptography (LBC) and its impact to the overall running time in LBC scheme can be predominant on platforms lacking dedicated SHA-3 instructions. This holds true on embedded devices for Kyber and Dilithium, two LBC schemes selected by NIST to be standardized as quantumsafe cryptographic algorithms. While extensive work has been done to optimize the polynomial arithmetic in these schemes, it was generally assumed that Keccak implementations were already optimal and left little room for enhancement.In this paper, we revisit various optimization techniques for both Keccak and Dilithium on two ARMv7-M processors, i.e., Cortex-M3 and M4. For Keccak, we improve its efficiency using two architecture-specific optimizations, namely lazy rotation and memory access pipelining, on ARMv7-M processors. These optimizations yield performance gains of up to 24.78% and 21.4% for the largest Keccak permutation instance on Cortex-M3 and M4, respectively. As for Dilithium, we first apply the multi-moduli NTT for the small polynomial multiplication cti on Cortex-M3. Then, we thoroughly integrate the efficient Plantard arithmetic to the 16-bit NTTs for computing the small polynomial multiplications csi and cti on Cortex-M3 and M4. We show that the multi-moduli NTT combined with the efficient Plantard arithmetic could obtain significant speed-ups for the small polynomial multiplications of Dilithium on Cortex-M3. Combining all the aforementioned optimizations for both Keccak and Dilithium, we obtain 15.44% ∼ 23.75% and 13.94% ∼ 15.52% speed-ups for Dilithium on Cortex-M3 and M4, respectively. Furthermore, we also demonstrate that the Keccak optimizations yield 13.35% to 15.00% speed-ups for Kyber, and our Keccak optimizations decrease the proportion of time spent on hashing in Dilithium and Kyber by 2.46% ∼ 5.03% on Cortex-M4.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"6 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140248906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Masking Floating-Point Number Multiplication and Addition of Falcon 屏蔽浮点数乘法和猎鹰加法

IACR Transactions on Cryptographic Hardware and Embedded Systems Pub Date : 2024-03-12 DOI: 10.46586/tches.v2024.i2.276-303

Keng-Yu Chen, Jiun-Peng Chen

{"title":"Masking Floating-Point Number Multiplication and Addition of Falcon","authors":"Keng-Yu Chen, Jiun-Peng Chen","doi":"10.46586/tches.v2024.i2.276-303","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.276-303","url":null,"abstract":"In this paper, we provide the first masking scheme for floating-point number multiplication and addition to defend against recent side-channel attacks on Falcon’s pre-image vector computation. Our approach involves a masked nonzero check gadget that securely identifies whether a shared value is zero. This gadget can be utilized for various computations such as rounding the mantissa, computing the sticky bit, checking the equality of two values, and normalizing a number. To support the masked floating-point number addition, we also developed a masked shift and a masked normalization gadget. Our masking design provides both first- and higherorder mask protection, and we demonstrate the theoretical security by proving the (Strong)-Non-Interference properties in the probing model. To evaluate the performance of our approach, we implemented unmasked, first-order, and second-order algorithms on an Arm Cortex-M4 processor, providing cycle counts and the number of random bytes used. We also report the time for one complete signing process with our countermeasure on an Intel-Core CPU. In addition, we assessed the practical security of our approach by conducting the test vector leakage assessment (TVLA) to validate the effectiveness of our protection. Specifically, our TVLA experiment results for second-order masking passed the test in 100,000 measured traces.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"44 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140250243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

CASA: A Compact and Scalable Accelerator for Approximate Homomorphic Encryption CASA：近似同态加密的紧凑型可扩展加速器

IACR Transactions on Cryptographic Hardware and Embedded Systems Pub Date : 2024-03-12 DOI: 10.46586/tches.v2024.i2.451-480

Pengzhou He, Samira Carolina Oliva Madrigal, Ç. Koç, Tianyou Bao, Jiafeng Xie

{"title":"CASA: A Compact and Scalable Accelerator for Approximate Homomorphic Encryption","authors":"Pengzhou He, Samira Carolina Oliva Madrigal, Ç. Koç, Tianyou Bao, Jiafeng Xie","doi":"10.46586/tches.v2024.i2.451-480","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.451-480","url":null,"abstract":"Approximate arithmetic-based homomorphic encryption (HE) scheme CKKS [CKKS17] is arguably the most suitable one for real-world data-privacy applications due to its wider computation range than other HE schemes such as BGV [BGV14], FV and BFV [Bra12, FV12]. However, the most crucial homomorphic operation of CKKS called key-switching induces a great amount of computational burden in actual deployment situations, and creates scalability challenges for hardware acceleration. In this paper, we present a novel Compact And Scalable Accelerator (CASA) for CKKS on the field-programmable gate array (FPGA) platform. The proposed CASA addresses the aforementioned computational and scalability challenges in homomorphic operations, including key-exchange, homomorphic multiplication, homomorphic addition, and rescaling.On the architecture layer, we propose a new design methodology for efficient acceleration of CKKS. We design this novel hardware architecture by carefully studying the homomorphic operation patterns and data dependency amongst the primitive oracles. The homomorphic operations are efficiently mapped into an accelerator with simple control and smooth operation, which brings benefits for scalable implementation and enhanced pipeline and parallel processing (even with the potential for further improvement).On the component layer, we carry out a detailed and extensive study and present novel micro-architectures for primitive function modules, including memory bank, number theoretic transform (NTT) module, modulus switching bank, and dyadic multiplication and accumulation.On the arithmetic layer, we develop a new partially reduction-free modular arithmetic technique to eliminate part of the reduction cost over different prime moduli within the moduli chain of the Residue Number System (RNS). The proposed structure can support arbitrary numbers of security primes of CKKS during key exchange, which offers better security options for adopting the scalable design methodology.As a proof-of-concept, we implement CASA on the FPGA platform and compare it with state-of-the-art designs. The implementation results showcase the superior performance of the proposed CASA in many aspects such as compact area, scalable architecture, and overall better area-time complexities.In particular, we successfully implement CASA on a mainstream resource-constrained Artix-7 FPGA. To the authors’ best knowledge, this is the first compact CKKS accelerator implemented on an Artix-7 device, e.g., CASA achieves a 10.8x speedup compared with the state-of-the-art CPU implementations (with power consumption of only 5.8%). Considering the power-delay product metric, CASA also achieves 138x and 105x improvement compared with the recent GPU implementation.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"34 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140250335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Gleeok: A Family of Low-Latency PRFs and its Applications to Authenticated Encryption Gleeok：低延迟 PRF 系列及其在认证加密中的应用

IACR Transactions on Cryptographic Hardware and Embedded Systems Pub Date : 2024-03-12 DOI: 10.46586/tches.v2024.i2.545-587

Ravi Anand, S. Banik, Andrea Caforio, Tatsuya Ishikawa, Takanori Isobe, Fukang Liu, Kazuhiko Minematsu, Mostafizar Rahman, Kosei Sakamoto

{"title":"Gleeok: A Family of Low-Latency PRFs and its Applications to Authenticated Encryption","authors":"Ravi Anand, S. Banik, Andrea Caforio, Tatsuya Ishikawa, Takanori Isobe, Fukang Liu, Kazuhiko Minematsu, Mostafizar Rahman, Kosei Sakamoto","doi":"10.46586/tches.v2024.i2.545-587","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.545-587","url":null,"abstract":"In this paper, we propose a new family of low-latency pseudorandom functions (PRFs), dubbed Gleeok.Gleeok utilizes three 128-bit branches to achieve a 256-bit key size while maintaining low latency. The first two branches are specifically designed to defend against statistical attacks, especially for differential attacks, while the third branch provides resilience against algebraic attacks. This unique design enables Gleeok to offer ultralow latency while supporting 256-bit keys, setting it apart from existing ciphers dedicated to low-latency requirements. In addition, we propose wide-block variants having three 256-bit branches. We also present an application of Gleeok to short-input authenticated encryption which is crucial for memory encryption and various realtime communication applications. Furthermore, we present comprehensive hardware implementation results that establish the capabilities of Gleeok and demonstrate its competitiveness against related schemes in the literature. In particular, Gleeok achieves a minimum latency of roughly 360 ps with the NanGate 15 nm cell library and is thus on par with related low-latency schemes that only feature 128-bit keys while maintaining minimal overhead when equipped in an authenticated mode of operation.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"2 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140251352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Efficient ASIC Architecture for Low Latency Classic McEliece Decoding 用于低延迟经典 McEliece 解码的高效 ASIC 架构

IACR Transactions on Cryptographic Hardware and Embedded Systems Pub Date : 2024-03-12 DOI: 10.46586/tches.v2024.i2.403-425

Daniel Fallnich, Christian Lanius, Shutao Zhang, Tobias Gemmeke

{"title":"Efficient ASIC Architecture for Low Latency Classic McEliece Decoding","authors":"Daniel Fallnich, Christian Lanius, Shutao Zhang, Tobias Gemmeke","doi":"10.46586/tches.v2024.i2.403-425","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.403-425","url":null,"abstract":"Post-quantum cryptography addresses the increasing threat that quantum computing poses to modern communication systems. Among the available “quantum-resistant” systems, the Classic McEliece key encapsulation mechanism (KEM) is positioned as a conservative choice with strong security guarantees. Building upon the code-based Niederreiter cryptosystem, this KEM enables high performance encapsulation and decapsulation and is thus ideally suited for applications such as the acceleration of server workloads. However, until now, no ASIC architecture is available for low latency computation of Classic McEliece operations. Therefore, the present work targets the design, implementation and optimization of a tailored ASIC architecture for low latency Classic McEliece decoding. An efficient ASIC design is proposed, which was implemented and manufactured in a 22 nm FDSOI CMOS technology node. We also introduce a novel inversionless architecture for the computation of error-locator polynomials as well as a systolic array for combined syndrome computation and polynomial evaluation. With these approaches, the associated optimized architecture improves the latency of computing error-locator polynomials by 47% and the overall decoding latency by 27% compared to a state-of-the-art reference, while requiring only 25% of the area.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"40 11","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140248133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

High-Performance Hardware Implementation of MPCitH and Picnic3 高性能硬件实现 MPCitH 和 Picnic3

IACR Transactions on Cryptographic Hardware and Embedded Systems Pub Date : 2024-03-12 DOI: 10.46586/tches.v2024.i2.190-214

Guoxiao Liu, Keting Jia, Puwen Wei, Lei Ju

{"title":"High-Performance Hardware Implementation of MPCitH and Picnic3","authors":"Guoxiao Liu, Keting Jia, Puwen Wei, Lei Ju","doi":"10.46586/tches.v2024.i2.190-214","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.190-214","url":null,"abstract":"Picnic is a post-quantum digital signature, the security of which relies solely on symmetric-key primitives such as block ciphers and hash functions instead of number theoretic assumptions. One of the main concerns of Picnic is the large signature size. Although Katz et al.’s protocol (MPCitH-PP) significantly reduces the size of Picnic, the involvement of more parties in MPCitH-PP leads to longer signing/verification times and more hardware resources. This poses new challenges for implementing high-performance Picnic on resource-constrained FPGAs. So far as we know, current works on the hardware implementation of MPCitH-based signatures are compatible with 3 parties only. In this work, we investigate the optimization of the implementation of MPCitH-PP and successfully deploying MPCitH-PP with more than three parties on resource-constrained FPGAs, e.g., Xilinx Artix-7 and Kintex-7, for the first time. In particular, we propose a series of optimizations, which include pipelining and parallel optimization for MPCitH-PP and the optimization of the underlying symmetric primitives. Besides, we make a slight modification to the computation of the offline commitment, which can further reduce the number of computations of Keccak. These optimizations significantly improve the hardware performance of Picnic3. Signing messages on our FPGA takes 0.047 ms for the L1 security level, outperforming Picnic1 with hardware by a factor of about 5.3, which is the fastest implementation of post-quantum signatures as far as we know. Our FPGA implementation for the L5 security level takes 0.146 ms beating Picnic1 by a factor of 8.5, and outperforming Sphincs by a factor of 17.3.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"9 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140248733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Laser-Based Command Injection Attacks on Voice-Controlled Microphone Arrays 基于激光的声控麦克风阵列指令注入攻击

IACR Transactions on Cryptographic Hardware and Embedded Systems Pub Date : 2024-03-12 DOI: 10.46586/tches.v2024.i2.654-676

Hetian Shi, Yi He, Qing Wang, Jianwei Zhuge, Qi Li, Xin Liu

{"title":"Laser-Based Command Injection Attacks on Voice-Controlled Microphone Arrays","authors":"Hetian Shi, Yi He, Qing Wang, Jianwei Zhuge, Qi Li, Xin Liu","doi":"10.46586/tches.v2024.i2.654-676","DOIUrl":"https://doi.org/10.46586/tches.v2024.i2.654-676","url":null,"abstract":"Voice-controlled (VC) systems, such as mobile phones and smart speakers, enable users to operate smart devices through voice commands. Previous works (e.g., LightCommands) show that attackers can trigger VC systems to respond to various audio commands by injecting light signals. However, LightCommands only discusses attacks on devices with a single microphone, while new devices typically use microphone arrays with sensor fusion technology for better capturing sound from different distances. By replicating LightCommands’s experiments on the new devices, we find that simply extending the light scope (just as they do) to overlap multiple microphone apertures is inadequate to wake up the device with sensor fusion. Adapting LightCommands’s approach to microphone arrays is challenging due to their requirement for multiple sound amplifiers, and each amplifier requires an independent power driver with unique settings. The number of additional devices increases with the microphone aperture count, significantly increasing the complexity of implementing and deploying the attack equipment. With a growing number of devices adopting sensor fusion to distinguish the sound location, it is essential to propose new approaches to adapting the light injection attacks to these new devices. To address these problems, we propose a lightweight microphone array laser injection solution called LCMA (Laser Commands for Microphone Array), which can use a single laser controller to manipulate multiple laser points and simultaneously target all the apertures of a microphone array and input light waves at different frequencies. Our key design is to propose a new PWM (Pulse Width Modulation) based control signal algorithm that can be implemented on a single MCU and directly control multiple lasers via different PWM output channels. Moreover, LCMA can be remotely configured via BLE (Bluetooth Low Energy). These features allow our solution to be deployed on a drone to covertly attack the targets hidden inside the building. Using LCMA, we successfully attack 29 devices. The experiment results show that LCMA is robust on the newest devices such as the iPhone 15, and the control panel of the Tesla Model Y.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"65 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140250071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

EstraNet: An Efficient Shift-Invariant Transformer Network for Side-Channel Analysis EstraNet：用于侧信道分析的高效移位不变变压器网络

IACR Transactions on Cryptographic Hardware and Embedded Systems Pub Date : 2023-12-04 DOI: 10.46586/tches.v2024.i1.336-374

Suvadeep Hajra, Siddhartha Chowdhury, Debdeep Mukhopadhyay

{"title":"EstraNet: An Efficient Shift-Invariant Transformer Network for Side-Channel Analysis","authors":"Suvadeep Hajra, Siddhartha Chowdhury, Debdeep Mukhopadhyay","doi":"10.46586/tches.v2024.i1.336-374","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.336-374","url":null,"abstract":"Deep Learning (DL) based Side-Channel Analysis (SCA) has been extremely popular recently. DL-based SCA can easily break implementations protected by masking countermeasures. DL-based SCA has also been highly successful against implementations protected by various trace desynchronization-based countermeasures like random delay, clock jitter and shuffling. Over the years, many DL models have been explored to perform SCA. Recently, Transformer Network (TN) based model has also been introduced for SCA. Though the previously introduced TN-based model is successful against implementations jointly protected by masking and random delay countermeasures, it is not scalable to long traces (having a length greater than a few thousand) due to its quadratic time and memory complexity. This work proposes a novel shift-invariant TN-based model with linear time and memory complexity. he contributions of the work are two-fold. First, we introduce a novel TN-based model called EstraNet for SCA. EstraNet has linear time and memory complexity in trace length, significantly improving over the previously proposed TN-based model’s quadratic time and memory cost. EstraNet is also shift-invariant, making it highly effective against countermeasures like random delay and clock jitter. Secondly, we evaluated EstraNet on three SCA datasets of masked implementations with random delay and clock jitter effect. Our experimental results show that EstraNet significantly outperforms several benchmark models, demonstrating up to an order of magnitude reduction in the number of attack traces required to reach guessing entropy 1.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"41 12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138602389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

TeeJam: Sub-Cache-Line Leakages Strike Back TeeJam：子缓存线路泄漏反击战

IACR Transactions on Cryptographic Hardware and Embedded Systems Pub Date : 2023-12-04 DOI: 10.46586/tches.v2024.i1.457-500

Florian Sieck, Zhiyuan Zhang, Sebastian Berndt, C. Chuengsatiansup, T. Eisenbarth, Y. Yarom

{"title":"TeeJam: Sub-Cache-Line Leakages Strike Back","authors":"Florian Sieck, Zhiyuan Zhang, Sebastian Berndt, C. Chuengsatiansup, T. Eisenbarth, Y. Yarom","doi":"10.46586/tches.v2024.i1.457-500","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.457-500","url":null,"abstract":"The microarchitectural behavior of modern CPUs is mostly hidden from developers and users of computer software. Due to a plethora of attacks exploiting microarchitectural behavior, developers of security-critical software must, e.g., ensure their code is constant-time, which is cumbersome and usually results in slower programs. In practice, small leakages which are deemed not exploitable still remain in the codebase. For example, sub-cache-line leakages have previously been investigated in the CacheBleed and MemJam attacks, which are deemed impractical on modern platforms.In this work, we revisit and carefully analyze the 4k-aliasing effect and discover that the measurable delay introduced by this microarchitectural effect is higher than found by previous work and described by Intel. By combining the rediscovered effect with a high temporal resolution possible when single-stepping an SGX enclave, we construct a very precise, yet widely applicable attack with sub-cache-line leakage resolution. o demonstrate the significance of our findings, we apply the new attack primitive to break a hardened AES T-Table implementation that features constant cache line access patterns. The attack is up to three orders of magnitude more efficient than previous sub-cache-line attacks on AES in SGX. Furthermore, we improve upon the recent work of Sieck et al. which showed partial exploitability of very faint leakages in a utility function loading base64-encoded RSA keys. With reliable sub-cache-line resolution, we build an end-to-end attack exploiting the faint leakage that can recover 4096-bit keys in minutes on a laptop. Finally, we extend the key recovery algorithm to also work for RSA keys following the standard that uses Carmichael’s totient function, while previous attacks were restricted to RSA keys using Euler’s totient function.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"19 11","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138603358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Tale of Snakes and Horses: Amplifying Correlation Power Analysis on Quadratic Maps 蛇与马的故事：四元图上的放大相关功率分析

IACR Transactions on Cryptographic Hardware and Embedded Systems Pub Date : 2023-12-04 DOI: 10.46586/tches.v2024.i1.27-50

Anna Guinet, Georg Land, Ioan Gabriel Bucur, T. Güneysu

{"title":"A Tale of Snakes and Horses: Amplifying Correlation Power Analysis on Quadratic Maps","authors":"Anna Guinet, Georg Land, Ioan Gabriel Bucur, T. Güneysu","doi":"10.46586/tches.v2024.i1.27-50","DOIUrl":"https://doi.org/10.46586/tches.v2024.i1.27-50","url":null,"abstract":"We study the success probabilities of two variants of Correlation Power Analysis (CPA) to retrieve multiple secret bits. The target is a permutation-based symmetric cryptographic construction with a quadratic map as an S-box. More precisely, we focus on the nonlinear mapping X used in the Xoodoo and Keccak-p permutations, which is affine equivalent to the nonlinear mapping of Ascon. We thus consider three-bit and five-bit S-boxes. Our leakage model is the difference in power consumption of register cells before and after one round. It reflects that, >in hardware, the aforesaid cryptographic algorithms are usually implemented by deploying a round-based architecture. The power consumption difference depends on whether the targeted bits in the register flip. In particular, we describe two attacks based on the CPA methodology. First, we start with a standard CPA approach, for which, to the best of our knowledge, we are the first to point out the differences between attacking a three-bit and a five-bit S-box. For CPA, the highest correlation coefficient is the most likely secret hypothesis. We improve on this with our novel combined Correlation Power Analysis (combined CPA), or Snake attack, which uses quadratic map cryptanalysis (e.g., of the function X) to achieve a better attack in terms of the number of traces required and computational complexity. For the Snake attack, sums of absolute or squared values of correlation coefficients are used to determine the most likely guess. As a result, we effectively show that our proposed Snake attack can recover the secret in n22 (or n22+n) intermediate results, compared to 22n for the CPA, where n is the length of the targeted S-Box. We collected power measurements from a hardware setup to demonstrate practical attack success probabilities according to the rank of the correct secret hypothesis both for Xoodoo and Keccak-p. In addition, we explain our success probabilities thanks to the Henery model developed for horse races. In short, after performing 16,896 attacks, the Snake attack or combined CPA on Xoodoo consistently recovers the correct secret bits with each attack using 43,860 traces on average and with only 12 correlations, compared to 61,380 traces for the standard CPA attack with 64 correlations. The Snake attack requires about one-fifth as many correlation values as the standard CPA. For Keccak-p, the difference is more drastic: the Snake attack recovers the secret bits invariably after 771,600 traces with just 20 correlations. In contrast, the standard CPA attack operates on 1,024 correlations (about fifty times more than the Snake attack) and requires 1,223,400 traces.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"7 7","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138604414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0