{"title":"Highly-Efficient Hardware Architecture for ML-KEM PQC Standard","authors":"Haesung Jung;Quang Dang Truong;Hanho Lee","doi":"10.1109/OJCAS.2025.3591136","DOIUrl":null,"url":null,"abstract":"The advent of quantum computers, with their immense computational potential, poses significant threats to traditional cryptographic systems. In response, NIST announced the quantum-resistant Module Lattice-based Key Encapsulation Mechanism (ML-KEM) standard in 2024. This paper presents an efficient hardware architecture for the ML-KEM scheme, capable of supporting all algorithms and flexibly adapting to different security levels. The proposed design achieves a balance between high performance and low hardware resource consumption, making it suitable for deployment across various FPGA platforms. Key innovations include the Unified Polynomial Arithmetic Module (UniPAM), capable of handling all polynomial arithmetic operations, and an optimized hash module for the SHA-3 variants integral to ML-KEM. Additionally, the design introduces an efficient timing diagram and conflict-free memory management strategy, enabling seamless parallelism and reducing execution time while minimizing hardware resource consumption. Furthermore, the implementation incorporates several methods to effectively mitigate side-channel attacks, a common concern in hardware-based cryptosystem deployments. The proposed architecture is validated through implementation on an Artix-7 FPGA and Synopsys 14nm ASIC technology. Compared to state-of-the-art designs, our approach demonstrates superior performance while maintaining comparable hardware resource efficiency. Specifically, the hardware implementation on the Xilinx Artix-7 utilizes 12k LUTs, 6.9k FFs, 4 DSPs, and 9 BRAMs at clock frequency of 220 MHz.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"356-369"},"PeriodicalIF":2.4000,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11088254","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of circuits and systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11088254/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
Abstract
The advent of quantum computers, with their immense computational potential, poses significant threats to traditional cryptographic systems. In response, NIST announced the quantum-resistant Module Lattice-based Key Encapsulation Mechanism (ML-KEM) standard in 2024. This paper presents an efficient hardware architecture for the ML-KEM scheme, capable of supporting all algorithms and flexibly adapting to different security levels. The proposed design achieves a balance between high performance and low hardware resource consumption, making it suitable for deployment across various FPGA platforms. Key innovations include the Unified Polynomial Arithmetic Module (UniPAM), capable of handling all polynomial arithmetic operations, and an optimized hash module for the SHA-3 variants integral to ML-KEM. Additionally, the design introduces an efficient timing diagram and conflict-free memory management strategy, enabling seamless parallelism and reducing execution time while minimizing hardware resource consumption. Furthermore, the implementation incorporates several methods to effectively mitigate side-channel attacks, a common concern in hardware-based cryptosystem deployments. The proposed architecture is validated through implementation on an Artix-7 FPGA and Synopsys 14nm ASIC technology. Compared to state-of-the-art designs, our approach demonstrates superior performance while maintaining comparable hardware resource efficiency. Specifically, the hardware implementation on the Xilinx Artix-7 utilizes 12k LUTs, 6.9k FFs, 4 DSPs, and 9 BRAMs at clock frequency of 220 MHz.