Title: Homomorphic Evaluation Cluster Architecture for Fully Homomorphic Encryption
Authors: Hanyoung Lee; Ardianto Satriawan; Hanho Lee
Journal: IEEE Open Journal of Circuits and Systems, vol. 6, pp. 135-146
Impact factor: 2.4 (JCR Q2, Engineering, Electrical & Electronic)
Publication date: 2025-03-08
DOI: 10.1109/OJCAS.2025.3568058
URL: https://ieeexplore.ieee.org/document/10993408/
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10993408
Citations: 0
Abstract
Fully Homomorphic Encryption (FHE) allows encrypted data to be processed on cloud servers without decryption, providing high security and enabling safe data utilization. As homomorphic multiplications are applied to encrypted data, noise accumulates, requiring a process called bootstrapping that outputs a new ciphertext $ct^{\prime}$ with a restored noise level. Bootstrapping involves linear transformations, such as Coefficient to Slots and Slots to Coefficient, in which rotation is the most frequently used operation. Rotation shifts the elements in the slots to new positions according to a rotation index $k$. However, the computational cost and memory bandwidth required for a rotation add significant overhead and limit the ability to perform FHE operations efficiently. Therefore, an efficient implementation of rotation is crucial for high-performance FHE applications. To address this problem, we optimized the datapath of rotation in the CKKS scheme to be hardware-friendly and proposed a homomorphic evaluation cluster hardware accelerator tailored for FHE workloads. Our architecture is aware of the computational and memory constraints of field programmable gate arrays (FPGAs) and performs number theoretic transform (NTT), its inverse (INTT), key multiplication, base conversion, and automorphism in a single cluster. We implemented our design on the AMD Alveo U280 FPGA platform. With a polynomial length of $2^{16}$ and operating at 250 MHz as a rotation accelerator, the FPGA implementation achieves a speed-up of about $700\times$ over the CPU implementation in OpenFHE. It also achieves a $1.77\times$ speed-up over the GPU implementation and a $1.13\times$ improvement over previous FPGA implementations.
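
To make the automorphism step of rotation concrete, the following is a minimal sketch, not the paper's datapath. It assumes the standard CKKS convention that a rotation by index $k$ applies the map $X \mapsto X^{5^k}$ to each ciphertext polynomial in $\mathbb{Z}_q[X]/(X^N+1)$, working on coefficient-form polynomials. The function name `automorphism` and the toy parameters are hypothetical. A full rotation also needs key switching after this step, which is omitted here.

```python
def automorphism(coeffs, k, q):
    """Apply X -> X^(5^k) to a polynomial in Z_q[X]/(X^N + 1).

    coeffs: list of N coefficients (N a power of two), lowest degree first.
    k:      non-negative rotation index.
    q:      coefficient modulus.
    """
    n = len(coeffs)
    two_n = 2 * n
    g = pow(5, k, two_n)  # Galois element associated with rotation index k
    out = [0] * n
    for i, a in enumerate(coeffs):
        j = (i * g) % two_n  # exponent of X after the substitution
        if j < n:
            out[j] = a % q
        else:
            # X^N = -1 in this ring, so X^j = -X^(j - N)
            out[j - n] = (-a) % q
    return out


if __name__ == "__main__":
    # Toy parameters for illustration only (the paper uses N = 2^16).
    poly = [1, 2, 3, 4, 5, 6, 7, 8]
    print(automorphism(poly, 1, 17))
```

Because the map only permutes coefficients and flips some signs, it needs no multipliers. Its cost in hardware comes from the irregular memory access pattern, which is consistent with the memory bandwidth concern raised in the abstract.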