Haohao Liao, Mahmoud A. Elmohr, Xuan Dong, Yanjun Qian, Wenzhe Yang, Zhiwei Shang, Yin Tan
{"title":"TurboHE: Accelerating Fully Homomorphic Encryption Using FPGA Clusters","authors":"Haohao Liao, Mahmoud A. Elmohr, Xuan Dong, Yanjun Qian, Wenzhe Yang, Zhiwei Shang, Yin Tan","doi":"10.1109/IPDPS54959.2023.00084","DOIUrl":null,"url":null,"abstract":"With the burgeoning demands for cloud computing in various fields followed by the rising attention to sensitive data exposure, Fully Homomorphic Encryption (FHE) is gaining popularity as a potential solution to privacy protection. By performing computations directly on the ciphertext (encrypted data) without decrypting it, FHE can guarantee the security of data throughout its lifecycle without compromising the privacy. However, the excruciatingly slow speed of FHE scheme makes adopting it impractical in real life applications. Therefore, hardware accelerators come to the rescue to mitigate the problem. Among various hardware platforms, FPGA clusters are particularly promising because of their flexibility and ready availability at many cloud providers such as FPGA-as-a-Service (FaaS). Hence, reusing the existing infrastructure can greatly facilitate the implementation of FHE on the cloud.In this paper, we present TurboHE, the first hardware accelerator for FHE operations based on an FPGA cluster. TurboHE aims to boost the performance of CKKS, one of the fastest FHE schemes which is most suitable to machine learning applications, by accelerating its computationally intensive and frequently used operation: relinearization. The proposed scalable architecture based on hardware partitioning can be easily configured to accommodate high acceleration requirements for relinearization with very large CKKS parameters. As a demonstration, an implementation, which supports 32,768 polynomial coefficients and a coefficient bitwidth of 594 decomposed into 11 Residue Number System (RNS) components, was deployed on a cluster consisting of 9 Xilinx VU13P FPGAs. The cluster operated at 200 MHz and achieved 1096 times throughput compared with a single threaded CPU implementation. Moreover, the low level hardware components implemented in this work such as the NTT module can also be applied to accelerate other lattice-based cryptography schemes.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS54959.2023.00084","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
With the burgeoning demands for cloud computing in various fields followed by the rising attention to sensitive data exposure, Fully Homomorphic Encryption (FHE) is gaining popularity as a potential solution to privacy protection. By performing computations directly on the ciphertext (encrypted data) without decrypting it, FHE can guarantee the security of data throughout its lifecycle without compromising the privacy. However, the excruciatingly slow speed of FHE scheme makes adopting it impractical in real life applications. Therefore, hardware accelerators come to the rescue to mitigate the problem. Among various hardware platforms, FPGA clusters are particularly promising because of their flexibility and ready availability at many cloud providers such as FPGA-as-a-Service (FaaS). Hence, reusing the existing infrastructure can greatly facilitate the implementation of FHE on the cloud.In this paper, we present TurboHE, the first hardware accelerator for FHE operations based on an FPGA cluster. TurboHE aims to boost the performance of CKKS, one of the fastest FHE schemes which is most suitable to machine learning applications, by accelerating its computationally intensive and frequently used operation: relinearization. The proposed scalable architecture based on hardware partitioning can be easily configured to accommodate high acceleration requirements for relinearization with very large CKKS parameters. As a demonstration, an implementation, which supports 32,768 polynomial coefficients and a coefficient bitwidth of 594 decomposed into 11 Residue Number System (RNS) components, was deployed on a cluster consisting of 9 Xilinx VU13P FPGAs. The cluster operated at 200 MHz and achieved 1096 times throughput compared with a single threaded CPU implementation. Moreover, the low level hardware components implemented in this work such as the NTT module can also be applied to accelerate other lattice-based cryptography schemes.