CASA:近似同态加密的紧凑型可扩展加速器

Pengzhou He, Samira Carolina Oliva Madrigal, Ç. Koç, Tianyou Bao, Jiafeng Xie
{"title":"CASA:近似同态加密的紧凑型可扩展加速器","authors":"Pengzhou He, Samira Carolina Oliva Madrigal, Ç. Koç, Tianyou Bao, Jiafeng Xie","doi":"10.46586/tches.v2024.i2.451-480","DOIUrl":null,"url":null,"abstract":"Approximate arithmetic-based homomorphic encryption (HE) scheme CKKS [CKKS17] is arguably the most suitable one for real-world data-privacy applications due to its wider computation range than other HE schemes such as BGV [BGV14], FV and BFV [Bra12, FV12]. However, the most crucial homomorphic operation of CKKS called key-switching induces a great amount of computational burden in actual deployment situations, and creates scalability challenges for hardware acceleration. In this paper, we present a novel Compact And Scalable Accelerator (CASA) for CKKS on the field-programmable gate array (FPGA) platform. The proposed CASA addresses the aforementioned computational and scalability challenges in homomorphic operations, including key-exchange, homomorphic multiplication, homomorphic addition, and rescaling.On the architecture layer, we propose a new design methodology for efficient acceleration of CKKS. We design this novel hardware architecture by carefully studying the homomorphic operation patterns and data dependency amongst the primitive oracles. The homomorphic operations are efficiently mapped into an accelerator with simple control and smooth operation, which brings benefits for scalable implementation and enhanced pipeline and parallel processing (even with the potential for further improvement).On the component layer, we carry out a detailed and extensive study and present novel micro-architectures for primitive function modules, including memory bank, number theoretic transform (NTT) module, modulus switching bank, and dyadic multiplication and accumulation.On the arithmetic layer, we develop a new partially reduction-free modular arithmetic technique to eliminate part of the reduction cost over different prime moduli within the moduli chain of the Residue Number System (RNS). The proposed structure can support arbitrary numbers of security primes of CKKS during key exchange, which offers better security options for adopting the scalable design methodology.As a proof-of-concept, we implement CASA on the FPGA platform and compare it with state-of-the-art designs. The implementation results showcase the superior performance of the proposed CASA in many aspects such as compact area, scalable architecture, and overall better area-time complexities.In particular, we successfully implement CASA on a mainstream resource-constrained Artix-7 FPGA. To the authors’ best knowledge, this is the first compact CKKS accelerator implemented on an Artix-7 device, e.g., CASA achieves a 10.8x speedup compared with the state-of-the-art CPU implementations (with power consumption of only 5.8%). Considering the power-delay product metric, CASA also achieves 138x and 105x improvement compared with the recent GPU implementation.","PeriodicalId":321490,"journal":{"name":"IACR Transactions on Cryptographic Hardware and Embedded Systems","volume":"34 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CASA: A Compact and Scalable Accelerator for Approximate Homomorphic Encryption\",\"authors\":\"Pengzhou He, Samira Carolina Oliva Madrigal, Ç. Koç, Tianyou Bao, Jiafeng Xie\",\"doi\":\"10.46586/tches.v2024.i2.451-480\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Approximate arithmetic-based homomorphic encryption (HE) scheme CKKS [CKKS17] is arguably the most suitable one for real-world data-privacy applications due to its wider computation range than other HE schemes such as BGV [BGV14], FV and BFV [Bra12, FV12]. However, the most crucial homomorphic operation of CKKS called key-switching induces a great amount of computational burden in actual deployment situations, and creates scalability challenges for hardware acceleration. In this paper, we present a novel Compact And Scalable Accelerator (CASA) for CKKS on the field-programmable gate array (FPGA) platform. The proposed CASA addresses the aforementioned computational and scalability challenges in homomorphic operations, including key-exchange, homomorphic multiplication, homomorphic addition, and rescaling.On the architecture layer, we propose a new design methodology for efficient acceleration of CKKS. We design this novel hardware architecture by carefully studying the homomorphic operation patterns and data dependency amongst the primitive oracles. The homomorphic operations are efficiently mapped into an accelerator with simple control and smooth operation, which brings benefits for scalable implementation and enhanced pipeline and parallel processing (even with the potential for further improvement).On the component layer, we carry out a detailed and extensive study and present novel micro-architectures for primitive function modules, including memory bank, number theoretic transform (NTT) module, modulus switching bank, and dyadic multiplication and accumulation.On the arithmetic layer, we develop a new partially reduction-free modular arithmetic technique to eliminate part of the reduction cost over different prime moduli within the moduli chain of the Residue Number System (RNS). The proposed structure can support arbitrary numbers of security primes of CKKS during key exchange, which offers better security options for adopting the scalable design methodology.As a proof-of-concept, we implement CASA on the FPGA platform and compare it with state-of-the-art designs. The implementation results showcase the superior performance of the proposed CASA in many aspects such as compact area, scalable architecture, and overall better area-time complexities.In particular, we successfully implement CASA on a mainstream resource-constrained Artix-7 FPGA. To the authors’ best knowledge, this is the first compact CKKS accelerator implemented on an Artix-7 device, e.g., CASA achieves a 10.8x speedup compared with the state-of-the-art CPU implementations (with power consumption of only 5.8%). Considering the power-delay product metric, CASA also achieves 138x and 105x improvement compared with the recent GPU implementation.\",\"PeriodicalId\":321490,\"journal\":{\"name\":\"IACR Transactions on Cryptographic Hardware and Embedded Systems\",\"volume\":\"34 2\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IACR Transactions on Cryptographic Hardware and Embedded Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.46586/tches.v2024.i2.451-480\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IACR Transactions on Cryptographic Hardware and Embedded Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.46586/tches.v2024.i2.451-480","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

与 BGV [BGV14]、FV 和 BFV [Bra12, FV12]等其他同态加密(HE)方案相比,基于近似算术的同态加密(HE)方案 CKKS [CKKS17]的计算范围更广,因此可以说是最适合实际数据保密应用的方案。然而,CKKS 中最关键的同态操作--密钥切换--在实际部署情况下会造成巨大的计算负担,并对硬件加速的可扩展性造成挑战。在本文中,我们在现场可编程门阵列(FPGA)平台上为 CKKS 提出了一种新型紧凑型可扩展加速器(CASA)。所提出的 CASA 解决了上述同态运算中的计算和可扩展性难题,包括密钥交换、同态乘法、同态加法和重缩放。我们通过仔细研究同态操作模式和原始算例之间的数据依赖性,设计出了这种新型硬件架构。同态运算被有效地映射到一个控制简单、运行流畅的加速器中,这为可扩展的实施和增强的流水线及并行处理(甚至还有进一步改进的潜力)带来了好处。在组件层,我们进行了详细而广泛的研究,并提出了用于原始函数模块的新型微体系结构,包括内存库、数论变换(NTT)模块、模数转换库以及二元乘法和累加。作为概念验证,我们在 FPGA 平台上实现了 CASA,并与最先进的设计进行了比较。实现结果表明,所提出的 CASA 在许多方面都具有优越性能,如紧凑的面积、可扩展的架构和整体上更好的面积-时间复杂性。据作者所知,这是首个在 Artix-7 器件上实现的紧凑型 CKKS 加速器,例如,与最先进的 CPU 实现相比,CASA 的速度提高了 10.8 倍(功耗仅为 5.8%)。考虑到功率-延迟乘积指标,CASA 与最近的 GPU 实现相比,也分别提高了 138 倍和 105 倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
CASA: A Compact and Scalable Accelerator for Approximate Homomorphic Encryption
Approximate arithmetic-based homomorphic encryption (HE) scheme CKKS [CKKS17] is arguably the most suitable one for real-world data-privacy applications due to its wider computation range than other HE schemes such as BGV [BGV14], FV and BFV [Bra12, FV12]. However, the most crucial homomorphic operation of CKKS called key-switching induces a great amount of computational burden in actual deployment situations, and creates scalability challenges for hardware acceleration. In this paper, we present a novel Compact And Scalable Accelerator (CASA) for CKKS on the field-programmable gate array (FPGA) platform. The proposed CASA addresses the aforementioned computational and scalability challenges in homomorphic operations, including key-exchange, homomorphic multiplication, homomorphic addition, and rescaling.On the architecture layer, we propose a new design methodology for efficient acceleration of CKKS. We design this novel hardware architecture by carefully studying the homomorphic operation patterns and data dependency amongst the primitive oracles. The homomorphic operations are efficiently mapped into an accelerator with simple control and smooth operation, which brings benefits for scalable implementation and enhanced pipeline and parallel processing (even with the potential for further improvement).On the component layer, we carry out a detailed and extensive study and present novel micro-architectures for primitive function modules, including memory bank, number theoretic transform (NTT) module, modulus switching bank, and dyadic multiplication and accumulation.On the arithmetic layer, we develop a new partially reduction-free modular arithmetic technique to eliminate part of the reduction cost over different prime moduli within the moduli chain of the Residue Number System (RNS). The proposed structure can support arbitrary numbers of security primes of CKKS during key exchange, which offers better security options for adopting the scalable design methodology.As a proof-of-concept, we implement CASA on the FPGA platform and compare it with state-of-the-art designs. The implementation results showcase the superior performance of the proposed CASA in many aspects such as compact area, scalable architecture, and overall better area-time complexities.In particular, we successfully implement CASA on a mainstream resource-constrained Artix-7 FPGA. To the authors’ best knowledge, this is the first compact CKKS accelerator implemented on an Artix-7 device, e.g., CASA achieves a 10.8x speedup compared with the state-of-the-art CPU implementations (with power consumption of only 5.8%). Considering the power-delay product metric, CASA also achieves 138x and 105x improvement compared with the recent GPU implementation.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信