MCHEAS: Optimizing Large-Parameter NTT Over Multicluster In-Situ FHE Accelerating System

IF 2.9 | Zone 3, Computer Science | Q2, Computer Science, Hardware & Architecture
Zhenyu Guan;Yongqing Zhu;Luchang Lei;Hongyang Jia;Yi Chen;Bo Zhang;Changrui Ren;Jin Dong;Song Bian
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 10, pp. 3683-3696. Publication date: 2025-03-26. DOI: 10.1109/TCAD.2025.3555191. https://ieeexplore.ieee.org/document/10939019/

Abstract

Fully homomorphic encryption (FHE) offers strong security but imposes a heavy computational workload, necessitating software-hardware co-design for aggressive acceleration. Recent work on specialized accelerators for homomorphic encryption (HE) evaluation has made significant progress in supporting lightweight RNS-CKKS applications, especially accelerators that use high-density in-memory computing techniques. To meet the higher computational demands of more general applications, this article proposes the multicluster HE accelerating system (MCHEAS), a system comprising multiple in-situ HE processing accelerators, each functioning as a cluster, that collaboratively perform large-parameter RNS-CKKS evaluation. MCHEAS features optimization strategies including synchronous, preemptive swap, square-diagonal, and odd-even index separation. By using these strategies to compile the computation and transmission of number theoretic transform (NTT) coefficients, the method optimizes intercluster data swaps, a major bottleneck in NTT computations. Evaluations show that at 1 GHz, across different intercluster data transfer bandwidths, our approach accelerates NTT computations by 26.40% to 51.75%. MCHEAS also improves computing unit utilization by 10.30% to 33.97%, with a peak utilization rate of up to 99.62%. MCHEAS achieves 17.63% to 34.67% speedups for HE operations involving NTT and 15.12% to 30.62% speedups for the demonstrated applications, while raising computing unit utilization by 5.18% to 21.87% during application execution. Furthermore, we compare MCHEAS with state-of-the-art (SOTA) designs under a specific intercluster data transfer bandwidth, achieving up to $81.45\times$ their area efficiency in applications.
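To make the intercluster swap bottleneck concrete, the following is a minimal sketch of a textbook radix-2 decimation-in-frequency NTT, together with a counter for butterflies whose two operands would sit in different clusters. It is not the MCHEAS scheduling or any of its four strategies. The contiguous block partition of coefficients across clusters, the toy parameters (n = 16, q = 17, omega = 3), and all function names are assumptions chosen for illustration only.

```python
def ntt_dif(a, omega, q):
    """Radix-2 decimation-in-frequency NTT (Gentleman-Sande butterflies).

    Input is in natural order; output is in bit-reversed order.
    omega must be a primitive len(a)-th root of unity modulo q.
    """
    a = list(a)
    n = len(a)
    stride = n // 2
    w_stage = omega  # primitive root of unity for the current sub-transform
    while stride >= 1:
        for start in range(0, n, 2 * stride):
            w = 1
            for j in range(start, start + stride):
                u, v = a[j], a[j + stride]
                a[j] = (u + v) % q
                a[j + stride] = (u - v) * w % q
                w = w * w_stage % q
        w_stage = w_stage * w_stage % q
        stride //= 2
    return a


def ntt_naive(a, omega, q):
    """O(n^2) reference: X[k] = sum_j a[j] * omega^(j*k) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, j * k, q) for j in range(n)) % q
            for k in range(n)]


def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def cross_cluster_butterflies(n, num_clusters):
    """Per stage, count butterflies whose operands live in different clusters.

    Assumes (hypothetically) that coefficient i is stored in cluster
    i // (n // num_clusters), i.e. a contiguous block partition.
    """
    block = n // num_clusters
    counts = []
    stride = n // 2
    while stride >= 1:
        cross = 0
        for start in range(0, n, 2 * stride):
            for j in range(start, start + stride):
                if j // block != (j + stride) // block:
                    cross += 1
        counts.append(cross)
        stride //= 2
    return counts


if __name__ == "__main__":
    q, n, omega = 17, 16, 3  # 3 has multiplicative order 16 modulo 17
    coeffs = list(range(n))
    out = ntt_dif(coeffs, omega, q)
    ref = ntt_naive(coeffs, omega, q)
    assert all(out[bit_reverse(k, 4)] == ref[k] for k in range(n))
    print(cross_cluster_butterflies(n, 4))
```

Under these assumptions, `cross_cluster_butterflies(16, 4)` returns `[8, 8, 0, 0]`: every butterfly in the first two stages pairs coefficients held by different clusters, while the last two stages are entirely local. This toy count only shows where cross-cluster traffic concentrates in a plain NTT; how MCHEAS compiles computation and transmission around it is described in the paper itself.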
Source journal
CiteScore: 5.60
Self-citation rate: 13.80%
Publication volume: 500
Review time: 7 months
Journal description: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.