{"title":"SAFE: A Scalable Homomorphic Encryption Accelerator for Vertical Federated Learning","authors":"Zhaohui Chen;Zhen Gu;Yanheng Lu;Xuanle Ren;Ruiguang Zhong;Wen-Jie Lu;Jiansong Zhang;Yichi Zhang;Hanghang Wu;Xiaofu Zheng;Heng Liu;Tingqiang Chu;Cheng Hong;Changzheng Wei;Dimin Niu;Yuan Xie","doi":"10.1109/TCAD.2024.3496836","DOIUrl":null,"url":null,"abstract":"Privacy preservation has become a critical concern for governments, hospitals, and large corporations. Homomorphic encryption (HE) enables a ciphertext-based computation paradigm with strong security guarantees. In emerging cross-agency data cooperation scenarios like vertical federated learning (VFL), HE protects the data interaction from exposure to counterparts. However, computation on ciphertext has significant performance challenges due to increased data size and substantial overhead. Related work has been proposed to accelerate HE using parallel hardware, such as GPUs, FPGAs, and ASICs. However, many existing hardware accelerators target specific HE operations, such as number theoretic transform (NTT) and key switching, providing limited performance improvement for end-to-end applications. Others support bootstrapping, which requires quite a large ASIC design. To better support existing VFL training applications, we propose SAFE, an HE accelerator for scalable homomorphic matrix-vector products (HMVPs), which is the performance bottleneck. SAFE adopts a coefficient-wise encoded HMVP algorithm, despite a vanilla mode, we further explore the compressed and concatenated modes, which can fully utilize the polynomial encoding slots. The proposed hardware architecture, customized for HMVP dataflow, supports spatial and temporal parallelization of function units. The most costly polynomial function, NTT, is implemented with a low-area constant geometry unit which improves efficiency by <inline-formula> <tex-math>$2.43\\times $ </tex-math></inline-formula>. SAFE is implemented as a CPU-FPGA heterogeneous acceleration system, unleashing the multithread potential. The evaluation demonstrates an up to <inline-formula> <tex-math>$36\\times $ </tex-math></inline-formula> speed-up in end-to-end federated logistic regression training.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 5","pages":"1662-1675"},"PeriodicalIF":2.7000,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10750502/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Privacy preservation has become a critical concern for governments, hospitals, and large corporations. Homomorphic encryption (HE) enables a ciphertext-based computation paradigm with strong security guarantees. In emerging cross-agency data cooperation scenarios like vertical federated learning (VFL), HE protects the data interaction from exposure to counterparts. However, computation on ciphertext has significant performance challenges due to increased data size and substantial overhead. Related work has been proposed to accelerate HE using parallel hardware, such as GPUs, FPGAs, and ASICs. However, many existing hardware accelerators target specific HE operations, such as number theoretic transform (NTT) and key switching, providing limited performance improvement for end-to-end applications. Others support bootstrapping, which requires quite a large ASIC design. To better support existing VFL training applications, we propose SAFE, an HE accelerator for scalable homomorphic matrix-vector products (HMVPs), which is the performance bottleneck. SAFE adopts a coefficient-wise encoded HMVP algorithm, despite a vanilla mode, we further explore the compressed and concatenated modes, which can fully utilize the polynomial encoding slots. The proposed hardware architecture, customized for HMVP dataflow, supports spatial and temporal parallelization of function units. The most costly polynomial function, NTT, is implemented with a low-area constant geometry unit which improves efficiency by $2.43\times $ . SAFE is implemented as a CPU-FPGA heterogeneous acceleration system, unleashing the multithread potential. The evaluation demonstrates an up to $36\times $ speed-up in end-to-end federated logistic regression training.
期刊介绍:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.