Fast Permutation Architecture on Encrypted Data for Secure Neural Network Inference

2020 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Pub Date: 2020-12-08, DOI: 10.1109/APCCAS50809.2020.9301698

Xiao Hu, Jing Tian, Zhongfeng Wang


Citations: 2

Abstract

Recently, secure neural network inference, an organic combination of homomorphic encryption (HE) and the deep neural network (DNN), has attracted much attention. Nevertheless, the large number of computations introduced by the HE scheme forms the bottleneck for real-time applications. A significant portion of the network is the permutation (Perm), which is mainly made up of the number theoretic transform (NTT). In this paper, for the first time, we propose an efficient architecture for the Perm by incorporating algorithmic transformations and architectural-level optimizations. First, the core butterfly unit (BU) of the NTT is optimized, which reduces the multiplication operations by about 30% compared with the original BU. Then, based on this optimization, a highly parallelized architecture is devised for the Perm. The operations in different modules are managed by a merging strategy to balance the data path and reduce memory access. The proposed architecture is synthesized under the TSMC 28-nm CMOS technology. The experimental results show that for a ciphertext size of 2048×60 bits, the proposed design achieves a 7.54x speedup compared to the implementation on an Intel(R) Core(TM) i7-6850K 3.60 GHz CPU. Moreover, we apply eight Perm engines to the 1D convolution, which shows a 17.25x speedup over the software implementation.
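For readers unfamiliar with the NTT butterfly mentioned in the abstract, the following is a minimal sketch of a textbook radix-2 Cooley-Tukey NTT in Python. It is not the optimized BU or the Perm architecture proposed in the paper, whose details are not given here. The function names and the toy parameters (modulus q = 17, length n = 8, root of unity omega = 2) are assumptions chosen only for illustration. Each butterfly costs one modular multiplication, which is the operation the abstract reports reducing.

```python
def butterfly(a, b, w, q):
    """Textbook Cooley-Tukey butterfly: one modular multiply, one add, one subtract."""
    t = (b * w) % q
    return (a + t) % q, (a - t) % q


def bit_reverse(values):
    """Reorder a power-of-two-length list into bit-reversed index order."""
    n = len(values)
    bits = n.bit_length() - 1
    out = [0] * n
    for i, v in enumerate(values):
        r = int(format(i, f"0{bits}b")[::-1], 2)
        out[r] = v
    return out


def ntt(values, omega, q):
    """Iterative radix-2 NTT; omega must be a primitive n-th root of unity mod q."""
    n = len(values)
    a = bit_reverse(values)
    length = 2
    while length <= n:
        w_step = pow(omega, n // length, q)  # primitive length-th root of unity
        for start in range(0, n, length):
            w = 1
            for j in range(length // 2):
                i0 = start + j
                i1 = i0 + length // 2
                a[i0], a[i1] = butterfly(a[i0], a[i1], w, q)
                w = (w * w_step) % q
        length *= 2
    return a


def intt(values, omega, q):
    """Inverse NTT: forward transform with omega^-1, then scale by n^-1 (Python 3.8+)."""
    n = len(values)
    inv_omega = pow(omega, -1, q)
    inv_n = pow(n, -1, q)
    return [(x * inv_n) % q for x in ntt(values, inv_omega, q)]


def naive_dft(values, omega, q):
    """O(n^2) reference transform used only to check the fast version."""
    n = len(values)
    return [
        sum(values[j] * pow(omega, i * j, q) for j in range(n)) % q
        for i in range(n)
    ]


# Toy parameters (assumed for illustration): 2 has order 8 modulo 17.
Q, OMEGA = 17, 2
data = [3, 1, 4, 1, 5, 9, 2, 6]
assert ntt(data, OMEGA, Q) == naive_dft(data, OMEGA, Q)
assert intt(ntt(data, OMEGA, Q), OMEGA, Q) == data
```

A length-n transform in this form uses (n/2)·log2(n) butterflies, so any saving inside the butterfly is multiplied across the whole transform.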



