ENIGMA: Low-Latency and Privacy-Preserving Edge Inference on Heterogeneous Neural Network Accelerators

Qiushi Li, Ju Ren, Xinglin Pan, Yuezhi Zhou, Yaoxue Zhang
{"title":"ENIGMA: Low-Latency and Privacy-Preserving Edge Inference on Heterogeneous Neural Network Accelerators","authors":"Qiushi Li, Ju Ren, Xinglin Pan, Yuezhi Zhou, Yaoxue Zhang","doi":"10.1109/ICDCS54860.2022.00051","DOIUrl":null,"url":null,"abstract":"Time-efficient artificial intelligence (AI) service has recently witnessed increasing interest from academia and industry due to the urgent needs in massive smart applications such as self-driving cars, virtual reality, high-resolution video streaming, etc. Existing solutions to reduce AI latency, like edge computing and heterogeneous neural-network accelerators (NNAs), face high risk of privacy leakage. To achieve both low-latency and privacy-preserving purposes on edge servers (e.g., NNAs), this paper proposes ENIGMA that can exploit the trusted execution environment (TEE) and heterogeneous NNAs of edge servers for edge inference. The low-latency is supported by a new ahead-of-time analysis framework for analyzing the linearity of multilayer neural networks, which automatically slices forward-graph and assigns sub-graphs to TEE or NNA. To avoid privacy leakage issue, we then introduce a pre-forwarded cipher generation (PFCG) scheme for computing linear sub-forward-graphs on NNA. The input data is encrypted to ciphertext that can be computed directly by linear sub-graphs, and the output can be decrypted to obtain the correct output. To enable non-linear computation of sub-graphs on TEE, we use ring-cache and automatic vectorization optimization to address the memory limitation of TEE. Qualitative analysis and quantitative experiments on GPU, NPU and TPU demonstrate that ENIGMA is not only compatible with heterogeneous NNAs, but also can avoid leakages of private features with latency as low as 50-milliseconds.","PeriodicalId":225883,"journal":{"name":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS54860.2022.00051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Time-efficient artificial intelligence (AI) services have recently attracted increasing interest from academia and industry due to urgent needs in smart applications such as self-driving cars, virtual reality, and high-resolution video streaming. Existing solutions for reducing AI latency, such as edge computing and heterogeneous neural-network accelerators (NNAs), face a high risk of privacy leakage. To achieve both low latency and privacy preservation on edge servers, this paper proposes ENIGMA, which exploits the trusted execution environment (TEE) and heterogeneous NNAs of edge servers for edge inference. Low latency is supported by a new ahead-of-time analysis framework that analyzes the linearity of multilayer neural networks, automatically slicing the forward graph and assigning each sub-graph to the TEE or an NNA. To avoid privacy leakage, we then introduce a pre-forwarded cipher generation (PFCG) scheme for computing linear sub-forward-graphs on the NNA: the input data is encrypted into ciphertext that linear sub-graphs can compute on directly, and the result can be decrypted to recover the correct output. To enable non-linear sub-graph computation in the TEE, we use a ring cache and automatic vectorization to work around the TEE's memory limitations. Qualitative analysis and quantitative experiments on GPU, NPU, and TPU demonstrate that ENIGMA is not only compatible with heterogeneous NNAs but also avoids leakage of private features, with latency as low as 50 milliseconds.
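The slicing step described in the abstract is easiest to picture as a single pass over an ordered forward graph: linear operators (convolutions, fully connected layers, average pooling, and the like) can run on masked data on an untrusted NNA, while non-linear ones (e.g., ReLU) must stay inside the TEE. Below is a minimal Python sketch of that idea; the operator names, the LINEAR_OPS set, and the slice_forward_graph helper are illustrative assumptions, not the paper's actual analysis framework.

# Operators that are linear (affine) maps of their input and therefore
# safe to evaluate on masked data outside the TEE. Illustrative set.
LINEAR_OPS = {"conv2d", "linear", "avg_pool", "batch_norm_inference", "add"}

def slice_forward_graph(layers):
    """Split an ordered list of (name, op_type) layers into maximal
    runs of same-linearity operators, each tagged with its target."""
    sub_graphs = []                      # list of (target, [layer names])
    current, current_target = [], None
    for name, op_type in layers:
        target = "NNA" if op_type in LINEAR_OPS else "TEE"
        if current and target != current_target:
            sub_graphs.append((current_target, current))
            current = []
        current_target = target
        current.append(name)
    if current:
        sub_graphs.append((current_target, current))
    return sub_graphs

# Example: a small conv net. Each ReLU breaks a linear run.
net = [("conv1", "conv2d"), ("bn1", "batch_norm_inference"),
       ("relu1", "relu"), ("conv2", "conv2d"), ("pool", "avg_pool"),
       ("relu2", "relu"), ("fc", "linear")]
for target, names in slice_forward_graph(net):
    print(target, names)
# NNA ['conv1', 'bn1'] / TEE ['relu1'] / NNA ['conv2', 'pool'] / ...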
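The correctness of computing "directly on ciphertext" with linear sub-graphs rests on the fact that a linear map commutes with additive masking: for f(x) = Wx + b, we have f(x + r) - Wr = f(x). The following numpy sketch shows that masking idea; it is our hedged reading of the abstract, not the paper's concrete PFCG construction, and every name in it is hypothetical.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))        # weights of a linear sub-graph
b = rng.standard_normal(4)             # bias term (cancels in the diff)

def linear_subgraph(x):                # runs on the untrusted NNA
    return W @ x + b

x = rng.standard_normal(8)             # private input feature
r = rng.standard_normal(8)             # random mask, known only to the TEE

# Ahead of time (inside the TEE): pre-forward the mask through the
# linear sub-graph to obtain the correction term W @ r.
f_r = W @ r

y_cipher = linear_subgraph(x + r)      # NNA sees only the masked input
y_plain = y_cipher - f_r               # TEE: W@(x+r)+b - W@r = W@x + b

assert np.allclose(y_plain, linear_subgraph(x))

Because the mask r is drawn independently of x, the accelerator observes only the masked input x + r, and the correction term W @ r can be prepared before the request arrives, which is one plausible reading of the "pre-forwarded" in PFCG.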