{"title":"Vertin:用于神经网络和llm的快速,通信友好和密钥紧凑的安全推理系统","authors":"Xin Bie , Zhenhua Liu , Han Liang","doi":"10.1016/j.jisa.2025.104060","DOIUrl":null,"url":null,"abstract":"<div><div>Existing secure inference schemes based on function secret sharing (FSS) allow the client to obtain inference results while protecting the client’s inputs, the server’s neural networks (NNs), and large language models (LLMs), ensuring high online efficiency. However, there is still room for improvement in terms of storage, communication, and inference speed for linear layers in these schemes. In this work, we introduce a novel semi-honest secure two-party inference system tailored for NNs and LLMs, which surpasses state-of-the-art solutions in speed, communication efficiency, and key storage. Our system leverages plaintext weight matrices for the server, introducing <em>FMLO</em>, a secure two-party computation protocol supporting linear operations. By using precomputed random matrices correlated with weight matrices, <em>FMLO</em> minimizes key storage, online computation, and communication demands. We also develop two efficient protocols, <span><math><msub><mrow><mi>π</mi></mrow><mrow><mi>M</mi><mi>u</mi><mi>l</mi><mi>P</mi><mi>r</mi><mi>e</mi></mrow></msub></math></span> for matrix multiplication and <span><math><msub><mrow><mi>π</mi></mrow><mrow><mi>C</mi><mi>o</mi><mi>n</mi><mi>v</mi><mi>P</mi><mi>r</mi><mi>e</mi></mrow></msub></math></span> for matrix convolution, by using vector oblivious linear evaluation. Both protocols batch-generate required random numbers securely in the offline phase, reducing preprocessing overhead in <em>FMLO</em>. Compared to the leading FSS-based scheme <em>Orca</em>, <em>Vertin</em> reduces key storage by 5.37%, online communication by 16.46%, and online inference time by 10.71% in secure inference with ResNet-50. When compared to the state-of-the-art <em>SIGMA</em> on BERT-large model with the sequence length of 64, <em>Vertin</em> achieves reductions in key storage, online communication, and online runtime by 9.81%, 9.17%, and 8.9% respectively.</div></div>","PeriodicalId":48638,"journal":{"name":"Journal of Information Security and Applications","volume":"91 ","pages":"Article 104060"},"PeriodicalIF":3.8000,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Vertin: Fast, Communication-friendly and Key-compact secure inference system for NNs and LLMs\",\"authors\":\"Xin Bie , Zhenhua Liu , Han Liang\",\"doi\":\"10.1016/j.jisa.2025.104060\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Existing secure inference schemes based on function secret sharing (FSS) allow the client to obtain inference results while protecting the client’s inputs, the server’s neural networks (NNs), and large language models (LLMs), ensuring high online efficiency. However, there is still room for improvement in terms of storage, communication, and inference speed for linear layers in these schemes. In this work, we introduce a novel semi-honest secure two-party inference system tailored for NNs and LLMs, which surpasses state-of-the-art solutions in speed, communication efficiency, and key storage. Our system leverages plaintext weight matrices for the server, introducing <em>FMLO</em>, a secure two-party computation protocol supporting linear operations. 
By using precomputed random matrices correlated with weight matrices, <em>FMLO</em> minimizes key storage, online computation, and communication demands. We also develop two efficient protocols, <span><math><msub><mrow><mi>π</mi></mrow><mrow><mi>M</mi><mi>u</mi><mi>l</mi><mi>P</mi><mi>r</mi><mi>e</mi></mrow></msub></math></span> for matrix multiplication and <span><math><msub><mrow><mi>π</mi></mrow><mrow><mi>C</mi><mi>o</mi><mi>n</mi><mi>v</mi><mi>P</mi><mi>r</mi><mi>e</mi></mrow></msub></math></span> for matrix convolution, by using vector oblivious linear evaluation. Both protocols batch-generate required random numbers securely in the offline phase, reducing preprocessing overhead in <em>FMLO</em>. Compared to the leading FSS-based scheme <em>Orca</em>, <em>Vertin</em> reduces key storage by 5.37%, online communication by 16.46%, and online inference time by 10.71% in secure inference with ResNet-50. When compared to the state-of-the-art <em>SIGMA</em> on BERT-large model with the sequence length of 64, <em>Vertin</em> achieves reductions in key storage, online communication, and online runtime by 9.81%, 9.17%, and 8.9% respectively.</div></div>\",\"PeriodicalId\":48638,\"journal\":{\"name\":\"Journal of Information Security and Applications\",\"volume\":\"91 \",\"pages\":\"Article 104060\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2025-04-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Information Security and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2214212625000973\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Security and Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2214212625000973","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Vertin: Fast, Communication-friendly and Key-compact secure inference system for NNs and LLMs
Existing secure inference schemes based on function secret sharing (FSS) allow the client to obtain inference results while protecting the client's inputs and the server's neural networks (NNs) and large language models (LLMs), ensuring high online efficiency. However, these schemes still leave room for improvement in key storage, communication, and inference speed for linear layers. In this work, we introduce a novel semi-honest secure two-party inference system tailored for NNs and LLMs that surpasses state-of-the-art solutions in speed, communication efficiency, and key storage. Our system leverages the server's plaintext weight matrices and introduces FMLO, a secure two-party computation protocol supporting linear operations. By using precomputed random matrices correlated with the weight matrices, FMLO minimizes key storage, online computation, and communication demands. We also develop two efficient protocols, π_MulPre for matrix multiplication and π_ConvPre for matrix convolution, using vector oblivious linear evaluation. Both protocols batch-generate the required random numbers securely in the offline phase, reducing the preprocessing overhead of FMLO. Compared to Orca, the leading FSS-based scheme, Vertin reduces key storage by 5.37%, online communication by 16.46%, and online inference time by 10.71% in secure inference with ResNet-50. Compared to the state-of-the-art SIGMA on the BERT-large model with a sequence length of 64, Vertin reduces key storage, online communication, and online runtime by 9.81%, 9.17%, and 8.9%, respectively.
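To make the linear-layer idea concrete, the following is a minimal Python/NumPy sketch of the standard masking technique the abstract alludes to, not the paper's actual FMLO construction, and all names in it are illustrative. Because the server holds its weight matrix W in plaintext, the parties can precompute additive shares of W @ R for a random mask R; the online phase then needs only one masked-input message plus local plaintext arithmetic. A trusted dealer stands in here for the VOLE-based offline protocols.

    import numpy as np

    MOD = 1 << 32                     # arithmetic over Z_{2^32}, a common 2PC choice
    rng = np.random.default_rng(0)

    def rand_mat(shape):
        return rng.integers(0, MOD, size=shape, dtype=np.uint64)

    # Offline phase (input-independent): a dealer samples a random mask R and
    # secret-shares W @ R between client and server. In a scheme like Vertin,
    # this correlated randomness would come from VOLE-style protocols instead.
    def offline(W, in_shape):
        R = rand_mat(in_shape)                 # random mask for the client's input
        WR = (W @ R) % MOD                     # randomness correlated with the weights
        wr_client = rand_mat(WR.shape)         # client's additive share of W @ R
        wr_server = (WR - wr_client) % MOD     # server's additive share of W @ R
        return R, wr_client, wr_server

    # Online phase: the client reveals only X - R, which is uniformly random,
    # so X stays hidden; the shares reconstruct W @ X while the weights never
    # leave the server.
    def online(W, X, R, wr_client, wr_server):
        masked = (X - R) % MOD                      # the single online message
        y_server = (W @ masked + wr_server) % MOD   # W(X - R) + [WR]_server
        y_client = wr_client                        # [WR]_client
        return (y_server + y_client) % MOD          # = W @ X  (mod 2^32)

    W = rand_mat((4, 8))                       # server's plaintext weight matrix
    X = rand_mat((8, 2))                       # client's private input
    R, wr_c, wr_s = offline(W, X.shape)
    assert np.array_equal(online(W, X, R, wr_c, wr_s), (W @ X) % MOD)

Note that in this generic pattern the per-inference key material is just the mask R and one share of W @ R, which is the key-compactness angle the abstract's storage comparisons refer to.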
Journal introduction:
Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications relevant to information security. JISA provides a common linkage between a vibrant scientific and research community and industry professionals by offering a clear view of modern problems and challenges in information security, as well as identifying promising scientific and "best-practice" solutions. JISA issues offer a balance between original research work and innovative industrial approaches by internationally renowned information security experts and researchers.