{"title":"Vertin:用于神经网络和llm的快速,通信友好和密钥紧凑的安全推理系统","authors":"Xin Bie , Zhenhua Liu , Han Liang","doi":"10.1016/j.jisa.2025.104060","DOIUrl":null,"url":null,"abstract":"<div><div>Existing secure inference schemes based on function secret sharing (FSS) allow the client to obtain inference results while protecting the client’s inputs, the server’s neural networks (NNs), and large language models (LLMs), ensuring high online efficiency. However, there is still room for improvement in terms of storage, communication, and inference speed for linear layers in these schemes. In this work, we introduce a novel semi-honest secure two-party inference system tailored for NNs and LLMs, which surpasses state-of-the-art solutions in speed, communication efficiency, and key storage. Our system leverages plaintext weight matrices for the server, introducing <em>FMLO</em>, a secure two-party computation protocol supporting linear operations. By using precomputed random matrices correlated with weight matrices, <em>FMLO</em> minimizes key storage, online computation, and communication demands. We also develop two efficient protocols, <span><math><msub><mrow><mi>π</mi></mrow><mrow><mi>M</mi><mi>u</mi><mi>l</mi><mi>P</mi><mi>r</mi><mi>e</mi></mrow></msub></math></span> for matrix multiplication and <span><math><msub><mrow><mi>π</mi></mrow><mrow><mi>C</mi><mi>o</mi><mi>n</mi><mi>v</mi><mi>P</mi><mi>r</mi><mi>e</mi></mrow></msub></math></span> for matrix convolution, by using vector oblivious linear evaluation. Both protocols batch-generate required random numbers securely in the offline phase, reducing preprocessing overhead in <em>FMLO</em>. Compared to the leading FSS-based scheme <em>Orca</em>, <em>Vertin</em> reduces key storage by 5.37%, online communication by 16.46%, and online inference time by 10.71% in secure inference with ResNet-50. When compared to the state-of-the-art <em>SIGMA</em> on BERT-large model with the sequence length of 64, <em>Vertin</em> achieves reductions in key storage, online communication, and online runtime by 9.81%, 9.17%, and 8.9% respectively.</div></div>","PeriodicalId":48638,"journal":{"name":"Journal of Information Security and Applications","volume":"91 ","pages":"Article 104060"},"PeriodicalIF":3.8000,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Vertin: Fast, Communication-friendly and Key-compact secure inference system for NNs and LLMs\",\"authors\":\"Xin Bie , Zhenhua Liu , Han Liang\",\"doi\":\"10.1016/j.jisa.2025.104060\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Existing secure inference schemes based on function secret sharing (FSS) allow the client to obtain inference results while protecting the client’s inputs, the server’s neural networks (NNs), and large language models (LLMs), ensuring high online efficiency. However, there is still room for improvement in terms of storage, communication, and inference speed for linear layers in these schemes. In this work, we introduce a novel semi-honest secure two-party inference system tailored for NNs and LLMs, which surpasses state-of-the-art solutions in speed, communication efficiency, and key storage. Our system leverages plaintext weight matrices for the server, introducing <em>FMLO</em>, a secure two-party computation protocol supporting linear operations. 
By using precomputed random matrices correlated with weight matrices, <em>FMLO</em> minimizes key storage, online computation, and communication demands. We also develop two efficient protocols, <span><math><msub><mrow><mi>π</mi></mrow><mrow><mi>M</mi><mi>u</mi><mi>l</mi><mi>P</mi><mi>r</mi><mi>e</mi></mrow></msub></math></span> for matrix multiplication and <span><math><msub><mrow><mi>π</mi></mrow><mrow><mi>C</mi><mi>o</mi><mi>n</mi><mi>v</mi><mi>P</mi><mi>r</mi><mi>e</mi></mrow></msub></math></span> for matrix convolution, by using vector oblivious linear evaluation. Both protocols batch-generate required random numbers securely in the offline phase, reducing preprocessing overhead in <em>FMLO</em>. Compared to the leading FSS-based scheme <em>Orca</em>, <em>Vertin</em> reduces key storage by 5.37%, online communication by 16.46%, and online inference time by 10.71% in secure inference with ResNet-50. When compared to the state-of-the-art <em>SIGMA</em> on BERT-large model with the sequence length of 64, <em>Vertin</em> achieves reductions in key storage, online communication, and online runtime by 9.81%, 9.17%, and 8.9% respectively.</div></div>\",\"PeriodicalId\":48638,\"journal\":{\"name\":\"Journal of Information Security and Applications\",\"volume\":\"91 \",\"pages\":\"Article 104060\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2025-04-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Information Security and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2214212625000973\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Security and Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2214212625000973","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Vertin: Fast, Communication-friendly and Key-compact secure inference system for NNs and LLMs
Existing secure inference schemes based on function secret sharing (FSS) allow the client to obtain inference results while protecting the client's inputs and the server's neural networks (NNs) and large language models (LLMs), ensuring high online efficiency. However, these schemes still leave room for improvement in key storage, communication, and inference speed for linear layers. In this work, we introduce a novel semi-honest secure two-party inference system tailored for NNs and LLMs that surpasses state-of-the-art solutions in speed, communication efficiency, and key storage. Our system leverages the server's plaintext weight matrices and introduces FMLO, a secure two-party computation protocol supporting linear operations. By using precomputed random matrices correlated with the weight matrices, FMLO minimizes key storage, online computation, and communication demands. We also develop two efficient protocols, π_MulPre for matrix multiplication and π_ConvPre for matrix convolution, using vector oblivious linear evaluation. Both protocols batch-generate the required random numbers securely in the offline phase, reducing the preprocessing overhead of FMLO. Compared to Orca, the leading FSS-based scheme, Vertin reduces key storage by 5.37%, online communication by 16.46%, and online inference time by 10.71% in secure inference with ResNet-50. Compared to the state-of-the-art SIGMA on the BERT-large model with a sequence length of 64, Vertin reduces key storage, online communication, and online runtime by 9.81%, 9.17%, and 8.9%, respectively.
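To make the linear-layer idea concrete, the following is a minimal Python/NumPy sketch of the standard masking technique the abstract alludes to, not the paper's actual FMLO construction, and all names in it are illustrative. Because the server holds its weight matrix W in plaintext, the parties can precompute additive shares of W @ R for a random mask R; the online phase then needs only one masked-input message plus local plaintext arithmetic. A trusted dealer stands in here for the VOLE-based offline protocols.

    import numpy as np

    MOD = 1 << 32                     # arithmetic over Z_{2^32}, a common 2PC choice
    rng = np.random.default_rng(0)

    def rand_mat(shape):
        return rng.integers(0, MOD, size=shape, dtype=np.uint64)

    # Offline phase (input-independent): a dealer samples a random mask R and
    # secret-shares W @ R between client and server. In a scheme like Vertin,
    # this correlated randomness would come from VOLE-style protocols instead.
    def offline(W, in_shape):
        R = rand_mat(in_shape)                 # random mask for the client's input
        WR = (W @ R) % MOD                     # randomness correlated with the weights
        wr_client = rand_mat(WR.shape)         # client's additive share of W @ R
        wr_server = (WR - wr_client) % MOD     # server's additive share of W @ R
        return R, wr_client, wr_server

    # Online phase: the client reveals only X - R, which is uniformly random,
    # so X stays hidden; the shares reconstruct W @ X while the weights never
    # leave the server.
    def online(W, X, R, wr_client, wr_server):
        masked = (X - R) % MOD                      # the single online message
        y_server = (W @ masked + wr_server) % MOD   # W(X - R) + [WR]_server
        y_client = wr_client                        # [WR]_client
        return (y_server + y_client) % MOD          # = W @ X  (mod 2^32)

    W = rand_mat((4, 8))                       # server's plaintext weight matrix
    X = rand_mat((8, 2))                       # client's private input
    R, wr_c, wr_s = offline(W, X.shape)
    assert np.array_equal(online(W, X, R, wr_c, wr_s), (W @ X) % MOD)

Note that in this generic pattern the per-inference key material is just the mask R and one share of W @ R, which is the key-compactness angle the abstract's storage comparisons refer to.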
Journal introduction:
Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications relevant to information security. JISA provides a common linkage between a vibrant scientific and research community and industry professionals by offering a clear view of modern problems and challenges in information security, as well as identifying promising scientific and "best-practice" solutions. JISA issues offer a balance between original research work and innovative industrial approaches by internationally renowned information security experts and researchers.