SMPCache: Towards More Efficient SQL Queries in Multi-Party Collaborative Data Analysis
Junjian Shi; Ye Han; Xiaojie Guo; Zekun Fei; Zheli Liu; Siyi Lv; Tong Li; Xiaotao Liu
IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 4, pp. 2111-2125 (journal article, published 2025-01-29)
DOI: 10.1109/TKDE.2025.3535944
https://ieeexplore.ieee.org/document/10857626/
Citation count: 0
Abstract
Privacy-preserving collaborative data analysis has been a popular research direction in recent years. Among such analysis tasks, privacy-preserving SQL queries on multi-party databases are of particular industrial interest. Although the privacy concern can be addressed by many cryptographic tools, such as secure multi-party computation (MPC), the efficiency of executing such SQL queries is far from satisfactory, especially for high-volume databases. In particular, existing MPC-based solutions treat each SQL query as an isolated task and launch it from scratch, even though many SQL queries are run regularly and partially overlap in functionality. In this work, we exploit this property to improve the efficiency of MPC-based, privacy-preserving SQL queries by introducing a cache-like optimization mechanism. To ensure a higher cache hit rate and reduce redundant MPC operators, we present a cache structure different from that of plain databases and design a set of cache strategies. Our optimization mechanism, SMPCache, can be built upon secret-sharing-based MPC frameworks, which attract much attention from industry. To demonstrate the utility of SMPCache, we implement it on Rosetta, an open-source MPC library, and use real-world datasets to run extensive experiments on some basic SQL operators (e.g., Filter, Order-by, Aggregation, and Inner-Join) and some representative composite SQL queries. As a data point, SMPCache achieves up to a 3536× efficiency improvement on the TPC-DS dataset and 562× on the TPC-H dataset at a moderate storage cost. We also apply SMPCache to the basic SQL operators (Filter, Order-by, Group-by, Aggregation, and Inner-Join) of the Secrecy framework, achieving up to a 127.3× efficiency improvement.
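The general idea of reusing operator results across overlapping queries can be illustrated with a small sketch. This is not SMPCache's cache structure or any of its cache strategies, which the abstract does not describe and which it says differ from those of plain databases. It is ordinary memoization keyed by operator, input, and parameters. The class `OperatorCache`, the table `TABLE`, and both query functions are hypothetical names invented for this illustration, and plaintext rows stand in for what would be secret-shared values in an MPC setting.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Tuple

Row = Tuple[int, ...]
CacheKey = Tuple[str, str, Hashable]  # (operator name, input id, parameters)


@dataclass
class OperatorCache:
    """Hypothetical toy cache of operator outputs (plain memoization)."""
    store: Dict[CacheKey, List[Row]] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def run(self, op: str, input_id: str, params: Hashable,
            compute: Callable[[], List[Row]]) -> Tuple[str, List[Row]]:
        key = (op, input_id, params)
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            # compute() stands in for a costly MPC operator evaluation.
            self.store[key] = compute()
        # The returned id names this intermediate result for later operators.
        return f"{op}({input_id},{params})", self.store[key]


# Made-up (id, age) rows; a real deployment would hold secret shares instead.
TABLE: List[Row] = [(1, 25), (2, 41), (3, 37), (4, 19)]


def filter_over_30(cache: OperatorCache) -> Tuple[str, List[Row]]:
    # Shared sub-query: both queries below start with the same Filter.
    return cache.run("filter", "T", ("age", ">", 30),
                     lambda: [r for r in TABLE if r[1] > 30])


def query_sum_over_30(cache: OperatorCache) -> int:
    fid, rows = filter_over_30(cache)
    _, out = cache.run("sum", fid, "age",
                       lambda: [(sum(r[1] for r in rows),)])
    return out[0][0]


def query_sorted_over_30(cache: OperatorCache) -> List[Row]:
    fid, rows = filter_over_30(cache)
    _, out = cache.run("order_by", fid, "age",
                       lambda: sorted(rows, key=lambda r: r[1]))
    return out
```

Running `query_sum_over_30` and then `query_sorted_over_30` on the same `OperatorCache` evaluates the Filter only once: the second query finds the filtered rows in the cache and computes only its own Order-by step. The sketch ignores cache invalidation, storage limits, and every security consideration a real MPC system would have to address.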
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.