Private approximate nearest neighbor search for on-chain data based on locality-sensitive hashing

IF 6.2 2区计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS

Future Generation Computer Systems-The International Journal of Escience Pub Date : 2024-11-05 DOI:10.1016/j.future.2024.107586

Siyuan Shang , Xuehui Du , Xiaohan Wang, Aodi Liu

{"title":"Private approximate nearest neighbor search for on-chain data based on locality-sensitive hashing","authors":"Siyuan Shang , Xuehui Du , Xiaohan Wang, Aodi Liu","doi":"10.1016/j.future.2024.107586","DOIUrl":null,"url":null,"abstract":"<div><div>Blockchain manages data with immutability, decentralization and traceability, offering new solutions for traditional information systems and greatly facilitating data sharing. However, on-chain data query still faces challenges such as low efficiency and difficulty in privacy protection. We propose a private Approximate Nearest Neighbor (ANN) search method for on-chain data based on Locality-Sensitive Hashing (LSH), which mainly includes two steps: query initialization and query implementation. In query initialization, the data management node builds hash tables for on-chain data through improved LSH, which are encrypted and stored on the blockchain using attribute-based encryption. In query implementation, node with correct privileges utilizes random smart contracts to query on-chain data privately by distributed point function and a privacy protection technique called oblivious masking. To validate the effectiveness of this method, we compare the performance with two ANN search algorithms, the query time is reduced by 57% and 59.2%, the average recall is increased by 4.5% and 2%, the average precision is increased by 7.7% and 6.9%, the average F1-score is increased by 6% and 4.3%, the average initialization time is reduced by 34 times and 122 times, respectively. We also compare the performance with private ANN search methods using homomorphic encryption, differential privacy and secure multi-party computation. The results show that our method can reduce the query time by several orders of magnitude, which is more applicable to the blockchain environment. To the best of our knowledge, this is the first private ANN search method for on-chain data, which consider the query efficiency and privacy protection, achieving efficient, accurate, and private data query.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107586"},"PeriodicalIF":6.2000,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X24005508","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Blockchain manages data with immutability, decentralization and traceability, offering new solutions for traditional information systems and greatly facilitating data sharing. However, on-chain data query still faces challenges such as low efficiency and difficulty in privacy protection. We propose a private Approximate Nearest Neighbor (ANN) search method for on-chain data based on Locality-Sensitive Hashing (LSH), which mainly includes two steps: query initialization and query implementation. In query initialization, the data management node builds hash tables for on-chain data through improved LSH, which are encrypted and stored on the blockchain using attribute-based encryption. In query implementation, node with correct privileges utilizes random smart contracts to query on-chain data privately by distributed point function and a privacy protection technique called oblivious masking. To validate the effectiveness of this method, we compare the performance with two ANN search algorithms, the query time is reduced by 57% and 59.2%, the average recall is increased by 4.5% and 2%, the average precision is increased by 7.7% and 6.9%, the average F1-score is increased by 6% and 4.3%, the average initialization time is reduced by 34 times and 122 times, respectively. We also compare the performance with private ANN search methods using homomorphic encryption, differential privacy and secure multi-party computation. The results show that our method can reduce the query time by several orders of magnitude, which is more applicable to the blockchain environment. To the best of our knowledge, this is the first private ANN search method for on-chain data, which consider the query efficiency and privacy protection, achieving efficient, accurate, and private data query.

查看原文本刊更多论文

基于位置敏感哈希算法的链上数据私有近似近邻搜索

区块链管理数据具有不可篡改性、去中心化和可追溯性，为传统信息系统提供了新的解决方案，极大地促进了数据共享。然而，链上数据查询仍面临效率低、隐私保护难等挑战。我们提出了一种基于位置敏感散列（LSH）的链上数据私有近似近邻（ANN）搜索方法，主要包括查询初始化和查询实现两个步骤。在查询初始化中，数据管理节点通过改进的 LSH 为链上数据建立哈希表，并使用基于属性的加密技术将哈希表加密后存储在区块链上。在查询执行过程中，拥有正确权限的节点利用随机智能合约，通过分布式点函数和一种称为遗忘掩码的隐私保护技术，私下查询链上数据。为了验证这种方法的有效性，我们将其与两种 ANN 搜索算法进行了性能比较，结果显示，查询时间分别缩短了 57% 和 59.2%，平均召回率分别提高了 4.5% 和 2%，平均精度分别提高了 7.7% 和 6.9%，平均 F1 分数分别提高了 6% 和 4.3%，平均初始化时间分别缩短了 34 倍和 122 倍。我们还比较了使用同态加密、差分隐私和安全多方计算的私有 ANN 搜索方法的性能。结果表明，我们的方法可以将查询时间缩短几个数量级，更适用于区块链环境。据我们所知，这是第一种考虑查询效率和隐私保护的链上数据私有 ANN 搜索方法，实现了高效、准确和私有的数据查询。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Future Generation Computer Systems-The International Journal of Escience 工程技术-计算机：理论方法

CiteScore

19.90

自引率

2.70%

发文量

376

审稿时长

10.6 months

期刊介绍： Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.