Future Generation Computer Systems-The International Journal of Escience: Latest Articles

Identifying runtime libraries in statically linked linux binaries
IF 6.2, Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Volume 164, Article 107602. Pub Date: 2024-11-13. DOI: 10.1016/j.future.2024.107602
Javier Carrillo-Mondéjar, Ricardo J. Rodríguez
Abstract: Vulnerabilities in unpatched applications can originate from third-party dependencies in statically linked applications, as they must be relinked each time to take advantage of libraries that have been updated to fix any vulnerability. Despite this, malware binaries are often statically linked to ensure they run on target platforms and to complicate malware analysis. In this sense, identification of libraries in malware analysis becomes crucial to help filter out those library functions and focus on malware function analysis. In this paper, we introduce MANTILLA, a system for identifying runtime libraries in statically linked Linux-based binaries. Our system is based on radare2 to identify functions and extract their features (independent of the underlying architecture of the binary) through static binary analysis and on the K-nearest neighbors supervised machine learning model and a majority rule to predict final values. MANTILLA is evaluated on a dataset consisting of binaries built for different architectures (MIPSeb, ARMel, Intel x86, and Intel x86-64) and different runtime libraries (uClibc, glibc, and musl), achieving very high accuracy. We also evaluate it in two case studies. First, using a dataset of binary files belonging to the binutils collection and second, using an IoT malware dataset. In both cases, good accuracy results are obtained both in terms of runtime library detection (94.4% and 95.5%, respectively) and architecture identification (100% and 98.6%, respectively).
Citations: 0
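The MANTILLA abstract above describes a two-stage prediction: a K-nearest neighbors label per function, then a majority rule over all functions in the binary. A minimal sketch of that idea follows. The two-dimensional feature vectors, the training set, and the function names are hypothetical and chosen only for illustration; the paper's actual features come from radare2 and are not reproduced here.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Label one feature vector by majority vote among its k nearest training vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def classify_binary(train, function_vectors, k=3):
    """Predict a label for each function, then apply a majority rule across functions."""
    votes = Counter(knn_predict(train, vec, k) for vec in function_vectors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled feature vectors (not real radare2 output).
TRAIN = [
    ((1.0, 1.0), "glibc"), ((1.2, 0.9), "glibc"), ((0.9, 1.1), "glibc"),
    ((5.0, 5.0), "musl"), ((5.1, 4.8), "musl"), ((4.9, 5.2), "musl"),
]

# Hypothetical functions extracted from one binary: two look like glibc, one like musl.
FUNCTIONS = [(1.1, 1.0), (0.8, 1.2), (5.0, 5.1)]

if __name__ == "__main__":
    print(classify_binary(TRAIN, FUNCTIONS))
```

The majority rule makes the final label tolerant of a minority of misclassified functions, which is presumably why a per-binary vote is layered on top of the per-function classifier.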
High throughput edit distance computation on FPGA-based accelerators using HLS
IF 6.2, Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Volume 164, Article 107591. Pub Date: 2024-11-12. DOI: 10.1016/j.future.2024.107591
Sebastiano Fabio Schifano, Marco Reggiani, Enrico Calore, Rino Micheloni, Alessia Marelli, Cristian Zambelli
Abstract: Edit distance is a computational grand challenge problem to quantify the minimum number of editing operations required to modify one string of characters to the other, finding many applications of natural language processing. In recent years, relevant and increasing interest has also emerged from deoxyribonucleic acid (DNA) applications, like Next Generation Sequencing and DNA storage technologies. Both applications share two crucial features: i) the information is coded into the four bases of DNA and ii) the level of operational noise is still high causing errors in the data, requiring inclusion in the workflow of the computation of algorithms such as the edit distance for finding similarities between sequences. To boost this computation many solutions are available in the literature. Among them, the FPGAs are largely used since the data domain of those applications is strings of 4 characters represented as two-bit values, inconveniently fitting the basic data types of ordinary CPUs and GPUs, with additional benefits of providing a high level of parallelism and low processing latency. This contribution presents a computing- and energy-efficient design implementing the edit distance algorithm combining metaprogramming and High-Level Synthesis. We also assess the performance of our design targeting recent FPGA-based accelerators. Our solution uses nearly 90% of FPGA basic-block hardware resources achieving about 90% of computing efficiency delivering a maximum throughput of 16.8 TCUPS and an energy efficiency of 46 Mpair/Joule, enabling the use of FPGAs as a new class of accelerators for High Performance Computing in DNA applications.
Citations: 0
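The edit distance named in the abstract above is the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. The sketch below is the textbook dynamic-programming (Levenshtein) formulation in software, not the paper's FPGA/HLS design. The `pack_bases` helper is a hypothetical illustration of the two-bit encoding of the four DNA bases that the abstract mentions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row, keeping only two rows in memory."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# Hypothetical two-bit code for the four DNA bases.
BASE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_bases(seq: str):
    """Pack a DNA string into an integer, two bits per base; the length is
    returned too because leading 'A' bases (code 0) leave no trace in the integer."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_CODES[base]
    return value, len(seq)


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
    print(pack_bases("ACGT"))
```

Each cell of the table depends only on its left, upper, and upper-left neighbors, so cells on the same anti-diagonal are independent. That independence is what parallel hardware implementations typically exploit.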
In silico framework for genome analysis
IF 6.2, Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Volume 164, Article 107585. Pub Date: 2024-11-12. DOI: 10.1016/j.future.2024.107585
M. Saqib Nawaz, M. Zohaib Nawaz, Yongshun Gong, Philippe Fournier-Viger, Abdoulaye Baniré Diallo
Abstract: Genomes hold the complete genetic information of an organism. Examining and analyzing genomic data plays a critical role in properly understanding an organism, particularly the main characteristics, functionalities, and evolving nature of harmful viruses. However, the rapid increase in genomic data poses new challenges and demands for extracting meaningful and valuable insights from large and complex genomic datasets. In this paper, a novel Framework for Genome Data Analysis (F4GDA), is developed that offers various methods for the analysis of viral genomic data in various forms. The framework’s methods can not only analyze the changes in genomes but also various genome contents. As a case study, the genomes of five SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) VoC (variants of concern), which are divided into three types/groups on the basis of geographical locations, are analyzed using this framework to investigate (1) the nucleotides, amino acids and synonymous codon changes in the whole genomes of VoC as well as in the Spike (S) protein, (2) whether different environments affect the rate of changes in genomes, (3) the variations in nucleotide bases, amino acids, and codon base compositions in VoC genomes, and (4) to compare VoC genomes with the reference genome sequence of SARS-CoV-2.
Citations: 0
Adaptive ensemble optimization for memory-related hyperparameters in retraining DNN at edge
IF 6.2, Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Volume 164, Article 107600. Pub Date: 2024-11-10. DOI: 10.1016/j.future.2024.107600
Yidong Xu, Rui Han, Xiaojiang Zuo, Junyan Ouyang, Chi Harold Liu, Lydia Y. Chen
Abstract: Edge applications are increasingly empowered by deep neural networks (DNN) and face the challenges of adapting or retraining models for the changes in input data domains and learning tasks. The existing techniques to enable DNN retraining on edge devices are to configure the memory-related hyperparameters, termed m-hyperparameters, via batch size reduction, parameter freezing, and gradient checkpoint. While those methods show promising results for static DNNs, little is known about how to online and opportunistically optimize all their m-hyperparameters, especially for retraining tasks of edge applications. In this paper, we propose, MPOptimizer, which jointly optimizes an ensemble of m-hyperparameters according to the input distribution and available edge resources at runtime. The key feature of MPOptimizer is to easily emulate the execution of retraining tasks under different m-hyperparameters and thus effectively estimate their influence on task performance. We implement MPOptimizer on prevalent DNNs and demonstrate its effectiveness against state-of-the-art techniques, i.e. successfully find the best configuration that improves model accuracy by an average of 13% (up to 25.3%) while reducing memory and training time by 4.1x and 5.3x under the same model accuracies.
Citations: 0
Convergence-aware optimal checkpointing for exploratory deep learning training jobs
IF 6.2, Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Volume 164, Article 107597. Pub Date: 2024-11-08. DOI: 10.1016/j.future.2024.107597
Hongliang Li, Zichen Wang, Hairui Zhao, Meng Zhang, Xiang Li, Haixiao Xu
Abstract: Training Deep Learning (DL) models are becoming more time-consuming, thus interruptions to the training processes are inevitable. We can obtain an optimal checkpointing interval to minimize the fault tolerance overhead for a HPC (High Performance Computing) job with the precondition that the job progress is proportional to its execution time. Unfortunately, it is not the case in DL model training, where a DL training job yields diminishing returns across its lifetime. Meanwhile, training DL models is inherently exploratory, with early termination frequently occurring during model training & developing. It makes the early progress of a DL training job more valuable than the later ones. Even placement of checkpoints would either increase the risks in the early stages or waste resources overprotecting the latter stages. Moreover, in data parallelism, the state-of-the-art quality-driven scheduling strategies allocate more resources for the early stages of a job than the later ones to accelerate the training progress, which further amplifies the issue. In summary, the early stage is more important than the later stages. Allocating more fault-tolerant resources to the early stages is beneficial for the model exploration. Based on the aforementioned conclusion, we present COCI, an approach to compute optimal checkpointing configuration for a exploratory DL training job, minimizing the fault tolerance overhead, including checkpoint cost and recovery cost. We implement COCI based on state-of-the-art iteration-level checkpointing mechanism, as a pluggable module compatible with PyTorch without extra user input. The experimental results show that COCI reduces up to 40.18% fault tolerance overhead compared to existing state-of-the-art DL fault tolerance methods in serial scenario, 60.64% in data parallel scenario.
Citations: 0
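The abstract above contrasts COCI with the classical HPC setting, where progress is proportional to execution time and a single evenly spaced checkpoint interval is optimal. A minimal sketch of that classical baseline follows, using Young's first-order approximation (interval = sqrt(2 × checkpoint cost × mean time between failures)). This is not COCI itself. The function names and the numbers in the usage lines are assumptions for illustration, and the approximation holds only when the checkpoint cost is small relative to the mean time between failures.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order optimal interval between checkpoints, in seconds."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def even_checkpoint_times(job_length_s: float, interval_s: float):
    """Evenly spaced checkpoint times strictly inside the job's lifetime."""
    times = []
    t = interval_s
    while t < job_length_s:
        times.append(t)
        t += interval_s
    return times


if __name__ == "__main__":
    # Hypothetical numbers: 2 s per checkpoint, one failure every 100 s on average.
    interval = young_interval(2.0, 100.0)
    print(interval)
    print(even_checkpoint_times(70.0, interval))
```

An even schedule like this protects every stage of the job equally, which is exactly the property the abstract argues is a poor fit for exploratory DL training, where early progress is worth more than later progress.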
FedGen: Personalized federated learning with data generation for enhanced model customization and class imbalance
IF 6.2, Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Volume 164, Article 107595. Pub Date: 2024-11-07. DOI: 10.1016/j.future.2024.107595
Peng Zhao, Shaocong Guo, Yanan Li, Shusen Yang, Xuebin Ren
Abstract: Federated learning has emerged as a prominent solution for the collaborative training of machine learning models without exchanging local data. However, existing approaches often impose rigid constraints on model heterogeneity, limiting the ability of clients to customize unique models and increasing the vulnerability of models to potential attacks. This paper presents FedGen, a novel personalized federated learning framework based on generative adversarial networks (GANs). FedGen shifts the focus from training task-specific models to generating data, especially for minority classes with imbalanced data. With FedGen, clients can gain knowledge from others by training generators, while maintaining a heterogeneous local model and avoiding sharing model information with other participants. Moreover, to address challenges arising from imbalanced data, we propose AT-GAN, a novel generative model incorporating pseudo augmentation and differentiable augmentation modules to foster healthy competition between the generator and discriminator. To evaluate the effectiveness of our approach, we conduct extensive experiments on real-world tabular datasets. The experimental results demonstrate that FedGen significantly enhances the performance of local models, achieving improvements of up to 11.92% in F1 score and up to 9.14% in MCC score compared to existing methods.
Citations: 0
Time-constrained persistent deletion for key–value store engine on ZNS SSD
IF 6.2, Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Volume 164, Article 107598. Pub Date: 2024-11-06. DOI: 10.1016/j.future.2024.107598
Shiqiang Nie, Tong Lei, Jie Niu, Qihan Hu, Song Liu, Weiguo Wu
Abstract: The inherent out-of-place update characteristic of the Log-Structured Merge tree (LSM tree) cannot guarantee persistent deletion within a specific time window, leading to potential data privacy and security issues. Existing solutions like Lethe-Fade ensure time-constrained persistent deletion but introduce considerable write overhead, worsening the write amplification issue, particularly for key–value stores on ZNS SSD. To address this problem, we propose a zone-aware persistent deletion scheme for key–value store engines. Targeting mitigating the write amplification induced by level compaction, we design an adaptive SSTable selection strategy for each level in the LSM tree. Additionally, as the SSTable with deletion records would become invalid after the persistent deletion timer reaches its threshold, we design a tombstone-aware zone allocation strategy to reduce the data migration induced by garbage collection. In further, we optimize the victim zone selection in GC to reduce the invalid migration of tombstone files. Experimental results demonstrate that our scheme effectively ensures that most outdated physical versions are deleted before reaching the persistent deletion time threshold. When deleting 10% of keys in the key–value store engine, this scheme reduces write amplification by 74.7% and the garbage collection-induced write by 87.3% compared to the Lethe-Fade scheme.
Citations: 0
RNC-DP: A personalized trajectory data publishing scheme combining road network constraints and GAN
IF 6.2, Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Volume 164, Article 107589. Pub Date: 2024-11-06. DOI: 10.1016/j.future.2024.107589
Hui Wang, Haiyang Li, Zihao Shen, Peiqian Liu
Abstract: The popularity of location-based services facilitates people’s lives to a certain extent and generates a large amount of trajectory data. Analyzing these data can contribute to society’s development and provide better location services for users, but it also faces the security problem of personal trajectory privacy leakage. However, existing methods often suffer from either excessive privacy protection or insufficient protection of individual privacy. Therefore, this paper proposes a personalized trajectory data publishing scheme combining road network constraints and GAN (RNC-DP). Firstly, after grid-representing the trajectory data, we remove the unreachable grids and define a trajectory generation constraint. Second, the proposed TraGM model synthesizes the trajectory data to meet the constraints. Again, during the trajectory data publishing process, the proposed TraDP mechanism performs k-means clustering on the synthesized trajectories and assigns appropriate privacy budgets to the clustered generalized trajectory location points. Finally, the protected trajectory data is published. Compared with the existing schemes, the proposed scheme improves privacy protection strength by 10.2%–41.2% while balancing data availability and has low time complexity.
Citations: 0
CMPNet: A cross-modal multi-scale perception network for RGB-T crowd counting
IF 6.2, Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Volume 164, Article 107596. Pub Date: 2024-11-06. DOI: 10.1016/j.future.2024.107596
Shihui Zhang, Kun Chen, Gangzheng Zhai, He Li, Shaojie Han
Abstract: The cross-modal crowd counting method demonstrates better scene adaptability under complex conditions by introducing independent supplementary information. However, existing methods still face problems such as insufficient fusion of modal features, underutilization of crowd structure, and the neglect of scale information. In response to the above issues, this paper proposes a cross-modal multi-scale perception network (CMPNet). Specifically, CMPNet mainly consists of a cross-modal perception fusion module and a multi-scale feature aggregation module. The cross-modal perception fusion module effectively suppresses noise features while sharing features between different modalities, thereby significantly improving the robustness of the crowd counting process. The multi-scale feature aggregation module obtains rich crowd structure information through a spatial context aware graph convolution unit, and then integrates feature information from different scales to enhance the network’s perception ability of crowd density. To the best of our knowledge, CMPNet is the first attempt to model the crowd structure and mine its semantics in the field of cross-modal crowd counting. The experimental results show that CMPNet achieves state-of-the-art performance on all RGB-T datasets, providing an effective solution for cross-modal crowd counting. We will release the code at https://github.com/KunChenKKK/CMPNet.
Citations: 0
Private approximate nearest neighbor search for on-chain data based on locality-sensitive hashing
IF 6.2, Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Volume 164, Article 107586. Pub Date: 2024-11-05. DOI: 10.1016/j.future.2024.107586
Siyuan Shang, Xuehui Du, Xiaohan Wang, Aodi Liu
Abstract: Blockchain manages data with immutability, decentralization and traceability, offering new solutions for traditional information systems and greatly facilitating data sharing. However, on-chain data query still faces challenges such as low efficiency and difficulty in privacy protection. We propose a private Approximate Nearest Neighbor (ANN) search method for on-chain data based on Locality-Sensitive Hashing (LSH), which mainly includes two steps: query initialization and query implementation. In query initialization, the data management node builds hash tables for on-chain data through improved LSH, which are encrypted and stored on the blockchain using attribute-based encryption. In query implementation, node with correct privileges utilizes random smart contracts to query on-chain data privately by distributed point function and a privacy protection technique called oblivious masking. To validate the effectiveness of this method, we compare the performance with two ANN search algorithms, the query time is reduced by 57% and 59.2%, the average recall is increased by 4.5% and 2%, the average precision is increased by 7.7% and 6.9%, the average F1-score is increased by 6% and 4.3%, the average initialization time is reduced by 34 times and 122 times, respectively. We also compare the performance with private ANN search methods using homomorphic encryption, differential privacy and secure multi-party computation. The results show that our method can reduce the query time by several orders of magnitude, which is more applicable to the blockchain environment. To the best of our knowledge, this is the first private ANN search method for on-chain data, which consider the query efficiency and privacy protection, achieving efficient, accurate, and private data query.
Citations: 0
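The abstract above builds hash tables over the data with locality-sensitive hashing (LSH) and then answers approximate nearest neighbor queries from them. The sketch below shows only the generic, unencrypted core of that idea, using random-hyperplane LSH for cosine similarity. The paper's "improved LSH", its attribute-based encryption, distributed point functions, and oblivious masking are not reproduced. All function names, the seed, and the sample vectors are assumptions for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim: int, bits: int, seed: int = 0):
    """Draw `bits` random hyperplanes (normal vectors) in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_table(vectors, planes):
    """Initialization step: bucket every vector index by its signature."""
    table = defaultdict(list)
    for idx, vec in enumerate(vectors):
        table[signature(vec, planes)].append(idx)
    return table


def query(table, planes, vec):
    """Query step: return candidate indices sharing the query's bucket."""
    return table.get(signature(vec, planes), [])


if __name__ == "__main__":
    # Hypothetical data points standing in for on-chain records.
    data = [(1.0, 0.2, 0.0), (0.9, 0.3, 0.1), (-1.0, 0.5, 2.0)]
    planes = make_hyperplanes(dim=3, bits=4)
    table = build_table(data, planes)
    print(query(table, planes, (1.0, 0.2, 0.0)))
```

Vectors that point in similar directions tend to fall on the same side of most hyperplanes and so share a bucket, which lets a query inspect one bucket instead of scanning every record. Practical systems usually keep several tables with independent hyperplanes to raise recall.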