Title: Relational Query Synthesis ⋈ Decision Tree Learning
Authors: Aaditya Naik, Aalok Thakkar, Adam Stein, R. Alur, Mayur Naik
DOI: 10.14778/3626292.3626306 (https://doi.org/10.14778/3626292.3626306)
Proc. VLDB Endow., October 2023

Abstract: We study the problem of synthesizing a core fragment of relational queries, called select-project-join (SPJ) queries, from input-output examples. Search-based synthesis techniques are suited to synthesizing projections and joins by navigating the network of relational tables, but require additional supervision for synthesizing comparison predicates. Decision tree learning techniques, on the other hand, are suited to synthesizing comparison predicates when the input database can be summarized as a single labelled relational table. In this paper, we adapt and interleave methods from the domains of relational query synthesis and decision tree learning, and present an end-to-end framework for synthesizing relational queries with categorical and numerical comparison predicates. Our technique guarantees the completeness of the synthesis procedure and strongly encourages minimality of the synthesized program. We present Libra, an implementation of this technique, and evaluate it on a benchmark suite of 1,475 query instances over 159 multi-table databases. Libra solves 1,361 of these instances in an average of 59 seconds per instance. It outperforms the state-of-the-art program synthesis tools Scythe and PatSQL in terms of both running time and the quality of the synthesized programs.

Title: Query Refinement for Diversity Constraint Satisfaction
Authors: Jinyang Li, Y. Moskovitch, Julia Stoyanovich, H. V. Jagadish
DOI: 10.14778/3626292.3626295 (https://doi.org/10.14778/3626292.3626295)
Proc. VLDB Endow., October 2023

Abstract: Diversity, group representation, and similar needs often apply to query results, which in turn require constraints on the sizes of various subgroups in the result set. Traditional relational queries only specify conditions as part of the query predicate(s), and do not support such restrictions on the output. In this paper, we study the problem of modifying queries so that the result satisfies constraints on the sizes of multiple subgroups in it. This problem, in the worst case, cannot be solved in polynomial time. Yet, with the help of provenance annotation, we are able to develop a query refinement method that works quite efficiently, as we demonstrate through extensive experiments.

Title: VeLP: Vehicle Loading Plan Learning from Human Behavior in Nationwide Logistics System
Authors: Sijing Duan, Feng Lyu, Xin Zhu, Yi Ding, Haotian Wang, Desheng Zhang, Xue Liu, Yaoxue Zhang, Ju Ren
DOI: 10.14778/3626292.3626305 (https://doi.org/10.14778/3626292.3626305)
Proc. VLDB Endow., October 2023

Abstract: For a nationwide logistics transportation system, it is critical to make vehicle loading plans (i.e., given many packages, deciding vehicle types and numbers) at each sorting and distribution center. In many logistics companies, this task is currently performed by dispatchers at each center and imposes a heavy workload on them. Existing works formulate the issue as a cargo loading problem and solve it with combinatorial optimization methods. However, such methods cannot work in some real-world nationwide applications, because accurate cargo volume information is unavailable and it is difficult to design effective models under complicated impact factors and temporal correlations. In this paper, we explore a new opportunity: utilizing large-scale route and human behavior data (i.e., dispatchers' decision processes on planning vehicles) to generate vehicle loading plans. Specifically, we collect a five-month nationwide operational dataset from JD Logistics in China and comprehensively analyze human behaviors. Based on the data-driven analytics insights, we design a Vehicle Loading Plan learning model, named VeLP, which consists of a pattern mining module and a deep temporal cross neural network to learn human behaviors on regular and irregular routes, respectively. Extensive experiments demonstrate the superiority of VeLP, which improves performance over baselines by 35.8% and 50% for trunk and branch routes, respectively. Moreover, we deployed VeLP in JDL and applied it to about 400 routes, reducing plan-creation time by approximately 20%. It saves significant human workload and improves operational efficiency for the logistics company.

Title: Cryptographically Secure Private Record Linkage Using Locality-Sensitive Hashing
Authors: Ruidi Wei, F. Kerschbaum
DOI: 10.14778/3626292.3626293 (https://doi.org/10.14778/3626292.3626293)
Proc. VLDB Endow., October 2023

Abstract: Private record linkage (PRL) is the problem of identifying pairs of records that approximately match across datasets in a secure, privacy-preserving manner. In two-party PRL specifically, each party obtains only those records of the other party that match one of its own. The privacy goal is that no information about the datasets should be released other than the matching records. A fundamental challenge is to avoid leaking information while at the same time not comparing all pairs of records. In plaintext record linkage this is done using a blocking strategy, e.g., locality-sensitive hashing. One recent approach, by He et al. (ACM CCS 2017), uses locality-sensitive hashing and then releases a provably differentially private representation of the hash bins. However, differential privacy still leaks some (albeit provably bounded) information, and does not protect against attacks such as property inference attacks. Another recent approach, by Khurram and Kerschbaum (IEEE ICDE 2020), uses locality-preserving hashing and provides cryptographic security, i.e., it releases no information except the output. However, locality-preserving hash functions are much harder to construct than locality-sensitive hash functions, so the accuracy of this approach is limited, particularly on larger datasets. In this paper, we address the open problem of providing cryptographic security for PRL while using locality-sensitive hash functions. Using recent results in oblivious algorithms, we design a new cryptographically secure PRL with locality-sensitive hash functions. Our prototype implementation can match 40,000 records in the British National Library/Toronto Public Library and the North Carolina Voter Registry datasets with 99.3% and 99.9% accuracy, respectively, in less than an hour, which is more than an order of magnitude faster than Khurram and Kerschbaum's work, at higher accuracy.

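The plaintext blocking idea the abstract builds on can be sketched concretely. The example below is not the paper's cryptographic protocol (which performs the comparison obliviously); it is a minimal MinHash/LSH-banding illustration, with the record strings and parameters invented for the example:

```python
import hashlib

def shingles(s, k=3):
    """Character k-grams of a record string."""
    s = s.lower()
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def minhash(sh, num_hashes=32):
    """MinHash signature: for each seeded hash function, keep the minimum
    hash value over all shingles. Similar sets yield similar signatures."""
    return [
        min(int(hashlib.sha1(f"{seed}:{g}".encode()).hexdigest(), 16) for g in sh)
        for seed in range(num_hashes)
    ]

def lsh_bands(sig, bands=8):
    """Split the signature into bands; records sharing any band key
    land in the same block (candidate pair) without all-pairs comparison."""
    rows = len(sig) // bands
    return {(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)}

a = minhash(shingles("Jonathan Smith, 12 Elm St"))
b = minhash(shingles("Jonathon Smith, 12 Elm St"))
c = minhash(shingles("Maria Garcia, 98 Oak Ave"))
# Near-duplicate records are likely to share at least one band; unrelated ones rarely do.
print(bool(lsh_bands(a) & lsh_bands(b)), bool(lsh_bands(a) & lsh_bands(c)))
```

Banding trades recall for candidate-set size: more bands with fewer rows each makes collisions (and thus comparisons) more likely.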
Title: Billion-Scale Bipartite Graph Embedding: A Global-Local Induced Approach
Authors: Xueyi Wu, Yuanyuan Xu, Wenjie Zhang, Ying Zhang
DOI: 10.14778/3626292.3626300 (https://doi.org/10.14778/3626292.3626300)
Proc. VLDB Endow., October 2023

Abstract: Bipartite graph embedding (BGE), the fundamental task in bipartite network analysis, maps each node to a compact low-dimensional vector that preserves intrinsic properties. Existing solutions to BGE fall into two groups: metric-based methods and graph neural network-based (GNN-based) methods. The latter typically generate higher-quality embeddings than the former due to the strong representation ability of deep learning. Nevertheless, none of the existing GNN-based methods can handle billion-scale bipartite graphs, due to expensive message passing or complex modelling choices. Hence, existing solutions struggle to achieve both embedding quality and model scalability. Motivated by this, we propose a novel graph neural network named AnchorGNN, based on a global-local learning framework, which can generate high-quality BGE and scale to billion-scale bipartite graphs. Concretely, AnchorGNN leverages a novel anchor-based message passing schema for global learning, which enables global knowledge to be incorporated when generating node embeddings. Meanwhile, AnchorGNN offers efficient one-hop local structure modelling using maximum likelihood estimation for bipartite graphs, with a rational analysis, avoiding the construction of a large adjacency matrix. Global information and local structure are then integrated to generate distinguishable node embeddings. Extensive experiments demonstrate that AnchorGNN outperforms the best competitor by up to 36% in accuracy and achieves up to 28× speed-up over the only metric-based baseline on billion-scale bipartite graphs.

Title: Utility-aware Payment Channel Network Rebalance
Authors: Wangze Ni, Pengze Chen, Lei Chen, Peng Cheng, Chen Zhang, Xuemin Lin
DOI: 10.14778/3626292.3626301 (https://doi.org/10.14778/3626292.3626301)
Proc. VLDB Endow., October 2023

Abstract: The payment channel network (PCN) is a promising solution for increasing the throughput of blockchains. However, unidirectional transactions can deplete a user's deposits in a payment channel (PC), reducing the success ratio of transactions (SRoT). To address this depletion issue, rebalance protocols shift tokens from well-deposited PCs to under-deposited PCs. To improve SRoT, it is beneficial to increase the balance of a PC with a lower balance and a higher weight (i.e., one on which more transaction executions rely). In this paper, we define the utility of a transaction and the utility-aware rebalance (UAR) problem. The utility of a transaction is proportional to the weight of the PC and the amount of the transaction, and inversely proportional to the balance of the receiver. To maximize the effect of improving SRoT, UAR aims to find a set of transactions with maximized utilities, subject to budget and conservation constraints. The budget constraint limits the number of tokens shifted in a PC. The conservation constraint requires that the number of tokens each user sends equals the number of tokens it receives. We prove that UAR is NP-hard and cannot be approximated within a constant ratio. We therefore propose two heuristic algorithms, Circuit Greedy and UAR_DC. Extensive experiments show that our approaches outperform the existing approach by at least 3.16× in terms of utilities.

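The abstract's utility definition (proportional to channel weight and transaction amount, inversely proportional to the receiver's balance) can be read as a simple ratio. The snippet below is one plausible instantiation for illustration only; the paper's exact formula and constants may differ:

```python
def transaction_utility(pc_weight, amount, receiver_balance):
    """One reading of the abstract's definition: utility grows with the
    channel's weight and the transferred amount, and shrinks as the
    receiving channel is already well funded. Constants are hypothetical."""
    return pc_weight * amount / receiver_balance

# Rebalancing toward a heavily used, poorly funded channel scores higher:
print(transaction_utility(pc_weight=5, amount=10, receiver_balance=2))   # 25.0
print(transaction_utility(pc_weight=5, amount=10, receiver_balance=50))  # 1.0
```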
Title: GraphOS: Towards Oblivious Graph Processing
Authors: Javad Ghareh Chamani, I. Demertzis, Dimitrios Papadopoulos, Charalampos Papamanthou, R. Jalili
DOI: 10.14778/3625054.3625067 (https://doi.org/10.14778/3625054.3625067)
Proc. VLDB Endow., September 2023

Abstract: We propose GraphOS, a system that allows a client who owns a graph database to outsource it to an untrusted server for storage and querying. It relies on doubly-oblivious primitives and trusted hardware to achieve a very strong privacy and efficiency notion which we call oblivious graph processing: the server learns nothing besides the number of graph vertices and edges, and, for each query, its type and response size. At a technical level, GraphOS stores the graph in a doubly-oblivious data structure, so that all vertex/edge accesses are indistinguishable. For this purpose, we propose Omix++, a novel doubly-oblivious map that outperforms the previous state of the art by up to 34× and may be of independent interest. Moreover, to avoid any leakage from CPU instruction fetching during query evaluation, we propose algorithms for four fundamental graph queries (BFS/DFS traversal, minimum spanning tree, and single-source shortest paths) that have a fixed execution trace, i.e., the sequence of executed operations is independent of the input. By combining these techniques, we eliminate all information that a hardware adversary observing the memory access pattern within the protected enclave can infer. We benchmarked GraphOS against the best existing solution, based on an oblivious relational DBMS (translating graph queries to relational operators). GraphOS is not only significantly more performant (by up to two orders of magnitude for our tested graphs), but it also eliminates leakage related to the graph topology that is practically inherent when a relational DBMS is used, unless all operations are "padded" to the worst case.

Title: Doquet: Differentially Oblivious Range and Join Queries with Private Data Structures
Authors: Lina Qiu, Georgios Kellaris, N. Mamoulis, Kobbi Nissim, G. Kollios
DOI: 10.14778/3625054.3625055 (https://doi.org/10.14778/3625054.3625055)
Proc. VLDB Endow., September 2023

Abstract: Most cloud service providers offer limited data privacy guarantees, discouraging clients from using them to manage sensitive data. Cloud providers may use servers with Trusted Execution Environments (TEEs) to protect outsourced data while supporting remote querying. However, TEEs may leak access patterns and allow communication-volume attacks, enabling an honest-but-curious cloud provider to learn sensitive information. Oblivious algorithms can be used to completely hide data access patterns, but their high overhead can render them impractical. To alleviate the latter, the notion of Differential Obliviousness (DO) has recently been proposed. DO applies differential privacy (DP) to access patterns while hiding the communication volume of intermediate and final results; it does so by trading some level of privacy for efficiency. We present Doquet: Differentially Oblivious Range and Join Queries with Private Data Structures, a framework for DO outsourced database systems. Doquet is the first approach that supports private data structures, indices, selection, foreign-key join, many-to-many join, and their composition select-join in a realistic TEE setting, even when the accesses to the private memory can be eavesdropped on by the adversary. We prove that the algorithms in Doquet satisfy differential obliviousness. Furthermore, we implemented Doquet and tested it on a machine with second-generation Intel SGX (a TEE); the results show that Doquet offers up to an order of magnitude speedup in comparison with other fully oblivious and differentially oblivious approaches.

Title: Single Update Sketch with Variable Counter Structure
Authors: D. Melissourgos, Haibo Wang, Shigang Chen, Chaoyi Ma, Shiping Chen
DOI: 10.14778/3625054.3625065 (https://doi.org/10.14778/3625054.3625065)
Proc. VLDB Endow., September 2023

Abstract: Per-flow size measurement is key to many streaming applications and management systems, particularly in high-speed networks. Performing such measurement on the data plane of a network device at the line rate requires on-chip memory and computing resources that are shared with other key network functions. This leads to the need for very compact and fast data structures, called sketches, which trade space for accuracy. The same need arises in other application contexts involving extremely large data sets. The goal of sketch design is two-fold: to measure flow size as accurately as possible and to do so as efficiently as possible (for low overhead and thus high processing throughput). Existing sketches can be broadly categorized into multi-update sketches and single-update sketches. The former are more accurate but carry larger overhead; the latter incur small overhead but have poor accuracy. This paper proposes the Single-update Sketch with a Variable counter Structure (SSVS), a new sketch design that is several times faster than existing multi-update sketches at comparable accuracy, and several times more accurate than existing single-update sketches at comparable overhead. The new design embodies several technical contributions that integrate the enabling properties of both multi-update and single-update sketches in a novel structure that effectively controls measurement error with minimal processing overhead.

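For context on the multi-update family the abstract contrasts against, here is a minimal count-min sketch, the classic multi-update design in which every insert touches one counter per hash row. This is a generic illustration, not SSVS itself, and the row/column sizes are arbitrary:

```python
import hashlib

class CountMinSketch:
    """Classic count-min sketch: each insert updates one counter in every
    row (a 'multi-update' design, in the abstract's terminology)."""

    def __init__(self, rows=4, cols=1024):
        self.rows, self.cols = rows, cols
        self.table = [[0] * cols for _ in range(rows)]

    def _index(self, row, key):
        # Seeded hash so each row uses an independent hash function.
        h = hashlib.sha1(f"{row}:{key}".encode()).hexdigest()
        return int(h, 16) % self.cols

    def add(self, key, count=1):
        for r in range(self.rows):
            self.table[r][self._index(r, key)] += count

    def estimate(self, key):
        # Each row can only overestimate (collisions add counts),
        # so the minimum across rows is the tightest bound.
        return min(self.table[r][self._index(r, key)] for r in range(self.rows))

cms = CountMinSketch()
for _ in range(5):
    cms.add("flow-a")
cms.add("flow-b", 3)
print(cms.estimate("flow-a"), cms.estimate("flow-b"))  # never underestimates
```

A single-update sketch would instead touch only one counter per insert, trading accuracy for update cost; SSVS's contribution is combining properties of both families.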
Title: ShadowAQP: Efficient Approximate Group-by and Join Query via Attribute-oriented Sample Size Allocation and Data Generation
Authors: Rong Gu, Han Li, Haipeng Dai, Wenjie Huang, Jie Xue, Meng Li, Jiaqi Zheng, Haoran Cai, Yihua Huang, Guihai Chen
DOI: 10.14778/3625054.3625059 (https://doi.org/10.14778/3625054.3625059)
Proc. VLDB Endow., September 2023

Abstract: Approximate query processing (AQP) is one of the key techniques for coping with big-data querying, because it obtains approximate answers efficiently. To address the non-trivial sample selection and heavy sampling cost issues in AQP, we propose ShadowAQP, an efficient and accurate approach based on attribute-oriented sample size allocation and data generation. We select samples according to group-by and join attributes, and determine the sample size for each group of unique value combinations to improve query accuracy. We design a conditional variational autoencoder model with automatic table data encoding and model update strategies. To further improve accuracy and efficiency, we propose a set of extensions, including parallel multi-round sampling aggregation, data outlier-aware sampling, and dimension reduction optimization. Evaluation results on diverse datasets show that, compared with state-of-the-art approaches, ShadowAQP achieves 5.8× query speedup on average (up to 12.8×) while reducing query error by 74% on average (up to 95%).

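The attribute-oriented sample-size-allocation idea can be illustrated with plain stratified sampling over the group-by attribute. The sketch below is not ShadowAQP itself (which generates samples with a conditional variational autoencoder and a learned allocation); it is a minimal proportional-allocation example over an invented toy table:

```python
import random
from collections import defaultdict

def stratified_groupby_avg(rows, group_key, value_key, budget):
    """Approximate AVG(value) GROUP BY group_key: allocate the sampling
    budget to each group in proportion to its size, then average each
    group's sample. A simple stand-in for attribute-oriented allocation."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    total = sum(len(v) for v in groups.values())
    estimates = {}
    for g, vals in groups.items():
        n = max(1, round(budget * len(vals) / total))  # proportional share
        sample = random.sample(vals, min(n, len(vals)))
        estimates[g] = sum(sample) / len(sample)
    return estimates

rows = [
    {"region": "north", "sales": 10.0},
    {"region": "north", "sales": 14.0},
    {"region": "north", "sales": 12.0},
    {"region": "south", "sales": 30.0},
    {"region": "south", "sales": 34.0},
]
print(stratified_groupby_avg(rows, "region", "sales", budget=3))
```

Proportional allocation keeps every group represented in the sample, which is exactly what uniform row-level sampling fails to guarantee for small groups.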