Ben McCamish, Vahid Ghadakchi, Arash Termehchy, B. Touri, Liang Huang
{"title":"The Data Interaction Game","authors":"Ben McCamish, Vahid Ghadakchi, Arash Termehchy, B. Touri, Liang Huang","doi":"10.1145/3183713.3196899","DOIUrl":"https://doi.org/10.1145/3183713.3196899","url":null,"abstract":"As many users do not precisely know the structure and/or the content of databases, their queries do not exactly reflect their information needs. The database management systems (DBMS) may interact with users and leverage their feedback on the returned results to learn the information needs behind users' queries. Current query interfaces assume that users follow a fixed strategy of expressing their information needs, that is, the likelihood by which a user submits a query to express an information need remains unchanged during her interaction with the DBMS. Using a real-world interaction workload, we show that users learn and modify how to express their information needs during their interactions with the DBMS. We also show that users' learning is accurately modeled by a well-known reinforcement learning mechanism. As current data interaction systems assume that users do not modify their strategies, they cannot discover the information needs behind users' queries effectively. We model the interaction between users and DBMS as a game with identical interest between two rational agents whose goal is to establish a common language for representing information needs in form of queries. We propose a reinforcement learning method that learns and answers the information needs behind queries and adapts to the changes in users' strategies and prove that it improves the effectiveness of answering queries stochastically speaking. We analyze the challenges of efficient implementation of this method over large-scale relational databases and propose two efficient adaptations of this algorithm over large-scale relational databases. Our extensive empirical studies over real-world query workloads and large-scale relational databases indicate that our algorithms are efficient. Our empirical results also show that our proposed learning mechanism is more effective than the state-of-the-art query answering method.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86726385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, K. K. Ng, Ti-Chung Cheng
{"title":"A General and Efficient Querying Method for Learning to Hash","authors":"Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, K. K. Ng, Ti-Chung Cheng","doi":"10.1145/3183713.3183750","DOIUrl":"https://doi.org/10.1145/3183713.3183750","url":null,"abstract":"As an effective solution to the approximate nearest neighbors (ANN) search problem, learning to hash (L2H) is able to learn similarity-preserving hash functions tailored for a given dataset. However, existing L2H research mainly focuses on improving query performance by learning good hash functions, while Hamming ranking (HR) is used as the default querying method. We show by analysis and experiments that Hamming distance, the similarity indicator used in HR, is too coarse-grained and thus limits the performance of query processing. We propose a new fine-grained similarity indicator, quantization distance (QD), which provides more information about the similarity between a query and the items in a bucket. We then develop two efficient querying methods based on QD, which achieve significantly better query performance than HR. Our methods are general and can work with various L2H algorithms. Our experiments demonstrate that a simple and elegant querying method can produce performance gain equivalent to advanced and complicated learning algorithms.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88236276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Selection of Geospatial Data on Maps for Interactive and Visualized Exploration","authors":"Tao Guo, Kaiyu Feng, G. Cong, Z. Bao","doi":"10.1145/3183713.3183738","DOIUrl":"https://doi.org/10.1145/3183713.3183738","url":null,"abstract":"With the proliferation of mobile devices, large collections of geospatial data are becoming available, such as geo-tagged photos. Map rendering systems play an important role in presenting such large geospatial datasets to end users. We propose that such systems should support the following desirable features: representativeness, visibility constraint, zooming consistency, and panning consistency. The first two constraints are fundamental challenges to a map exploration system, which aims to efficiently select a small set of representative objects from the current region of user's interest, and any two selected objects should not be too close to each other for users to distinguish in the limited space of a screen. We formalize it as the Spatial Object Selection (SOS) problem, prove that it is an NP-hard problem, and develop a novel approximation algorithm with performance guarantees. % To further support interactive exploration of geospatial data on maps, we propose the Interactive SOS (ISOS) problem, in which we enrich the SOS problem with the zooming consistency and panning consistency constraints. The objective of ISOS is to provide seamless experience for end-users to interactively explore the data by navigating the map. We extend our algorithm for the SOS problem to solve the ISOS problem, and propose a new strategy based on pre-fetching to significantly enhance the efficiency. Finally we have conducted extensive experiments to show the efficiency and scalability of our approach.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72788327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dan Zhang, Ryan McKenna, Ios Kotsogiannis, Michael Hay, Ashwin Machanavajjhala, G. Miklau
{"title":"EKTELO: A Framework for Defining Differentially-Private Computations","authors":"Dan Zhang, Ryan McKenna, Ios Kotsogiannis, Michael Hay, Ashwin Machanavajjhala, G. Miklau","doi":"10.1145/3183713.3196921","DOIUrl":"https://doi.org/10.1145/3183713.3196921","url":null,"abstract":"The adoption of differential privacy is growing but the complexity of designing private, efficient and accurate algorithms is still high. We propose a novel programming framework and system, Ektelo, for implementing both existing and new privacy algorithms. For the task of answering linear counting queries, we show that nearly all existing algorithms can be composed from operators, each conforming to one of a small number of operator classes. While past programming frameworks have helped to ensure the privacy of programs, the novelty of our framework is its significant support for authoring accurate and efficient (as well as private) programs. After describing the design and architecture of the Ektelo system, we show that Ektelo is expressive, that it allows for safer implementations through code reuse, and that it allows both privacy novices and experts to easily design algorithms. We demonstrate the use of Ektelo by designing several new state-of-the-art algorithms.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90054547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions","authors":"Shuo Han, Lei Zou, J. Yu","doi":"10.1145/3183713.3196924","DOIUrl":"https://doi.org/10.1145/3183713.3196924","url":null,"abstract":"In this paper, we focus on accelerating a widely employed computing pattern --- set intersection, to boost a group of graph algorithms. Graph's adjacency-lists can be naturally considered as node sets, thus set intersection is a primitive operation in many graph algorithms. We propose QFilter, a set intersection algorithm using SIMD instructions. QFilter adopts a merge-based framework and compares two blocks of elements iteratively by SIMD instructions. The key insight for our improvement is that we quickly filter out most of unnecessary comparisons in one byte-checking step. We also present a binary representation called BSR that encodes sets in a compact layout. By combining QFilter and BSR, we achieve data-parallelism in two levels --- inter-chunk and intra-chunk parallelism. Moreover, we find that node ordering impacts the performance of intersection by affecting the compactness of BSR. We formulate the graph reordering problem as an optimization of the compactness of BSR, and prove its strong NP-completeness. Thus we propose an approximate algorithm that can find a better ordering to enhance the intra-chunk parallelism. We conduct extensive experiments to confirm that our approach can improve the performance of set intersection in graph algorithms significantly.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90747263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kubernetes and the New Cloud","authors":"E. Brewer","doi":"10.1145/3183713.3183725","DOIUrl":"https://doi.org/10.1145/3183713.3183725","url":null,"abstract":"We are in the midst of shifting the notion of “Cloud” to a higher level of abstraction than virtual machines — one based on services, processes and APIs. Kubernetes epitomizes this shift and has rapidly become the de facto way to manage this new era of container-based applications. It aims to simplify the deployment and management of services, including the construction of applications as sets of interacting but independent services. We explain some of the key concepts in Kubernetes and Istio and show how they work together to simplify evolution, scaling and operations.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"61 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83980180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manasi Vartak, Joana M. F. da Trindade, S. Madden, M. Zaharia
{"title":"MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis","authors":"Manasi Vartak, Joana M. F. da Trindade, S. Madden, M. Zaharia","doi":"10.1145/3183713.3196934","DOIUrl":"https://doi.org/10.1145/3183713.3196934","url":null,"abstract":"Model diagnosis is the process of analyzing machine learning (ML) model performance to identify where the model works well and where it doesn't. It is a key part of the modeling process and helps ML developers iteratively improve model accuracy. Often, model diagnosis is performed by analyzing different datasets or intermediates associated with the model such as the input data and hidden representations learned by the model (e.g., [4, 24, 39,]). The bottleneck in fast model diagnosis is the creation and storage of model intermediates. Storing these intermediates requires tens to hundreds of GB of storage whereas re-running the model for each diagnostic query slows down model diagnosis. To address this bottleneck, we propose a system called MISTIQUE that can work with traditional ML pipelines as well as deep neural networks to efficiently capture, store, and query model intermediates for diagnosis. For each diagnostic query, MISTIQUE intelligently chooses whether to re-run the model or read a previously stored intermediate. For intermediates that are stored in MISTIQUE, we propose a range of optimizations to reduce storage footprint including quantization, summarization, and data de-duplication. We evaluate our techniques on a range of real-world ML models in scikit-learn and Tensorflow. We demonstrate that our optimizations reduce storage by up to 110X for traditional ML pipelines and up to 6X for deep neural networks. Furthermore, by using MISTIQUE, we can speed up diagnostic queries on traditional ML pipelines by up to 390X and 210X on deep neural networks.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76640855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Rating-Ranking Method for Crowdsourced Top-k Computation","authors":"Kaiyu Li, Xiaohang Zhang, Guoliang Li","doi":"10.1145/3183713.3183762","DOIUrl":"https://doi.org/10.1145/3183713.3183762","url":null,"abstract":"Crowdsourced top- k computation aims to utilize the human ability to identify Top- k objects from a given set of objects. Most of existing studies employ a pairwise comparison based method, which first asks workers to compare each pair of objects and then infers the Top- k results based on the pairwise comparison results. Obviously, it is quadratic to compare every object pair and these methods involve huge monetary cost, especially for large datasets. To address this problem, we propose a rating-ranking-based approach, which contains two types of questions to ask the crowd. The first is a rating question, which asks the crowd to give a score for an object. The second is a ranking question, which asks the crowd to rank several (e.g., 3) objects. Rating questions are coarse grained and can roughly get a score for each object, which can be used to prune the objects whose scores are much smaller than those of the Top- k objects. Ranking questions are fine grained and can be used to refine the scores. We propose a unified model to model the rating and ranking questions, and seamlessly combine them together to compute the Top- k results. We also study how to judiciously select appropriate rating or ranking questions and assign them to a coming worker. Experimental results on real datasets show that our method significantly outperforms existing approaches.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87439880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Join Reorderability with Compensation Operators","authors":"Taining Wang, C. Chan","doi":"10.1145/3183713.3183731","DOIUrl":"https://doi.org/10.1145/3183713.3183731","url":null,"abstract":"A critical task in query optimization is the join reordering problem which is to find an efficient evaluation order for the join operators in a query plan. While the join reordering problem is well studied for queries with only inner-joins, the problem becomes considerably harder when outerjoins/antijoins are involved as such operators are generally not associative. The existing solutions for this problem do not enumerate the complete space of join orderings due to various restrictions on the query rewriting rules considered. In this paper, we present a novel approach for this problem for the class of queries involving inner-joins, single-sided outerjoins, and/or antijoins. Our work is able to support complete join reorderability for this class of queries which supersedes the state-of-the-art approaches.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"66 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83153537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RDSQ: Reliable Queue Protocol over Shared Logs","authors":"Haolin Yu","doi":"10.1145/3183713.3183718","DOIUrl":"https://doi.org/10.1145/3183713.3183718","url":null,"abstract":"","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81957356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}