Proceedings of the 2018 International Conference on Management of Data: Latest Papers, Page 6

The Data Interaction Game

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196899

Ben McCamish, Vahid Ghadakchi, Arash Termehchy, B. Touri, Liang Huang

Abstract: As many users do not precisely know the structure and/or the content of databases, their queries do not exactly reflect their information needs. The database management systems (DBMS) may interact with users and leverage their feedback on the returned results to learn the information needs behind users' queries. Current query interfaces assume that users follow a fixed strategy of expressing their information needs, that is, the likelihood by which a user submits a query to express an information need remains unchanged during her interaction with the DBMS. Using a real-world interaction workload, we show that users learn and modify how to express their information needs during their interactions with the DBMS. We also show that users' learning is accurately modeled by a well-known reinforcement learning mechanism. As current data interaction systems assume that users do not modify their strategies, they cannot discover the information needs behind users' queries effectively. We model the interaction between users and DBMS as a game with identical interest between two rational agents whose goal is to establish a common language for representing information needs in form of queries. We propose a reinforcement learning method that learns and answers the information needs behind queries and adapts to the changes in users' strategies and prove that it improves the effectiveness of answering queries stochastically speaking. We analyze the challenges of efficient implementation of this method over large-scale relational databases and propose two efficient adaptations of this algorithm over large-scale relational databases. Our extensive empirical studies over real-world query workloads and large-scale relational databases indicate that our algorithms are efficient. Our empirical results also show that our proposed learning mechanism is more effective than the state-of-the-art query answering method.
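
The feedback loop described in the abstract can be illustrated with a small sketch. The abstract does not name the reinforcement learning mechanism, so the code below uses a simple cumulative-reward rule (each candidate answer is returned with probability proportional to the reward it has accumulated for a query) purely as an assumed stand-in. The class name `ReinforcementAnswerer`, its methods, and the `initial_reward` parameter are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


class ReinforcementAnswerer:
    """Toy DBMS-side learner (illustrative only, not the paper's algorithm)."""

    def __init__(self, candidates, initial_reward=1.0, seed=0):
        self.candidates = list(candidates)
        # One reward row per query; every candidate starts with the same reward.
        self.rewards = defaultdict(
            lambda: {c: initial_reward for c in self.candidates}
        )
        self.rng = random.Random(seed)

    def strategy(self, query):
        """Probability of returning each candidate for this query."""
        row = self.rewards[query]
        total = sum(row.values())
        return {c: r / total for c, r in row.items()}

    def answer(self, query):
        """Sample one candidate according to the current strategy."""
        probs = self.strategy(query)
        return self.rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

    def feedback(self, query, candidate, reward):
        """Reinforce a (query, candidate) pair after positive user feedback."""
        self.rewards[query][candidate] += reward


answerer = ReinforcementAnswerer(["t1", "t2"])
answerer.feedback("q", "t1", 2.0)   # the user liked t1 for query q
print(answerer.strategy("q"))       # t1 is now more likely than t2
```

Because both sides receive the same payoff in an identical-interest game, rewarding the answers a user accepts is one plausible way for the DBMS side to adapt as the user's own querying strategy changes.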

Citations: 18

A General and Efficient Querying Method for Learning to Hash

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183750

Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, K. K. Ng, Ti-Chung Cheng

Citations: 11

Efficient Selection of Geospatial Data on Maps for Interactive and Visualized Exploration

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183738

Tao Guo, Kaiyu Feng, G. Cong, Z. Bao

Abstract: With the proliferation of mobile devices, large collections of geospatial data are becoming available, such as geo-tagged photos. Map rendering systems play an important role in presenting such large geospatial datasets to end users. We propose that such systems should support the following desirable features: representativeness, visibility constraint, zooming consistency, and panning consistency. The first two constraints are fundamental challenges to a map exploration system, which aims to efficiently select a small set of representative objects from the current region of user's interest, and any two selected objects should not be too close to each other for users to distinguish in the limited space of a screen. We formalize it as the Spatial Object Selection (SOS) problem, prove that it is an NP-hard problem, and develop a novel approximation algorithm with performance guarantees. To further support interactive exploration of geospatial data on maps, we propose the Interactive SOS (ISOS) problem, in which we enrich the SOS problem with the zooming consistency and panning consistency constraints. The objective of ISOS is to provide seamless experience for end-users to interactively explore the data by navigating the map. We extend our algorithm for the SOS problem to solve the ISOS problem, and propose a new strategy based on pre-fetching to significantly enhance the efficiency. Finally we have conducted extensive experiments to show the efficiency and scalability of our approach.
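
To make the visibility constraint concrete, here is a minimal greedy sketch. It is not the approximation algorithm from the paper and carries no performance guarantee. It assumes each object has a precomputed score standing in for representativeness, and it assumes that "too close" means a Euclidean distance below a threshold `min_dist`. The function name and the tuple layout are hypothetical.

```python
import math


def greedy_select(objects, k, min_dist):
    """Pick up to k object ids, highest score first, keeping every pair of
    selected objects at least min_dist apart.

    objects: list of (object_id, x, y, score) tuples.
    """
    chosen = []
    for obj in sorted(objects, key=lambda o: o[3], reverse=True):
        if len(chosen) == k:
            break
        # Visibility constraint: reject objects that would overlap on screen.
        if all(math.dist(obj[1:3], c[1:3]) >= min_dist for c in chosen):
            chosen.append(obj)
    return [o[0] for o in chosen]


photos = [("a", 0, 0, 0.9), ("b", 0.5, 0, 0.8), ("c", 3, 0, 0.7), ("d", 3, 4, 0.6)]
print(greedy_select(photos, k=3, min_dist=1.0))
```

In this example "b" is skipped because it lies only 0.5 units from the higher-scored "a", which is the kind of conflict the visibility constraint is meant to rule out.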

Citations: 46

EKTELO: A Framework for Defining Differentially-Private Computations

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196921

Dan Zhang, Ryan McKenna, Ios Kotsogiannis, Michael Hay, Ashwin Machanavajjhala, G. Miklau

Citations: 54

Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196924

Shuo Han, Lei Zou, J. Yu

Abstract: In this paper, we focus on accelerating a widely employed computing pattern, set intersection, to boost a group of graph algorithms. Graph's adjacency-lists can be naturally considered as node sets, thus set intersection is a primitive operation in many graph algorithms. We propose QFilter, a set intersection algorithm using SIMD instructions. QFilter adopts a merge-based framework and compares two blocks of elements iteratively by SIMD instructions. The key insight for our improvement is that we quickly filter out most of unnecessary comparisons in one byte-checking step. We also present a binary representation called BSR that encodes sets in a compact layout. By combining QFilter and BSR, we achieve data-parallelism in two levels: inter-chunk and intra-chunk parallelism. Moreover, we find that node ordering impacts the performance of intersection by affecting the compactness of BSR. We formulate the graph reordering problem as an optimization of the compactness of BSR, and prove its strong NP-completeness. Thus we propose an approximate algorithm that can find a better ordering to enhance the intra-chunk parallelism. We conduct extensive experiments to confirm that our approach can improve the performance of set intersection in graph algorithms significantly.
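
The merge-based, block-at-a-time framework mentioned in the abstract can be sketched in scalar Python. This shows only the control flow: the all-pairs block comparison that SIMD instructions would perform in hardware is emulated with a set lookup, and neither QFilter's byte-checking filter nor the BSR layout is reproduced. The function name and the default block size of 4 are assumptions made for illustration.

```python
def block_merge_intersect(a, b, block=4):
    """Intersect two sorted lists of distinct integers block by block."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        blk_a = a[i:i + block]
        blk_b = b[j:j + block]
        # All-pairs comparison of the two blocks (the SIMD step in a real
        # implementation); emulated here with a set membership test.
        members_b = set(blk_b)
        out.extend(x for x in blk_a if x in members_b)
        # Advance the block whose largest element is smaller; on a tie,
        # both blocks are exhausted and both advance.
        last_a, last_b = blk_a[-1], blk_b[-1]
        if last_a <= last_b:
            i += len(blk_a)
        if last_b <= last_a:
            j += len(blk_b)
    return out


neighbors_u = [1, 3, 5, 7, 9, 11, 13]
neighbors_v = [3, 4, 5, 6, 7, 8, 13, 20, 21]
print(block_merge_intersect(neighbors_u, neighbors_v))
```

Treating the two adjacency lists of an edge's endpoints as the inputs gives the common neighbors of those endpoints, which is the primitive that algorithms such as triangle counting rely on.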

Citations: 49

Kubernetes and the New Cloud

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183725

E. Brewer

Citations: 7

MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196934

Manasi Vartak, Joana M. F. da Trindade, S. Madden, M. Zaharia

Abstract: Model diagnosis is the process of analyzing machine learning (ML) model performance to identify where the model works well and where it doesn't. It is a key part of the modeling process and helps ML developers iteratively improve model accuracy. Often, model diagnosis is performed by analyzing different datasets or intermediates associated with the model such as the input data and hidden representations learned by the model (e.g., [4, 24, 39]). The bottleneck in fast model diagnosis is the creation and storage of model intermediates. Storing these intermediates requires tens to hundreds of GB of storage whereas re-running the model for each diagnostic query slows down model diagnosis. To address this bottleneck, we propose a system called MISTIQUE that can work with traditional ML pipelines as well as deep neural networks to efficiently capture, store, and query model intermediates for diagnosis. For each diagnostic query, MISTIQUE intelligently chooses whether to re-run the model or read a previously stored intermediate. For intermediates that are stored in MISTIQUE, we propose a range of optimizations to reduce storage footprint including quantization, summarization, and data de-duplication. We evaluate our techniques on a range of real-world ML models in scikit-learn and Tensorflow. We demonstrate that our optimizations reduce storage by up to 110X for traditional ML pipelines and up to 6X for deep neural networks. Furthermore, by using MISTIQUE, we can speed up diagnostic queries on traditional ML pipelines by up to 390X and 210X on deep neural networks.
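
As a rough illustration of why quantization shrinks stored intermediates, the sketch below maps floating-point activations to 8-bit codes with a uniform min/max scheme. The abstract does not say which quantization schemes MISTIQUE uses, so this is an assumed generic example, and the function names are hypothetical. Storing one byte per value instead of a 4-byte float32 gives a 4x reduction, at the cost of an error of at most half a quantization step per value.

```python
def quantize_8bit(values):
    """Uniformly quantize a non-empty list of floats to 8-bit codes.

    Returns (codes, lo, scale) so that lo + code * scale approximates
    the original value.
    """
    lo, hi = min(values), max(values)
    levels = 255  # largest code that fits in one byte
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = bytes(
        min(levels, max(0, round((v - lo) / scale))) for v in values
    )
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Reconstruct approximate floats from 8-bit codes."""
    return [lo + c * scale for c in codes]


activations = [-1.0, 0.25, 0.3, 2.0]
codes, lo, scale = quantize_8bit(activations)
restored = dequantize(codes, lo, scale)
print(len(codes), "bytes stored for", len(activations), "values")
```

Whether this loss of precision is acceptable depends on the diagnostic query; a query that only compares coarse activation patterns tolerates it better than one that needs exact values.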

Citations: 57

A Rating-Ranking Method for Crowdsourced Top-k Computation

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183762

Kaiyu Li, Xiaohang Zhang, Guoliang Li

Abstract: Crowdsourced top-k computation aims to utilize the human ability to identify Top-k objects from a given set of objects. Most of existing studies employ a pairwise comparison based method, which first asks workers to compare each pair of objects and then infers the Top-k results based on the pairwise comparison results. Obviously, it is quadratic to compare every object pair and these methods involve huge monetary cost, especially for large datasets. To address this problem, we propose a rating-ranking-based approach, which contains two types of questions to ask the crowd. The first is a rating question, which asks the crowd to give a score for an object. The second is a ranking question, which asks the crowd to rank several (e.g., 3) objects. Rating questions are coarse grained and can roughly get a score for each object, which can be used to prune the objects whose scores are much smaller than those of the Top-k objects. Ranking questions are fine grained and can be used to refine the scores. We propose a unified model to model the rating and ranking questions, and seamlessly combine them together to compute the Top-k results. We also study how to judiciously select appropriate rating or ranking questions and assign them to a coming worker. Experimental results on real datasets show that our method significantly outperforms existing approaches.
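
The cost argument and the pruning idea can be made concrete with a short sketch. Comparing every pair of n objects takes n(n-1)/2 questions, while one round of rating questions takes only n. The pruning rule below, which keeps objects whose mean rating is within a fixed `margin` of the k-th best mean, is an assumed simplification and not the unified model proposed in the paper. Both function names are hypothetical.

```python
def pairwise_questions(n):
    """Number of questions needed to compare every pair of n objects."""
    return n * (n - 1) // 2


def prune_by_rating(ratings, k, margin):
    """Keep objects whose mean rating is within `margin` of the k-th best.

    ratings: dict mapping object id to a non-empty list of crowd scores.
    Returns the surviving object ids in sorted order.
    """
    means = {obj: sum(r) / len(r) for obj, r in ratings.items()}
    kth_best = sorted(means.values(), reverse=True)[k - 1]
    return sorted(obj for obj, m in means.items() if m >= kth_best - margin)


print(pairwise_questions(1000))  # questions for exhaustive pairwise comparison

crowd = {"a": [5, 4], "b": [4, 4], "c": [2, 1], "d": [3, 5], "e": [1, 1]}
print(prune_by_rating(crowd, k=2, margin=1.0))
```

Only the survivors of the pruning step would then need the finer-grained ranking questions, which is where the savings over exhaustive pairwise comparison would come from.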

Citations: 18

Improving Join Reorderability with Compensation Operators

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183731

Taining Wang, C. Chan

Citations: 2

RDSQ: Reliable Queue Protocol over Shared Logs

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183718

Haolin Yu

Citations: 0