Proceedings of the 2018 International Conference on Management of Data最新文献_第7页

Crowdsourcing Analytics With CrowdCur 使用CrowdCur进行众包分析

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193563

M. Esfandiari, Kavan Patel, S. Amer-Yahia, Senjuti Basu Roy

引用次数: 2

On the Calculation of Optimality Ranges for Relational Query Execution Plans 关系型查询执行计划的最优范围计算

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183742

Florian Wolf, Norman May, P. Willems, K. Sattler

{"title":"On the Calculation of Optimality Ranges for Relational Query Execution Plans","authors":"Florian Wolf, Norman May, P. Willems, K. Sattler","doi":"10.1145/3183713.3183742","DOIUrl":"https://doi.org/10.1145/3183713.3183742","url":null,"abstract":"Cardinality estimation is a crucial task in query optimization and typically relies on heuristics and basic statistical approximations. At execution time, estimation errors might result in situations where intermediate result sizes may differ from the estimated ones, so that the originally chosen plan is not the optimal plan anymore. In this paper we analyze the deviation from the estimate, and denote the cardinality range of an intermediate result, where the optimal plan remains optimal as the optimality range. While previous work used simple heuristics to calculate similar ranges, we generate the precise bounds for the optimality range considering all relevant plan alternatives. Our experimental results show that the fixed optimality ranges used in previous work fail to characterize the range of cardinalities where a plan is optimal. We derive theoretical worst case bounds for the number of enumerated plans required to compute the precise optimality range, and experimentally show that in real queries this number is significantly smaller. Our experiments also show the benefit for applications like Mid-Query Re-Optimization in terms of significant execution time improvement.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79765130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8

Tighter Upper Bounds for Join Cardinality Estimates 连接基数估计的更严格上界

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183714

Walter Cai

引用次数: 0

SuRF: Practical Range Query Filtering with Fast Succinct Tries SuRF:实用范围查询过滤与快速简洁的尝试

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196931

Huanchen Zhang, Hyeontaek Lim, Viktor Leis, D. Andersen, M. Kaminsky, Kimberly Keeton, Andrew Pavlo

引用次数: 117

Cold Filter: A Meta-Framework for Faster and More Accurate Stream Processing 冷过滤器:一个元框架，更快，更准确的流处理

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183726

Yang Zhou, Tong Yang, Jie Jiang, B. Cui, Minlan Yu, Xiaoming Li, S. Uhlig

{"title":"Cold Filter: A Meta-Framework for Faster and More Accurate Stream Processing","authors":"Yang Zhou, Tong Yang, Jie Jiang, B. Cui, Minlan Yu, Xiaoming Li, S. Uhlig","doi":"10.1145/3183713.3183726","DOIUrl":"https://doi.org/10.1145/3183713.3183726","url":null,"abstract":"Approximate stream processing algorithms, such as Count-Min sketch, Space-Saving, etc., support numerous applications in databases, storage systems, networking, and other domains. However, the unbalanced distribution in real data streams poses great challenges to existing algorithms. To enhance these algorithms, we propose a meta-framework, called Cold Filter (CF), that enables faster and more accurate stream processing. Different from existing filters that mainly focus on hot items, our filter captures cold items in the first stage, and hot items in the second stage. Also, existing filters require two-direction communication - with frequent exchanges between the two stages; our filter on the other hand is one-direction - each item enters one stage at most once. Our filter can accurately estimate both cold and hot items, giving it a genericity that makes it applicable to many stream processing tasks. To illustrate the benefits of our filter, we deploy it on three typical stream processing tasks and experimental results show speed improvements of up to 4.7 times, and accuracy improvements of up to 51 times. All source code is made publicly available at Github.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78561034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 99

Accelerating Machine Learning Inference with Probabilistic Predicates 用概率谓词加速机器学习推理

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183751

Yao Lu, Aakanksha Chowdhery, Srikanth Kandula, S. Chaudhuri

引用次数: 85

Fine-grained Concept Linking using Neural Networks in Healthcare 在医疗保健中使用神经网络的细粒度概念链接

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196907

Jian Dai, Meihui Zhang, Gang Chen, Ju Fan, K. Ngiam, B. Ooi

{"title":"Fine-grained Concept Linking using Neural Networks in Healthcare","authors":"Jian Dai, Meihui Zhang, Gang Chen, Ju Fan, K. Ngiam, B. Ooi","doi":"10.1145/3183713.3196907","DOIUrl":"https://doi.org/10.1145/3183713.3196907","url":null,"abstract":"To unlock the wealth of the healthcare data, we often need to link the real-world text snippets to the referred medical concepts described by the canonical descriptions. However, existing healthcare concept linking methods, such as dictionary-based and simple machine learning methods, are not effective due to the word discrepancy between the text snippet and the canonical concept description, and the overlapping concept meaning among the fine-grained concepts. To address these challenges, we propose a Neural Concept Linking (NCL) approach for accurate concept linking using systematically integrated neural networks. We call the novel neural network architecture as the COMposite AttentIonal encode-Decode neural network (COM-AID). COM-AID performs an encode-decode process that encodes a concept into a vector and decodes the vector into a text snippet with the help of two devised contexts. On the one hand, it injects the textual context into the neural network through the attention mechanism, so that the word discrepancy can be overcome from the semantic perspective. On the other hand, it incorporates the structural context into the neural network through the attention mechanism, so that minor concept meaning differences can be enlarged and effectively differentiated. Empirical studies on two real-world datasets confirm that the NCL produces accurate concept linking results and significantly outperforms state-of-the-art techniques.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80129732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 16

ZigZag: Supporting Similarity Queries on Vector Space Models ZigZag:支持向量空间模型上的相似性查询

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196936

Wenhai Li, Lingfeng Deng, Yang Li, Chen Li

引用次数: 3

Machine Learning for Data Management: Problems and Solutions 数据管理中的机器学习:问题与解决方案

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3199515

Pedro M. Domingos

{"title":"Machine Learning for Data Management: Problems and Solutions","authors":"Pedro M. Domingos","doi":"10.1145/3183713.3199515","DOIUrl":"https://doi.org/10.1145/3183713.3199515","url":null,"abstract":"Machine learning has made great strides in recent years, and its applications are spreading rapidly. Unfortunately, the standard machine learning formulation does not match well with data management problems. For example, most learning algorithms assume that the data is contained in a single table, and consists of i.i.d. (independent and identically distributed) samples. This leads to a proliferation of ad hoc solutions, slow development, and suboptimal results. Fortunately, a body of machine learning theory and practice is being developed that dispenses with such assumptions, and promises to make machine learning for data management much easier and more effective [1]. In particular, representations like Markov logic, which includes many types of deep networks as special cases, allow us to define very rich probability distributions over non-i.i.d., multi-relational data [2]. Despite their generality, learning the parameters of these models is still a convex optimization problem, allowing for efficient solution. Learning structure-in the case of Markov logic, a set of formulas in first-order logic-is intractable, as in more traditional representations, but can be done effectively using inductive logic programming techniques. Inference is performed using probabilistic generalizations of theorem proving, and takes linear time and space in tractable Markov logic, an object-oriented specialization of Markov logic [3]. These techniques have led to state-of-the-art, principled solutions to problems like entity resolution, schema matching, ontology alignment, and information extraction. Using tractable Markov logic, we have extracted from the Web a probabilistic knowledge base with millions of objects and billions of parameters, which can be queried exactly in subsecond times using an RDBMS backend [3]. With these foundations in place, we expect the pace of machine learning applications in data management to continue to accelerate in coming years.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"52 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84682798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

Auto-Detect: Data-Driven Error Detection in Tables 自动检测:表中数据驱动的错误检测

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196889

Zhipeng Huang, Yeye He

{"title":"Auto-Detect: Data-Driven Error Detection in Tables","authors":"Zhipeng Huang, Yeye He","doi":"10.1145/3183713.3196889","DOIUrl":"https://doi.org/10.1145/3183713.3196889","url":null,"abstract":"Given a single column of values, existing approaches typically employ regex-like rules to detect errors by finding anomalous values inconsistent with others. Such techniques make local decisions based only on values in the given input column, without considering a more global notion of compatibility that can be inferred from large corpora of clean tables. We propose sj, a statistics-based technique that leverages co-occurrence statistics from large corpora for error detection, which is a significant departure from existing rule-based methods. Our approach can automatically detect incompatible values, by leveraging an ensemble of judiciously selected generalization languages, each of which uses different generalizations and is sensitive to different types of errors. Errors so detected are based on global statistics, which is robust and aligns well with human intuition of errors. We test sj on a large set of public Wikipedia tables, as well as proprietary enterprise Excel files. While both of these test sets are supposed to be of high-quality, sj makes surprising discoveries of over tens of thousands of errors in both cases, which are manually verified to be of high precision (over 0.98). Our labeled benchmark set on Wikipedia tables is released for future research.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85874798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 29