"Question answering via web extracted tables"
Bhavya Karki, Fan Hu, Nithin Haridas, S. Barot, Zihua Liu, Lucile Callebert, Matthias Grabmair, A. Tomasic
DOI: https://doi.org/10.1145/3329859.3329879 (2019)
Abstract: Question answering (QA) systems can answer a wide range of questions but remain limited in the complexity of their reasoning and the breadth of the data sources they can access. In this paper, we describe a dataset and baseline results for a question answering system that uses tables extracted from the web. The dataset is derived from commonly asked questions on the web and their corresponding answers found in website tables. It is novel in that every question is paired with a table of a different signature, so learning must generalize across domains. Each QA training instance comprises a table, a natural language question, and a corresponding structured SQL query. We build our model by dividing question answering into a sequence of tasks, including table retrieval and question element classification, and conduct experiments to measure the performance of each task. Following a traditional machine learning design, we extract features specific to each task, apply a neural model to each, and compose a full pipeline that constructs the SQL query from its parts. Our work provides quantitative results and error analysis for each task, and identifies in detail the reasoning required to generate SQL expressions from natural language questions. This analysis of the required reasoning informs future models based on neural machine learning.

"Scheduling OLTP transactions via learned abort prediction"
Yangjun Sheng, A. Tomasic, Tieying Zhang, Andrew Pavlo
DOI: https://doi.org/10.1145/3329859.3329871 (2019)
Abstract: Current main memory database system architectures are still challenged by high-contention workloads, and this challenge will continue to grow as the number of cores per processor increases [23]. These systems schedule transactions randomly across cores to maximize concurrency and to produce a uniform load; this scheduling never considers potential conflicts. Performance could be improved if scheduling balanced concurrency (to maximize throughput) against serializing conflicting transactions (to avoid aborts). In this paper, we present the design of several intelligent transaction scheduling algorithms that consider both potential transaction conflicts and concurrency. To reason about transaction conflicts, we develop a supervised machine learning model that estimates the probability of conflict and incorporate this model into several scheduling algorithms. In addition, we integrate an unsupervised machine learning algorithm into an intelligent scheduling algorithm. We then empirically measure the performance impact of the different scheduling algorithms on OLTP and social networking workloads. Our results show that, with appropriate settings, intelligent scheduling can increase throughput by 54% and reduce the abort rate by 80% on a 20-core machine, relative to random scheduling. In summary, the paper provides preliminary evidence that intelligent scheduling significantly improves DBMS performance.
{"title":"Learning to optimize federated queries","authors":"Liqi Xu, R. Cole, Daniel Ting","doi":"10.1145/3329859.3329873","DOIUrl":"https://doi.org/10.1145/3329859.3329873","url":null,"abstract":"Query optimization is challenging for any database system, even with a clear understanding of its inner workings. Consider then, query planning for a federation of third-party data sources where little detail is known. This is exactly the challenge of orchestrating data execution and movement faced by Tableau's cross-database joins feature, where the data of a query originates from two or more data sources. In this paper, we present our work on using machine learning techniques to address one of the most fundamental challenges in federated query optimization: the dynamic designation of a federation engine. Our machine learning model learns the performance and data characteristics of a system by extracting features from query plans. We further extend the ability of our model to manipulate database settings on a per query level. Our experimental results demonstrate that we can achieve a speedup of up to 10.7x compared to an existing federated query optimizer.","PeriodicalId":118194,"journal":{"name":"Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127701776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

"Interpreting deep learning models for entity resolution: an experience report using LIME"
Vincenzo Di Cicco, D. Firmani, Nick Koudas, P. Merialdo, D. Srivastava
DOI: https://doi.org/10.1145/3329859.3329878 (2019)
Abstract: Entity resolution (ER) seeks to determine which records refer to the same entity (e.g., matching products sold on multiple websites). The sheer number of ways humans represent and misrepresent information about real-world entities makes ER a challenging problem. Deep learning (DL) has produced impressive results in natural language processing, so recent work has started exploring DL approaches to the ER problem, with encouraging results. However, we are still far from understanding why and when these approaches work in the ER setting. We are developing a methodology, Mojito, to produce explainable interpretations of the output of DL models for the ER task. Our methodology is based on LIME, a popular tool for explaining the predictions of generic classifiers. In this paper we report our first experiences interpreting recent DL models for the ER task. Our results demonstrate the importance of explanations in the DL space and suggest that, when assessing the performance of DL algorithms for ER, accuracy alone may not be sufficient to demonstrate generality and reproducibility in a production environment.
{"title":"Considerations for handling updates in learned index structures","authors":"A. Hadian, T. Heinis","doi":"10.1145/3329859.3329874","DOIUrl":"https://doi.org/10.1145/3329859.3329874","url":null,"abstract":"Machine learned models have recently been suggested as a rival for index structures such as B-trees and hash tables. An optimized learned index potentially has a significantly smaller memory footprint compared to its algorithmic counterparts, which alleviates the relatively high computational complexity of ML models. One unexplored aspect of learned index structures, however, is handling updates to the data and hence the model. In this paper we therefore discuss updates to the data and their implications for the model. Moreover, we suggest a method for eliminating the drift - the error of learned index models caused by the updates to the index- so that the learned model can maintain its performance under higher update rates.","PeriodicalId":118194,"journal":{"name":"Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116662927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

"Cardinality estimation with local deep learning models"
Lucas Woltmann, Claudio Hartmann, Maik Thiele, Dirk Habich, Wolfgang Lehner
DOI: https://doi.org/10.1145/3329859.3329875 (2019)
Abstract: Cardinality estimation is a fundamental task in database query processing and optimization. Unfortunately, the accuracy of traditional estimation techniques is poor, resulting in non-optimal query execution plans. With the recent expansion of machine learning into the field of data management, there is a general expectation that learned models, especially neural networks, can deliver better estimation accuracy. Up to now, all proposed neural network approaches for cardinality estimation have followed a global approach that considers the whole database schema at once. These global models are prone to sparse training data, leading to misestimates for queries that were not represented in the sample space used to generate the training queries. To overcome this issue, we introduce a novel local approach in this paper, in which each local context is a specific sub-part of the schema. As we show, this leads to a better representation of data correlations and thus better estimation accuracy. Compared to global approaches, our approach achieves an improvement of two orders of magnitude in accuracy and a factor of four in training time for local models.
{"title":"Towards learning a partitioning advisor with deep reinforcement learning","authors":"Benjamin Hilprecht, Carsten Binnig, Uwe Röhm","doi":"10.1145/3329859.3329876","DOIUrl":"https://doi.org/10.1145/3329859.3329876","url":null,"abstract":"In this paper we introduce a partitioning advisor for analytical workloads based on Deep Reinforcement Learning. In contrast to existing approaches for automated partitioning design, an RL agent learns its decisions based on experience by trying out different partitionings and monitoring the rewards for different workloads. In our experimental evaluation with a distributed database and various complex schemata, we show that our learned partitioning advisor is thus not only able to find partitionings that outperform existing approaches for automated data partitioning but is also able to find non-obvious partitionings.","PeriodicalId":118194,"journal":{"name":"Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126639914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Termite: a system for tunneling through heterogeneous data","authors":"R. Fernandez, S. Madden","doi":"10.1145/3329859.3329877","DOIUrl":"https://doi.org/10.1145/3329859.3329877","url":null,"abstract":"Data-driven analysis is important in virtually every modern organization. Yet, most data is underutilized because it remains locked in silos inside of organizations; large organizations have thousands of databases, and billions of files that are not integrated together in a single, queryable repository. Despite 40+ years of continuous effort by the database community, data integration still remains an open challenge. In this paper, we advocate a different approach: rather than trying to infer a common schema, we aim to find another common representation for diverse, heterogeneous data. Specifically, we argue for an embedding (i.e., a vector space) in which all entities, rows, columns, and paragraphs are represented as points. In the embedding, the distance between points indicates their degree of relatedness. We present Termite, a prototype we have built to learn the best embedding from the data. Because the best representation is learned, this allows Termite to avoid much of the human effort associated with traditional data integration tasks. On top of Termite, we have implemented a Termite-Join operator, which allows people to identify related concepts, even when these are stored in databases with different schemas and in unstructured data such as text files, webpages, etc. Finally, we show preliminary evaluation results of our prototype via a user study, and describe a list of future directions we have identified.","PeriodicalId":118194,"journal":{"name":"Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114161104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","authors":"","doi":"10.1145/3329859","DOIUrl":"https://doi.org/10.1145/3329859","url":null,"abstract":"","PeriodicalId":118194,"journal":{"name":"Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121202920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}