Companion of the 2023 International Conference on Management of Data: Latest Publications

SynopsisDB: Distributed Synopsis-based Data Processing System
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589394
Xin Zhang
{"title":"SynopsisDB: Distributed Synopsis-based Data Processing System","authors":"Xin Zhang","doi":"10.1145/3555041.3589394","DOIUrl":"https://doi.org/10.1145/3555041.3589394","url":null,"abstract":"As the data volume continues to expand at an unprecedented rate, data scientists face the challenge of effectively processing and exploring vast amounts of data. To carry out tasks such as analyzing wildfire clusters, querying diverse datasets, and visualizing results with tools like IncVisage, Pangloss, Marviq, and GeoSparkViz, data scientists require data processing systems that are efficient, flexible, and capable of handling different types of queries across various data sources. Two critical features that these systems should possess are the ability to process data efficiently and handle a wide range of queries for diverse data types.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116171279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
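To make the synopsis idea concrete, here is a minimal sketch of synopsis-based approximate query processing in the spirit of (but not taken from) SynopsisDB: each partition of the data builds a small Count-Min sketch, the sketches are merged, and frequency queries are answered from the merged summary without scanning the raw data. The sketch dimensions, hashing scheme, and data are illustrative assumptions.

```python
# Illustrative synopsis-based processing: per-partition Count-Min sketches, merged centrally.
import hashlib


class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item, row):
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._hash(item, row)] += count

    def estimate(self, item):
        # Count-Min never underestimates; the minimum cell is the tightest bound.
        return min(self.table[row][self._hash(item, row)] for row in range(self.depth))

    def merge(self, other):
        # Synopses are mergeable: cell-wise addition combines per-partition summaries.
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


# Two partitions of a toy event log, summarized independently and then merged.
partitions = [["CA", "CA", "NV"], ["CA", "OR", "OR", "NV"]]
sketches = []
for part in partitions:
    cms = CountMinSketch()
    for value in part:
        cms.add(value)
    sketches.append(cms)

merged = sketches[0]
merged.merge(sketches[1])
print(merged.estimate("CA"))  # approximate count, here 3
```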
Demystifying Artificial Intelligence for Data Preparation
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589406
Chengliang Chai, N. Tang, Ju Fan, Yuyu Luo
{"title":"Demystifying Artificial Intelligence for Data Preparation","authors":"Chengliang Chai, N. Tang, Ju Fan, Yuyu Luo","doi":"10.1145/3555041.3589406","DOIUrl":"https://doi.org/10.1145/3555041.3589406","url":null,"abstract":"Data preparation -- the process of discovering, integrating, transforming, cleaning, and annotating data -- is one of the oldest, hardest, yet inevitable data management problems. Unfortunately, data preparation is known to be iterative, requires high human cost, and is error-prone. Recent advances in artificial intelligence (AI) have shown very promising results on many data preparation tasks. At a high level, AI for data preparation (AI4DP) should have the following abilities. First, the AI model should capture real-world knowledge so as to solve various tasks. Second, it is important to easily adapt to new datasets/tasks. Third, data preparation is a complicated pipeline with many operations, which results in a large number of candidates to select the optimum, and thus it is crucial to effectively and efficiently explore the large space of possible pipelines. In this tutorial, we will cover three important topics to address the above issues: demystifying foundation models to inject knowledge for data preparation, tuning and adapting pre-trained language models for data preparation, and orchestrating data preparation pipelines for different downstream applications.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126625170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
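As one concrete instance of the second topic (adapting pre-trained language models to a data preparation task), the hedged sketch below serializes two records and scores them with a BERT sequence classifier for entity matching. The model choice, the "COL ... VAL ..." serialization, and the toy records are illustrative assumptions, not code from the tutorial.

```python
# Adapting a pre-trained LM to a data preparation task (entity matching), illustratively.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def serialize(record):
    # Flatten an attribute dict into a "COL name VAL value" style string.
    return " ".join(f"COL {k} VAL {v}" for k, v in record.items())

left = {"title": "iPhone 14 Pro 128GB", "brand": "Apple"}
right = {"title": "Apple iPhone14 Pro (128 GB)", "brand": "Apple"}

inputs = tokenizer(serialize(left), serialize(right), return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Without fine-tuning on labeled pairs these scores are meaningless; in practice the model
# (or a lightweight adapter) is first trained on match/non-match examples.
print(torch.softmax(logits, dim=-1))
```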
Characterizing and Verifying Queries Via CINSGEN
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589721
Hanze Meng, Zhengjie Miao, Amir Gilad, Sudeepa Roy, Jun Yang
{"title":"Characterizing and Verifying Queries Via CINSGEN","authors":"Hanze Meng, Zhengjie Miao, Amir Gilad, Sudeepa Roy, Jun Yang","doi":"10.1145/3555041.3589721","DOIUrl":"https://doi.org/10.1145/3555041.3589721","url":null,"abstract":"Example database instances can be very helpful in understanding complex queries. Different examples may illustrate alternative situations in which answers emerge in the query results and can be useful for testing. Examples can also help reveal semantic differences between queries that are supposed to be equivalent, e.g., when students try to understand how their queries behave differently from a reference solution, or when programmers try to pinpoint mistakes inadvertently introduced by rewrites meant to improve readability or performance. In this paper, we propose to demonstrate CinsGen, a system that can characterize queries and help distinguish between two queries. Given a query, CinsGen generates minimal conditional instances (c-instances) that satisfy it. In turn, each c-instance is a generalization of multiple database instances, yielding a compact representation. Thus, using CinsGen enables users to obtain a comprehensive and compact view of all scenarios that satisfy a specified query, allowing for query characterization or distinction between two queries.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125303535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
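To illustrate what "each c-instance is a generalization of multiple database instances" means, the sketch below encodes a tiny conditional instance as tuples over constants and labeled variables plus a side condition, and enumerates the concrete instances it stands for. This encoding is my own illustration, not CinsGen's internal representation.

```python
# A toy conditional instance: variables in tuples plus a condition over their values.
from dataclasses import dataclass, field
from itertools import product


@dataclass
class CInstance:
    relation: str
    tuples: list                                          # constants and variable names like "x1"
    condition: callable = field(default=lambda subst: True)  # predicate over a substitution dict

    def ground(self, domain):
        """Enumerate the concrete instances this c-instance represents over a finite domain."""
        variables = sorted({v for t in self.tuples
                            for v in t if isinstance(v, str) and v.startswith("x")})
        for values in product(domain, repeat=len(variables)):
            subst = dict(zip(variables, values))
            if self.condition(subst):
                yield [tuple(subst.get(v, v) for v in t) for t in self.tuples]


# A c-instance satisfying "SELECT * FROM Emp WHERE salary > 50": one tuple whose salary
# is a variable x1 constrained to exceed 50.
emp = CInstance("Emp", [("alice", "x1")], condition=lambda subst: subst["x1"] > 50)
for concrete in emp.ground(domain=[40, 60, 80]):
    print(concrete)   # [('alice', 60)] and [('alice', 80)]
```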
NEXUS: On Explaining Confounding Bias
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589728
Brit Youngmann, Michael J. Cafarella, Y. Moskovitch, Babak Salimi
{"title":"NEXUS: On Explaining Confounding Bias","authors":"Brit Youngmann, Michael J. Cafarella, Y. Moskovitch, Babak Salimi","doi":"10.1145/3555041.3589728","DOIUrl":"https://doi.org/10.1145/3555041.3589728","url":null,"abstract":"When analyzing large datasets, analysts are often interested in the explanations for unexpected results produced by their queries. In this work, we focus on aggregate SQL queries that expose correlations in the data. A major challenge that hinders the interpretation of such queries is confounding bias, which can lead to an unexpected association between variables. For example, a SQL query computes the average Covid-19 death rate in each country, may expose a puzzling correlation between the country and the death rate. In this work, we demonstrate NEXUS, a system that generates explanations in terms of a set of potential confounding variables that explain the unexpected correlation observed in a query. NEXUS mines candidate confounding variables from external sources since, in many real-life scenarios, the explanations are not solely contained in the input data. For instance, NEXUS might extract data about factors explaining the association between countries and the Covid-19 death rate, such as information about countries' economies and health outcomes. We will demonstrate the utility of NEXUS for investigating unexpected query results by interacting with the SIGMOD'23 participants, who will act as data analysts.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131897757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
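The small, self-contained example below (not NEXUS itself) shows the kind of confounding the system explains: an aggregate query makes country A look much worse than country B on death rate, but stratifying by a candidate confounder (age group) reverses the picture within every stratum. The data are fabricated solely to exhibit the effect; NEXUS would mine such confounders automatically, often from external sources.

```python
# Simpson's-paradox-style confounding: raw aggregates vs. stratified comparison.
import pandas as pd

cases = pd.DataFrame({
    "country":   ["A"] * 100 + ["B"] * 100,
    "age_group": ["old"] * 80 + ["young"] * 20 + ["old"] * 20 + ["young"] * 80,
    "died":      [1] * 16 + [0] * 64 + [1] * 1 + [0] * 19      # country A: 80 old, 20 young
               + [1] * 5  + [0] * 15 + [1] * 5 + [0] * 75,     # country B: 20 old, 80 young
})

# The "puzzling" aggregate: A (0.17) looks much worse than B (0.10).
print(cases.groupby("country")["died"].mean())

# Conditioning on the confounder: within each age group, A is actually the safer country.
print(cases.groupby(["age_group", "country"])["died"].mean())
```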
SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589723
Kashif Rabbani, Matteo Lissandrini, K. Hose
{"title":"SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes","authors":"Kashif Rabbani, Matteo Lissandrini, K. Hose","doi":"10.1145/3555041.3589723","DOIUrl":"https://doi.org/10.1145/3555041.3589723","url":null,"abstract":"We demonstrate SHACTOR, a system for extracting and analyzing validating shapes from very large Knowledge Graphs (KGs). Shapes represent a specific form of data patterns, akin to schemas for entities. Standard shape extraction approaches are likely to produce thousands of shapes, and some of those represent spurious constraints extracted due to the presence of erroneous data in the KG. Given a KG having tens of millions of triples and thousands of classes, SHACTOR parses the KG using our efficient and scalable shapes extraction algorithm and outputs SHACL shapes constraints. The extracted shapes are further annotated with statistical information regarding their support in the graph, which allows to identify both erroneous and missing triples in the KG. Hence, SHACTOR can be used to extract, analyze, and clean shape constraints from very large KGs. Furthermore, it enables the user to also find and correct errors by automatically generating SPARQL queries over the graph to retrieve nodes and facts that are the source of the spurious shapes and to intervene by amending the data.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120945773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
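The toy sketch below conveys the general idea of support-annotated shape extraction, not SHACTOR's actual algorithm or output format: for each class, count how many of its instances use each property, annotate the candidate constraint with support and confidence, flag low-support candidates as likely spurious, and print a SPARQL-style query that would retrieve the offending facts. The triples, threshold, and query text are assumptions made for the example.

```python
# Toy support/confidence annotation of candidate (class, property) shape constraints.
from collections import defaultdict

triples = [
    ("alice", "rdf:type", "Person"), ("alice", "name", "Alice"), ("alice", "worksAt", "AU"),
    ("bob",   "rdf:type", "Person"), ("bob",   "name", "Bob"),   ("bob",   "worksAt", "BU"),
    ("carol", "rdf:type", "Person"), ("carol", "name", "Carol"),
    ("carol", "birthPlace", "Mars"),        # an erroneous triple polluting the extracted shapes
]

instances = defaultdict(set)   # class  -> entities of that class
props = defaultdict(set)       # entity -> properties it uses
for s, p, o in triples:
    if p == "rdf:type":
        instances[o].add(s)
    else:
        props[s].add(p)

for cls, members in instances.items():
    for prop in {p for m in members for p in props[m]}:
        support = sum(1 for m in members if prop in props[m])
        confidence = support / len(members)
        status = "likely spurious" if confidence < 0.5 else "keep"
        print(f"shape {cls} -> {prop}: support={support}, confidence={confidence:.2f} ({status})")

# For a spurious constraint, a query in the spirit of the generated SPARQL retrieves the
# source facts for inspection and repair:
print("SELECT ?s ?o WHERE { ?s rdf:type <Person> . ?s <birthPlace> ?o }")
```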
Mixed Methods Machine Learning
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589337
Vanessa Murdock
{"title":"Mixed Methods Machine Learning","authors":"Vanessa Murdock","doi":"10.1145/3555041.3589337","DOIUrl":"https://doi.org/10.1145/3555041.3589337","url":null,"abstract":"Machine learning is ubiquitous: many of our everyday interactions, both online and offline, are backed by machine learning. Typically, machine learned systems start as an idea from the business or engineering team for a service or an app that helps the customer achieve a goal. The app is built iteratively, starting with the minimum lovable version, and undergoes several rounds of improvements to become more sophisticated. Success is measured with an online A/B test on live traffic, on the assumption that if customers engage with the app, it is serving their needs. We propose a different approach to developing such systems, that employs mixed-methods research to understand what to build, and how to make it satisfying and helpful for the customer. The Mixed Methods Machine Learning (MXML) paradigm, starts with a user study, to understand how people behave in an everyday setting (such as shopping for groceries in a grocery store), and to identify points of friction that can be automated, or experiences that can be made more enjoyable. The study observations are mapped to interactions recorded in the system's behavioral log data, which is the basis for the machine learned system. Mapping the study observations to the log data is a key step in directing the machine learning to solve a customer problem. The MXML system is evaluated with a follow-on user study, in addition to the traditional online A/B test, to assess whether the system is satisfying, helpful and delightful. In this talk we present the MXML paradigm, with real-world examples.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129642845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SMILE: A Cost-Effective System for Serving Massive Pretrained Language Models in The Cloud
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589720
Jue Wang, Ke Chen, L. Shou, Dawei Jiang, Gang Chen
{"title":"SMILE: A Cost-Effective System for Serving Massive Pretrained Language Models in The Cloud","authors":"Jue Wang, Ke Chen, L. Shou, Dawei Jiang, Gang Chen","doi":"10.1145/3555041.3589720","DOIUrl":"https://doi.org/10.1145/3555041.3589720","url":null,"abstract":"Deep learning models, particularly pre-trained language models (PLMs), have become increasingly important for a variety of applications that require text/language processing. However, these models are resource-intensive and often require costly hardware such as dedicated GPU servers. In response to this issue, we present SMILE, a novel prototype system for efficient deployment and management of such models in the cloud. Our goal is to build a cloud platform from which tenants can easily derive their own custom models, and rent PLM processors to run inference services on these models at reduced costs. To facilitate this, we present a co-design of cost-effective storage and computation scheme for managing massive customized PLMs with constrained hardware resources via effective resource sharing and multiplexing. Our system consists of four core components: vPLM creator, vPLM storage appliance, vPLM trainer, and vPLM processor, which allow tenants to easily create, store, train, and use their customized PLM in the cloud without the need for dedicated hardware or maintenance. In particular, vPLM processors are virtualized from a physical machine, and are designed to have a multi-tenant nature, enabling efficient utilization of resources by precomputing the intermediate representation of PLMs and using adapters to provide customization instead of training the entire model. This allows tenants to host their PLMs in the cloud at minor costs. In our demonstration, we show that over 10,000 models can be hosted on one single machine without compromising the inference speed and accuracy. Overall, our system provides a convenient and cost-effective solution for tenants to host and manage PLMs in the cloud for their customized tasks.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"316 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132878717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
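The hedged sketch below illustrates the resource-sharing idea the abstract describes, not SMILE's implementation: a frozen backbone representation is computed once and shared, and each tenant owns only a small bottleneck adapter that customizes it. The adapter architecture, hidden sizes, and the stand-in tensor for the shared representation are illustrative assumptions.

```python
# Per-tenant bottleneck adapters over a shared (precomputed) PLM representation.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Per-tenant module: tens of thousands of parameters instead of a full PLM copy."""
    def __init__(self, hidden=768, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, shared_repr):
        # Residual connection: the shared representation passes through unchanged by default.
        return shared_repr + self.up(torch.relu(self.down(shared_repr)))


# Stand-in for the frozen backbone's intermediate representation, which the real system
# would precompute once and reuse across tenants.
shared_repr = torch.randn(1, 128, 768)

# Many tenants can coexist on one machine because each adds only a tiny adapter.
tenants = {name: BottleneckAdapter() for name in ["tenant_a", "tenant_b"]}
customized = {name: adapter(shared_repr) for name, adapter in tenants.items()}
print({name: out.shape for name, out in customized.items()})
```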
Pay "Attention" to Chart Images for What You Read on Text 要“注意”在阅读文字时使用图表图片
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589714
Chenyu Yang, Ruixue Fan, Nan Tang, Meihui Zhang, Xiaoman Zhao, Ju Fan, Xiaoyong Du
{"title":"Pay \"Attention\" to Chart Images for What You Read on Text","authors":"Chenyu Yang, Ruixue Fan, Nan Tang, Meihui Zhang, Xiaoman Zhao, Ju Fan, Xiaoyong Du","doi":"10.1145/3555041.3589714","DOIUrl":"https://doi.org/10.1145/3555041.3589714","url":null,"abstract":"Data visualization is changing how we understand data, by showing why's, how's, and what's behind important patterns/trends in almost every corner of the world, such as in academic papers, news articles, financial reports, etc. However, along with the increasing complexity and richness of data visualizations, given a text description (e.g., \"fewer teens say they attended school completely online (8%)\"), it becomes harder for users to pinpoint where to pay attention to on a chart (e.g., a grouped bar chart). In this demonstration paper, we present a system HiChart for text-chart image highlighting: when a user selects a span of text, HiChart automatically analyzes the chart image (e.g., a jpeg or a png file) and highlights the parts that are relevant to the span. From a technical perspective, HiChart devises the following techniques. Reverse-engineering visualizations: given a chart image, HiChart uses computer vision techniques to generate a visualization specification using Vega-Lite language, as well as the underlying dataset; Visualization calibration by data tuning: HiChart calibrates the re-generated chart by tuning the recovered dataset through value perturbation; and Chart highlighting for a span: HiChart maps the span to corresponding data cells and uses the built-in highlighting functions of Vega-Lite to highlight the chart.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121930409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
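As a hand-written illustration of the kind of output HiChart aims to recover and annotate, the sketch below builds a Vega-Lite specification with a conditional color encoding that highlights only the bar matching the selected span ("fewer teens say they attended school completely online (8%)"). The data values, field names, and colors are invented for the example; only the use of a Vega-Lite `condition` encoding for highlighting reflects what the abstract describes.

```python
# A reconstructed-style Vega-Lite spec (as a Python dict) with span-driven highlighting.
import json

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [
        {"mode": "in person", "percent": 65},
        {"mode": "hybrid", "percent": 27},
        {"mode": "completely online", "percent": 8},
    ]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "mode", "type": "nominal"},
        "y": {"field": "percent", "type": "quantitative"},
        # Highlight only the data cells that the selected text span maps to.
        "color": {
            "condition": {"test": "datum.mode === 'completely online'", "value": "#d62728"},
            "value": "#cccccc",
        },
    },
}
print(json.dumps(spec, indent=2))
```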
DataEd'23 - 2nd International Workshop on Data Systems Education: Bridging Education Practice with Education Research
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3590823
Efthimia Aivaloglou, G. Fletcher, Daphne Miedema
{"title":"DataEd'23 - 2nd International Workshop on Data Systems Education: Bridging Education Practice with Education Research","authors":"Efthimia Aivaloglou, G. Fletcher, Daphne Miedema","doi":"10.1145/3555041.3590823","DOIUrl":"https://doi.org/10.1145/3555041.3590823","url":null,"abstract":"Interest in data systems education is increasing, especially with the rise in demand for well-trained and re-trained data scientists. The database and the computing education research communities have complementary perspectives and experiences to share with each other. The DataEd workshop is organized as a dedicated venue for these communities to come together to share findings, cross-pollinate perspectives and methods, and shed light on opportunities for mutual progress in data systems education. In the DataEd workshop, we will present and discuss data management systems education experiences and research via keynotes and paper and poster presentations.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129948835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Seventh Workshop on Human-In-the-Loop Data Analytics (HILDA)
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3590822
Dominik Moritz, Behrooz Omidvar-Tehrani, Sudeepa Roy
{"title":"Seventh Workshop on Human-In-the-Loop Data Analytics (HILDA)","authors":"Dominik Moritz, Behrooz Omidvar-Tehrani, Sudeepa Roy","doi":"10.1145/3555041.3590822","DOIUrl":"https://doi.org/10.1145/3555041.3590822","url":null,"abstract":"HILDA brings together researchers and practitioners to exchange ideas and results on human-data interaction. It explores how data management and analysis can be made more effective when taking into account the people who design and build these processes as well as those who are impacted by their results. Following last year, we plan to continue to focus on this year's workshop on early-stage research that is promising and exciting. A core part of this plan is that every paper gets a mentor. The theme for this edition of the workshop is commodifying human-in-the-loop data analytics, i.e., making systems ready for end-user consumption. However, the workshop is not limited to this theme and other topics are also of interest. In this summary, we describe the workshop, its main focus areas and our review and mentorship plan.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122693951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0