Companion of the 2023 International Conference on Management of Data: Latest Publications

SynopsisDB: Distributed Synopsis-based Data Processing System
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589394
Xin Zhang
{"title":"SynopsisDB: Distributed Synopsis-based Data Processing System","authors":"Xin Zhang","doi":"10.1145/3555041.3589394","DOIUrl":"https://doi.org/10.1145/3555041.3589394","url":null,"abstract":"As the data volume continues to expand at an unprecedented rate, data scientists face the challenge of effectively processing and exploring vast amounts of data. To carry out tasks such as analyzing wildfire clusters, querying diverse datasets, and visualizing results with tools like IncVisage, Pangloss, Marviq, and GeoSparkViz, data scientists require data processing systems that are efficient, flexible, and capable of handling different types of queries across various data sources. Two critical features that these systems should possess are the ability to process data efficiently and handle a wide range of queries for diverse data types.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116171279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
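To make the synopsis idea concrete, here is a minimal sketch of synopsis-based approximate query processing in the spirit of (but not taken from) SynopsisDB: each partition of the data builds a small Count-Min sketch, the sketches are merged, and frequency queries are answered from the merged summary without scanning the raw data. The sketch dimensions, hashing scheme, and data are illustrative assumptions.

```python
# Illustrative synopsis-based processing: per-partition Count-Min sketches, merged centrally.
import hashlib


class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item, row):
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._hash(item, row)] += count

    def estimate(self, item):
        # Count-Min never underestimates; the minimum cell is the tightest bound.
        return min(self.table[row][self._hash(item, row)] for row in range(self.depth))

    def merge(self, other):
        # Synopses are mergeable: cell-wise addition combines per-partition summaries.
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


# Two partitions of a toy event log, summarized independently and then merged.
partitions = [["CA", "CA", "NV"], ["CA", "OR", "OR", "NV"]]
sketches = []
for part in partitions:
    cms = CountMinSketch()
    for value in part:
        cms.add(value)
    sketches.append(cms)

merged = sketches[0]
merged.merge(sketches[1])
print(merged.estimate("CA"))  # approximate count, here 3
```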
Demystifying Artificial Intelligence for Data Preparation
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589406
Chengliang Chai, N. Tang, Ju Fan, Yuyu Luo
{"title":"Demystifying Artificial Intelligence for Data Preparation","authors":"Chengliang Chai, N. Tang, Ju Fan, Yuyu Luo","doi":"10.1145/3555041.3589406","DOIUrl":"https://doi.org/10.1145/3555041.3589406","url":null,"abstract":"Data preparation -- the process of discovering, integrating, transforming, cleaning, and annotating data -- is one of the oldest, hardest, yet inevitable data management problems. Unfortunately, data preparation is known to be iterative, requires high human cost, and is error-prone. Recent advances in artificial intelligence (AI) have shown very promising results on many data preparation tasks. At a high level, AI for data preparation (AI4DP) should have the following abilities. First, the AI model should capture real-world knowledge so as to solve various tasks. Second, it is important to easily adapt to new datasets/tasks. Third, data preparation is a complicated pipeline with many operations, which results in a large number of candidates to select the optimum, and thus it is crucial to effectively and efficiently explore the large space of possible pipelines. In this tutorial, we will cover three important topics to address the above issues: demystifying foundation models to inject knowledge for data preparation, tuning and adapting pre-trained language models for data preparation, and orchestrating data preparation pipelines for different downstream applications.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126625170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
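As one concrete instance of the second topic (adapting pre-trained language models to a data preparation task), the hedged sketch below serializes two records and scores them with a BERT sequence classifier for entity matching. The model choice, the "COL ... VAL ..." serialization, and the toy records are illustrative assumptions, not code from the tutorial.

```python
# Adapting a pre-trained LM to a data preparation task (entity matching), illustratively.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def serialize(record):
    # Flatten an attribute dict into a "COL name VAL value" style string.
    return " ".join(f"COL {k} VAL {v}" for k, v in record.items())

left = {"title": "iPhone 14 Pro 128GB", "brand": "Apple"}
right = {"title": "Apple iPhone14 Pro (128 GB)", "brand": "Apple"}

inputs = tokenizer(serialize(left), serialize(right), return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Without fine-tuning on labeled pairs these scores are meaningless; in practice the model
# (or a lightweight adapter) is first trained on match/non-match examples.
print(torch.softmax(logits, dim=-1))
```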
Characterizing and Verifying Queries Via CINSGEN
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589721
Hanze Meng, Zhengjie Miao, Amir Gilad, Sudeepa Roy, Jun Yang
{"title":"Characterizing and Verifying Queries Via CINSGEN","authors":"Hanze Meng, Zhengjie Miao, Amir Gilad, Sudeepa Roy, Jun Yang","doi":"10.1145/3555041.3589721","DOIUrl":"https://doi.org/10.1145/3555041.3589721","url":null,"abstract":"Example database instances can be very helpful in understanding complex queries. Different examples may illustrate alternative situations in which answers emerge in the query results and can be useful for testing. Examples can also help reveal semantic differences between queries that are supposed to be equivalent, e.g., when students try to understand how their queries behave differently from a reference solution, or when programmers try to pinpoint mistakes inadvertently introduced by rewrites meant to improve readability or performance. In this paper, we propose to demonstrate CinsGen, a system that can characterize queries and help distinguish between two queries. Given a query, CinsGen generates minimal conditional instances (c-instances) that satisfy it. In turn, each c-instance is a generalization of multiple database instances, yielding a compact representation. Thus, using CinsGen enables users to obtain a comprehensive and compact view of all scenarios that satisfy a specified query, allowing for query characterization or distinction between two queries.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125303535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
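To illustrate what "each c-instance is a generalization of multiple database instances" means, the sketch below encodes a tiny conditional instance as tuples over constants and labeled variables plus a side condition, and enumerates the concrete instances it stands for. This encoding is my own illustration, not CinsGen's internal representation.

```python
# A toy conditional instance: variables in tuples plus a condition over their values.
from dataclasses import dataclass, field
from itertools import product


@dataclass
class CInstance:
    relation: str
    tuples: list                                          # constants and variable names like "x1"
    condition: callable = field(default=lambda subst: True)  # predicate over a substitution dict

    def ground(self, domain):
        """Enumerate the concrete instances this c-instance represents over a finite domain."""
        variables = sorted({v for t in self.tuples
                            for v in t if isinstance(v, str) and v.startswith("x")})
        for values in product(domain, repeat=len(variables)):
            subst = dict(zip(variables, values))
            if self.condition(subst):
                yield [tuple(subst.get(v, v) for v in t) for t in self.tuples]


# A c-instance satisfying "SELECT * FROM Emp WHERE salary > 50": one tuple whose salary
# is a variable x1 constrained to exceed 50.
emp = CInstance("Emp", [("alice", "x1")], condition=lambda subst: subst["x1"] > 50)
for concrete in emp.ground(domain=[40, 60, 80]):
    print(concrete)   # [('alice', 60)] and [('alice', 80)]
```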
NEXUS: On Explaining Confounding Bias
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589728
Brit Youngmann, Michael J. Cafarella, Y. Moskovitch, Babak Salimi
{"title":"NEXUS: On Explaining Confounding Bias","authors":"Brit Youngmann, Michael J. Cafarella, Y. Moskovitch, Babak Salimi","doi":"10.1145/3555041.3589728","DOIUrl":"https://doi.org/10.1145/3555041.3589728","url":null,"abstract":"When analyzing large datasets, analysts are often interested in the explanations for unexpected results produced by their queries. In this work, we focus on aggregate SQL queries that expose correlations in the data. A major challenge that hinders the interpretation of such queries is confounding bias, which can lead to an unexpected association between variables. For example, a SQL query computes the average Covid-19 death rate in each country, may expose a puzzling correlation between the country and the death rate. In this work, we demonstrate NEXUS, a system that generates explanations in terms of a set of potential confounding variables that explain the unexpected correlation observed in a query. NEXUS mines candidate confounding variables from external sources since, in many real-life scenarios, the explanations are not solely contained in the input data. For instance, NEXUS might extract data about factors explaining the association between countries and the Covid-19 death rate, such as information about countries' economies and health outcomes. We will demonstrate the utility of NEXUS for investigating unexpected query results by interacting with the SIGMOD'23 participants, who will act as data analysts.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131897757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
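The small, self-contained example below (not NEXUS itself) shows the kind of confounding the system explains: an aggregate query makes country A look much worse than country B on death rate, but stratifying by a candidate confounder (age group) reverses the picture within every stratum. The data are fabricated solely to exhibit the effect; NEXUS would mine such confounders automatically, often from external sources.

```python
# Simpson's-paradox-style confounding: raw aggregates vs. stratified comparison.
import pandas as pd

cases = pd.DataFrame({
    "country":   ["A"] * 100 + ["B"] * 100,
    "age_group": ["old"] * 80 + ["young"] * 20 + ["old"] * 20 + ["young"] * 80,
    "died":      [1] * 16 + [0] * 64 + [1] * 1 + [0] * 19      # country A: 80 old, 20 young
               + [1] * 5  + [0] * 15 + [1] * 5 + [0] * 75,     # country B: 20 old, 80 young
})

# The "puzzling" aggregate: A (0.17) looks much worse than B (0.10).
print(cases.groupby("country")["died"].mean())

# Conditioning on the confounder: within each age group, A is actually the safer country.
print(cases.groupby(["age_group", "country"])["died"].mean())
```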
SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589723
Kashif Rabbani, Matteo Lissandrini, K. Hose
{"title":"SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes","authors":"Kashif Rabbani, Matteo Lissandrini, K. Hose","doi":"10.1145/3555041.3589723","DOIUrl":"https://doi.org/10.1145/3555041.3589723","url":null,"abstract":"We demonstrate SHACTOR, a system for extracting and analyzing validating shapes from very large Knowledge Graphs (KGs). Shapes represent a specific form of data patterns, akin to schemas for entities. Standard shape extraction approaches are likely to produce thousands of shapes, and some of those represent spurious constraints extracted due to the presence of erroneous data in the KG. Given a KG having tens of millions of triples and thousands of classes, SHACTOR parses the KG using our efficient and scalable shapes extraction algorithm and outputs SHACL shapes constraints. The extracted shapes are further annotated with statistical information regarding their support in the graph, which allows to identify both erroneous and missing triples in the KG. Hence, SHACTOR can be used to extract, analyze, and clean shape constraints from very large KGs. Furthermore, it enables the user to also find and correct errors by automatically generating SPARQL queries over the graph to retrieve nodes and facts that are the source of the spurious shapes and to intervene by amending the data.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120945773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
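The toy sketch below conveys the general idea of support-annotated shape extraction, not SHACTOR's actual algorithm or output format: for each class, count how many of its instances use each property, annotate the candidate constraint with support and confidence, flag low-support candidates as likely spurious, and print a SPARQL-style query that would retrieve the offending facts. The triples, threshold, and query text are assumptions made for the example.

```python
# Toy support/confidence annotation of candidate (class, property) shape constraints.
from collections import defaultdict

triples = [
    ("alice", "rdf:type", "Person"), ("alice", "name", "Alice"), ("alice", "worksAt", "AU"),
    ("bob",   "rdf:type", "Person"), ("bob",   "name", "Bob"),   ("bob",   "worksAt", "BU"),
    ("carol", "rdf:type", "Person"), ("carol", "name", "Carol"),
    ("carol", "birthPlace", "Mars"),        # an erroneous triple polluting the extracted shapes
]

instances = defaultdict(set)   # class  -> entities of that class
props = defaultdict(set)       # entity -> properties it uses
for s, p, o in triples:
    if p == "rdf:type":
        instances[o].add(s)
    else:
        props[s].add(p)

for cls, members in instances.items():
    for prop in {p for m in members for p in props[m]}:
        support = sum(1 for m in members if prop in props[m])
        confidence = support / len(members)
        status = "likely spurious" if confidence < 0.5 else "keep"
        print(f"shape {cls} -> {prop}: support={support}, confidence={confidence:.2f} ({status})")

# For a spurious constraint, a query in the spirit of the generated SPARQL retrieves the
# source facts for inspection and repair:
print("SELECT ?s ?o WHERE { ?s rdf:type <Person> . ?s <birthPlace> ?o }")
```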
Mixed Methods Machine Learning
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589337
Vanessa Murdock
{"title":"Mixed Methods Machine Learning","authors":"Vanessa Murdock","doi":"10.1145/3555041.3589337","DOIUrl":"https://doi.org/10.1145/3555041.3589337","url":null,"abstract":"Machine learning is ubiquitous: many of our everyday interactions, both online and offline, are backed by machine learning. Typically, machine learned systems start as an idea from the business or engineering team for a service or an app that helps the customer achieve a goal. The app is built iteratively, starting with the minimum lovable version, and undergoes several rounds of improvements to become more sophisticated. Success is measured with an online A/B test on live traffic, on the assumption that if customers engage with the app, it is serving their needs. We propose a different approach to developing such systems, that employs mixed-methods research to understand what to build, and how to make it satisfying and helpful for the customer. The Mixed Methods Machine Learning (MXML) paradigm, starts with a user study, to understand how people behave in an everyday setting (such as shopping for groceries in a grocery store), and to identify points of friction that can be automated, or experiences that can be made more enjoyable. The study observations are mapped to interactions recorded in the system's behavioral log data, which is the basis for the machine learned system. Mapping the study observations to the log data is a key step in directing the machine learning to solve a customer problem. The MXML system is evaluated with a follow-on user study, in addition to the traditional online A/B test, to assess whether the system is satisfying, helpful and delightful. In this talk we present the MXML paradigm, with real-world examples.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129642845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SMILE: A Cost-Effective System for Serving Massive Pretrained Language Models in The Cloud
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3589720
Jue Wang, Ke Chen, L. Shou, Dawei Jiang, Gang Chen
{"title":"SMILE: A Cost-Effective System for Serving Massive Pretrained Language Models in The Cloud","authors":"Jue Wang, Ke Chen, L. Shou, Dawei Jiang, Gang Chen","doi":"10.1145/3555041.3589720","DOIUrl":"https://doi.org/10.1145/3555041.3589720","url":null,"abstract":"Deep learning models, particularly pre-trained language models (PLMs), have become increasingly important for a variety of applications that require text/language processing. However, these models are resource-intensive and often require costly hardware such as dedicated GPU servers. In response to this issue, we present SMILE, a novel prototype system for efficient deployment and management of such models in the cloud. Our goal is to build a cloud platform from which tenants can easily derive their own custom models, and rent PLM processors to run inference services on these models at reduced costs. To facilitate this, we present a co-design of cost-effective storage and computation scheme for managing massive customized PLMs with constrained hardware resources via effective resource sharing and multiplexing. Our system consists of four core components: vPLM creator, vPLM storage appliance, vPLM trainer, and vPLM processor, which allow tenants to easily create, store, train, and use their customized PLM in the cloud without the need for dedicated hardware or maintenance. In particular, vPLM processors are virtualized from a physical machine, and are designed to have a multi-tenant nature, enabling efficient utilization of resources by precomputing the intermediate representation of PLMs and using adapters to provide customization instead of training the entire model. This allows tenants to host their PLMs in the cloud at minor costs. In our demonstration, we show that over 10,000 models can be hosted on one single machine without compromising the inference speed and accuracy. Overall, our system provides a convenient and cost-effective solution for tenants to host and manage PLMs in the cloud for their customized tasks.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"316 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132878717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
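The hedged sketch below illustrates the resource-sharing idea the abstract describes, not SMILE's implementation: a frozen backbone representation is computed once and shared, and each tenant owns only a small bottleneck adapter that customizes it. The adapter architecture, hidden sizes, and the stand-in tensor for the shared representation are illustrative assumptions.

```python
# Per-tenant bottleneck adapters over a shared (precomputed) PLM representation.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Per-tenant module: tens of thousands of parameters instead of a full PLM copy."""
    def __init__(self, hidden=768, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, shared_repr):
        # Residual connection: the shared representation passes through unchanged by default.
        return shared_repr + self.up(torch.relu(self.down(shared_repr)))


# Stand-in for the frozen backbone's intermediate representation, which the real system
# would precompute once and reuse across tenants.
shared_repr = torch.randn(1, 128, 768)

# Many tenants can coexist on one machine because each adds only a tiny adapter.
tenants = {name: BottleneckAdapter() for name in ["tenant_a", "tenant_b"]}
customized = {name: adapter(shared_repr) for name, adapter in tenants.items()}
print({name: out.shape for name, out in customized.items()})
```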
Pay "Attention" to Chart Images for What You Read on Text 要“注意”在阅读文字时使用图表图片
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589714
Chenyu Yang, Ruixue Fan, Nan Tang, Meihui Zhang, Xiaoman Zhao, Ju Fan, Xiaoyong Du
{"title":"Pay \"Attention\" to Chart Images for What You Read on Text","authors":"Chenyu Yang, Ruixue Fan, Nan Tang, Meihui Zhang, Xiaoman Zhao, Ju Fan, Xiaoyong Du","doi":"10.1145/3555041.3589714","DOIUrl":"https://doi.org/10.1145/3555041.3589714","url":null,"abstract":"Data visualization is changing how we understand data, by showing why's, how's, and what's behind important patterns/trends in almost every corner of the world, such as in academic papers, news articles, financial reports, etc. However, along with the increasing complexity and richness of data visualizations, given a text description (e.g., \"fewer teens say they attended school completely online (8%)\"), it becomes harder for users to pinpoint where to pay attention to on a chart (e.g., a grouped bar chart). In this demonstration paper, we present a system HiChart for text-chart image highlighting: when a user selects a span of text, HiChart automatically analyzes the chart image (e.g., a jpeg or a png file) and highlights the parts that are relevant to the span. From a technical perspective, HiChart devises the following techniques. Reverse-engineering visualizations: given a chart image, HiChart uses computer vision techniques to generate a visualization specification using Vega-Lite language, as well as the underlying dataset; Visualization calibration by data tuning: HiChart calibrates the re-generated chart by tuning the recovered dataset through value perturbation; and Chart highlighting for a span: HiChart maps the span to corresponding data cells and uses the built-in highlighting functions of Vega-Lite to highlight the chart.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121930409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
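As a hand-written illustration of the kind of output HiChart aims to recover and annotate, the sketch below builds a Vega-Lite specification with a conditional color encoding that highlights only the bar matching the selected span ("fewer teens say they attended school completely online (8%)"). The data values, field names, and colors are invented for the example; only the use of a Vega-Lite `condition` encoding for highlighting reflects what the abstract describes.

```python
# A reconstructed-style Vega-Lite spec (as a Python dict) with span-driven highlighting.
import json

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [
        {"mode": "in person", "percent": 65},
        {"mode": "hybrid", "percent": 27},
        {"mode": "completely online", "percent": 8},
    ]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "mode", "type": "nominal"},
        "y": {"field": "percent", "type": "quantitative"},
        # Highlight only the data cells that the selected text span maps to.
        "color": {
            "condition": {"test": "datum.mode === 'completely online'", "value": "#d62728"},
            "value": "#cccccc",
        },
    },
}
print(json.dumps(spec, indent=2))
```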
DataEd'23 - 2nd International Workshop on Data Systems Education: Bridging Education Practice with Education Research
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3590823
Efthimia Aivaloglou, G. Fletcher, Daphne Miedema
{"title":"DataEd'23 - 2nd International Workshop on Data Systems Education: Bridging Education Practice with Education Research","authors":"Efthimia Aivaloglou, G. Fletcher, Daphne Miedema","doi":"10.1145/3555041.3590823","DOIUrl":"https://doi.org/10.1145/3555041.3590823","url":null,"abstract":"Interest in data systems education is increasing, especially with the rise in demand for well-trained and re-trained data scientists. The database and the computing education research communities have complementary perspectives and experiences to share with each other. The DataEd workshop is organized as a dedicated venue for these communities to come together to share findings, cross-pollinate perspectives and methods, and shed light on opportunities for mutual progress in data systems education. In the DataEd workshop, we will present and discuss data management systems education experiences and research via keynotes and paper and poster presentations.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129948835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Seventh Workshop on Human-In-the-Loop Data Analytics (HILDA)
Pub Date: 2023-06-04 | DOI: 10.1145/3555041.3590822
Dominik Moritz, Behrooz Omidvar-Tehrani, Sudeepa Roy
{"title":"Seventh Workshop on Human-In-the-Loop Data Analytics (HILDA)","authors":"Dominik Moritz, Behrooz Omidvar-Tehrani, Sudeepa Roy","doi":"10.1145/3555041.3590822","DOIUrl":"https://doi.org/10.1145/3555041.3590822","url":null,"abstract":"HILDA brings together researchers and practitioners to exchange ideas and results on human-data interaction. It explores how data management and analysis can be made more effective when taking into account the people who design and build these processes as well as those who are impacted by their results. Following last year, we plan to continue to focus on this year's workshop on early-stage research that is promising and exciting. A core part of this plan is that every paper gets a mentor. The theme for this edition of the workshop is commodifying human-in-the-loop data analytics, i.e., making systems ready for end-user consumption. However, the workshop is not limited to this theme and other topics are also of interest. In this summary, we describe the workshop, its main focus areas and our review and mentorship plan.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122693951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0