Proceedings of the 2021 International Conference on Management of Data: Latest Papers

QuTE: Answering Quantity Queries from Web Tables
Proceedings of the 2021 International Conference on Management of Data Pub Date : 2021-06-09 DOI: 10.1145/3448016.3452763
Vinh Thinh Ho, K. Pal, G. Weikum
Abstract: Quantities are financial, technological, physical and other measures that denote relevant properties of entities, such as revenue of companies, energy efficiency of cars or distance and brightness of stars and galaxies. Queries with filter conditions on quantities are an important building block for downstream analytics, and pose challenges when the content of interest is spread across a huge number of web tables and other ad-hoc datasets. Search engines support quantity lookups, but largely fail on quantity filters. The QuTE system presented in this paper aims to overcome these problems. It comprises methods for automatically extracting entity-quantity facts from web tables, as well as methods for online query processing, with new techniques for query matching and answer ranking.
Citations: 6
Citus
Proceedings of the 2021 International Conference on Management of Data Pub Date : 2021-06-09 DOI: 10.1145/3448016.3457551
Umur Cubukcu, Ozgün Erdogan, Sumedh Pathak, Sudhakar Sannakkayala, M. Ślot
Abstract: Citus is an open source distributed database engine for PostgreSQL that is implemented as an extension. Citus gives users the ability to distribute data, queries, and transactions in PostgreSQL across a cluster of PostgreSQL servers to handle the needs of data-intensive applications. The development of Citus has largely been driven by conversations with companies looking to scale PostgreSQL beyond a single server and their workload requirements. This paper describes the requirements of four common workload patterns and how Citus addresses those requirements. It also shares benchmark results demonstrating the performance and scalability of Citus in each of the workload patterns and describes how Microsoft uses Citus to address one of its most challenging data problems.
Citations: 2
Structure-Aware Machine Learning over Multi-Relational Databases
Proceedings of the 2021 International Conference on Management of Data Pub Date : 2021-06-09 DOI: 10.1145/3448016.3461670
Maximilian Schleich
Abstract: We consider the problem of computing machine learning models over multi-relational databases. The mainstream approach involves a costly repeated loop that data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and learn the desired model using this tool. In this thesis, we advocate for an alternative approach that avoids this loop and instead tightly integrates the query and learning tasks into one unified solution. By integrating these two tasks, we can exploit structure in the data and the query to optimize the end-to-end learning problem. We provide a framework for structure-aware learning for a variety of commonly used machine learning models that achieves runtime guarantees that can be asymptotically faster than the mainstream approach that first constructs the training dataset. In practice, this asymptotic gap translates into several orders of magnitude performance improvements over state-of-the-art machine learning packages such as TensorFlow, MADlib, scikit-learn, and mlpack. The thesis is composed of three parts. First, we present the methodology and theoretical foundation of structure-aware learning. Then, we report on the design and implementation of LMFAO, an in-memory engine for structure-aware learning over databases. Finally, we present an extensive experimental evaluation. In the following, we briefly highlight each of these three parts.
Citations: 1
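The contrast drawn in the abstract above, between first constructing the training dataset and exploiting structure in the query, can be illustrated with a toy aggregate. The sketch below is not LMFAO and is not taken from the thesis; it assumes two hypothetical relations R(key, x) and S(key, y) and a single aggregate SUM(x * y) over their join, and it shows one well-known way to avoid materializing the join by pushing partial sums past it.

```python
from collections import defaultdict

def sum_product_via_join(R, S):
    """Baseline: materialize R JOIN S on key, then compute SUM(x * y)."""
    joined = [(x, y) for (k1, x) in R for (k2, y) in S if k1 == k2]
    return sum(x * y for x, y in joined)

def sum_product_factorized(R, S):
    """Push the aggregate past the join: per-key partial sums, then combine.

    For each key k, SUM over the join of x * y equals
    (SUM of x in R with key k) * (SUM of y in S with key k).
    """
    sum_x = defaultdict(float)
    sum_y = defaultdict(float)
    for k, x in R:
        sum_x[k] += x
    for k, y in S:
        sum_y[k] += y
    # Only keys present in both relations contribute to the join.
    return sum(sum_x[k] * sum_y[k] for k in sum_x if k in sum_y)

if __name__ == "__main__":
    # Hypothetical toy data, chosen only for illustration.
    R = [(1, 2.0), (1, 3.0), (2, 4.0)]
    S = [(1, 10.0), (2, 1.0), (2, 2.0), (3, 7.0)]
    print(sum_product_via_join(R, S))
    print(sum_product_factorized(R, S))
```

Both functions return the same value, but the second one scans each relation once and never builds the joined rows, which is where the asymptotic gap mentioned in the abstract can come from when joins are large.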
MxTasks: How to Make Efficient Synchronization and Prefetching Easy
Proceedings of the 2021 International Conference on Management of Data Pub Date : 2021-06-09 DOI: 10.1145/3448016.3457268
J. Mühlig, J. Teubner
Abstract: The hardware environment has changed rapidly in recent years: Many cores, multiple sockets, and large amounts of main memory have become a commodity. To benefit from these highly parallel systems, the software has to be adapted. Sophisticated latch-free data structures and algorithms are often meant to address the situation. But they are cumbersome to develop and may still not provide the desired scalability. As a remedy, we present MxTasking, a task-based framework that assists the design of latch-free and parallel data structures. MxTasking eases the information exchange between applications and the operating system, resulting in novel opportunities to manage resources in a truly hardware- and application-conscious way.
Citations: 3
Framework for Differentially Private Data Analysis with Multiple Accuracy Requirements
Proceedings of the 2021 International Conference on Management of Data Pub Date : 2021-06-09 DOI: 10.1145/3448016.3450587
K. Knopf
Abstract: Organizations that collect sensitive data, such as hospitals or governments, may want to share the data with others. There could be multiple applications or analysts that want to use this data. Directly releasing the data could violate the privacy of individual data contributors. To address this privacy concern, differential privacy [1,2] has arisen as a popular technique to allow for sensitive data analysis. It frequently works through the addition of randomized noise to the output of the analysis, which is controlled through the privacy parameter or budget ε. This noise affects the utility of the analyses, where a smaller budget allocation results in larger noise values, and some applications may set accuracy requirements on the output to restrict the amount of noise added [3,9,10]. The total privacy loss of a sequence of differentially private mechanisms can be composed by summing up the privacy budgets they use, under the property of sequential composition [2]. Hence, if we intend to run multiple applications or analyses on the same dataset, given a total privacy budget, we can support each application by splitting the privacy budget evenly among them. However, if there are many applications, the privacy budget received per application could be very small, resulting in poor overall utility.
Citations: 1
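The budget-splitting argument in the abstract above can be made concrete with a small sketch. It assumes counting queries (sensitivity 1) answered with the Laplace mechanism, whose noise scale is sensitivity/ε; the abstract does not name a specific mechanism, and the function names here are hypothetical rather than taken from the paper.

```python
import random

def split_budget_evenly(total_epsilon: float, num_apps: int) -> list[float]:
    """Sequential composition: per-application budgets must sum to the total."""
    return [total_epsilon / num_apps] * num_apps

def laplace_scale(epsilon: float, sensitivity: float = 1.0) -> float:
    """Noise scale of the Laplace mechanism: a smaller epsilon means more noise."""
    return sensitivity / epsilon

def noisy_count(true_count: float, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise (assumed sensitivity 1)."""
    scale = laplace_scale(epsilon)
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

if __name__ == "__main__":
    rng = random.Random(0)
    total_epsilon = 1.0  # hypothetical total budget
    for num_apps in (1, 10, 100):
        eps = split_budget_evenly(total_epsilon, num_apps)[0]
        print(num_apps, eps, laplace_scale(eps), noisy_count(1000, eps, rng))
```

Under these assumptions the noise scale grows linearly with the number of applications sharing the budget, which is the utility problem the abstract points to.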
Efficiently Supporting Adaptive Multi-Level Serializability Models in Distributed Database Systems
Proceedings of the 2021 International Conference on Management of Data Pub Date : 2021-06-09 DOI: 10.1145/3448016.3450579
Zhanhao Zhao
Abstract: Informally, serializability means that transactions appear to have occurred in some total order. In this paper, we show that the serializability guarantee of some total order alone is not enough for many real applications. As a complement, extra partial orders of transactions, like real-time order and program order, need to be introduced. Motivated by this observation, we present a framework that models serializable transactions by adding extra partial orders, namely multi-level serializability models. Following this framework, we propose a novel concurrency control algorithm, called bi-directionally timestamp adjustment (BDTA), to support multi-level serializability models in distributed database systems. We integrate the framework and BDTA into Greenplum and Deneva to show the benefits of our work. Our experiments show the performance gaps among serializability levels and confirm that BDTA achieves up to 1.7× better performance than state-of-the-art concurrency control algorithms.
Citations: 0
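The distinction the abstract above draws between "some total order" and an order that also respects real-time order can be sketched as a simple check. This is an illustration only, not the BDTA algorithm; the function name and the representation of transactions as (start, commit) timestamp pairs are assumptions made for the example.

```python
def respects_real_time_order(serial_order, intervals):
    """Check whether a serial order is consistent with real-time order.

    serial_order: list of transaction ids, the claimed equivalent serial order.
    intervals: dict mapping transaction id -> (start_time, commit_time).

    Real-time order requires: if A committed before B started,
    then A must precede B in the serial order.
    """
    position = {txn: i for i, txn in enumerate(serial_order)}
    for a, (_, a_commit) in intervals.items():
        for b, (b_start, _) in intervals.items():
            if a != b and a_commit < b_start and position[a] > position[b]:
                return False
    return True

if __name__ == "__main__":
    # Hypothetical timestamps: T1 finishes before T2 begins; T3 overlaps both.
    intervals = {"T1": (0, 5), "T2": (6, 9), "T3": (4, 8)}
    print(respects_real_time_order(["T3", "T1", "T2"], intervals))
    print(respects_real_time_order(["T2", "T1", "T3"], intervals))
```

A serial order that places T2 before T1 may still be a valid witness for plain serializability, but it violates real-time order because T1 committed before T2 started; overlapping transactions such as T3 remain unconstrained by this extra partial order.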
P2B-Trace
Proceedings of the 2021 International Conference on Management of Data Pub Date : 2021-06-09 DOI: 10.1145/3448016.3459237
Zhe Peng, Cheng Xu, Haixin Wang, Jinbin Huang, Jianliang Xu, Xiaowen Chu
Abstract: The eruption of a pandemic, such as COVID-19, can cause an unprecedented global crisis. Contact tracing, as a pillar of communicable disease control in public health for decades, has shown its effectiveness on pandemic control. Despite intensive research on contact tracing, existing schemes are vulnerable to attacks and can hardly simultaneously meet the requirements of data integrity and user privacy. The design of a privacy-preserving contact tracing framework to ensure the integrity of the tracing procedure has not been sufficiently studied and remains a challenge. In this paper, we propose P2B-Trace, a privacy-preserving contact tracing initiative based on blockchain. First, we design a decentralized architecture with blockchain to record an authenticated data structure of the user's contact records, which prevents the user from intentionally modifying his local records afterward. Second, we develop a zero-knowledge proximity verification scheme to further verify the user's proximity claim while protecting user privacy. We implement P2B-Trace and conduct experiments to evaluate the cost of privacy-preserving tracing integrity verification. The evaluation results demonstrate the effectiveness of our proposed system.
Citations: 33
An In-Depth Benchmarking of Text-to-SQL Systems
Proceedings of the 2021 International Conference on Management of Data Pub Date : 2021-06-09 DOI: 10.1145/3448016.3452836
Orest Gkini, Theofilos Belmpas, G. Koutrika, Y. Ioannidis
Abstract: Text-to-SQL systems allow users to explore relational databases by posing free-form queries, alleviating the need for using structured languages, such as SQL. Although numerous systems have been developed so far, existing system evaluations lack rigour. In this work, we build a text-to-SQL benchmark that covers different classes of queries, and we evaluate the effectiveness of several systems in the field. To evaluate system efficiency, we measure execution time and resource consumption for the different query classes. Our comprehensive evaluation aims at filling in a big gap in understanding the capabilities and boundaries of existing systems and it reveals several open challenges.
Citations: 12
GraphGem: Optimized Scalable System for Graph Convolutional Networks
Proceedings of the 2021 International Conference on Management of Data Pub Date : 2021-06-09 DOI: 10.1145/3448016.3450573
Advitya Gemawat
Abstract: Deep Learning (DL), especially Graph Convolutional Networks (GCNs), has revolutionized several domains and applications dealing with unstructured data with non-Euclidean and graphical relationships. Constructing large-scale Deep GCNs, however, is bottlenecked by glaring systems issues due to memory blow-ups, runtime slowdowns with random access, and I/O costs. This research abstract identifies various systems and scalability issues and proposes a novel system called GraphGem to handle GCN-centric DL tasks end-to-end. GraphGem tackles the bottlenecks by elevating entire GCN workloads for convenient input declarations by the user, and is inspired by lessons from the databases and machine learning systems worlds. This abstract also highlights the bigger picture of the potential research impact alongside tackling systems constraints and what it may mean for data science and deep learning practitioners going forward.
Citations: 1
Efficient and Effective Algorithms for Revenue Maximization in Social Advertising
Proceedings of the 2021 International Conference on Management of Data Pub Date : 2021-06-09 DOI: 10.1145/3448016.3459243
Kai Han, Benwei Wu, Jing Tang, Shuang Cui, Çigdem Aslay, L. Lakshmanan
Abstract: We consider the revenue maximization problem in social advertising, where a social network platform owner needs to select seed users for a group of advertisers, each with a payment budget, such that the total expected revenue that the owner gains from the advertisers by propagating their ads in the network is maximized. Previous studies on this problem show that it is intractable and present approximation algorithms. We revisit this problem from a fresh perspective and develop novel efficient approximation algorithms, both under the setting where an exact influence oracle is assumed and under one where this assumption is relaxed. Our approximation ratios significantly improve upon the previous ones. Furthermore, we empirically show, using extensive experiments on four datasets, that our algorithms considerably outperform the existing methods on both the solution quality and computation efficiency.
Citations: 5