{"title":"QuTE: Answering Quantity Queries from Web Tables","authors":"Vinh Thinh Ho, K. Pal, G. Weikum","doi":"10.1145/3448016.3452763","DOIUrl":"https://doi.org/10.1145/3448016.3452763","url":null,"abstract":"Quantities are financial, technological, physical and other measures that denote relevant properties of entities, such as revenue of companies, energy efficiency of cars or distance and brightness of stars and galaxies. Queries with filter conditions on quantities are an important building block for downstream analytics, and pose challenges when the content of interest is spread across a huge number of web tables and other ad-hoc datasets. Search engines support quantity lookups, but largely fail on quantity filters. The QuTE system presented in this paper aims to overcome these problems. It comprises methods for automatically extracting entity-quantity facts from web tables, as well as methods for online query processing, with new techniques for query matching and answer ranking.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115152048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Citus","authors":"Umur Cubukcu, Ozgün Erdogan, Sumedh Pathak, Sudhakar Sannakkayala, M. Ślot","doi":"10.1145/3448016.3457551","DOIUrl":"https://doi.org/10.1145/3448016.3457551","url":null,"abstract":"Citus is an open source distributed database engine for PostgreSQL that is implemented as an extension. Citus gives users the ability to distribute data, queries, and transactions in PostgreSQL across a cluster of PostgreSQL servers to handle the needs of data-intensive applications. The development of Citus has largely been driven by conversations with companies looking to scale PostgreSQL beyond a single server and their workload requirements. This paper describes the requirements of four common workload patterns and how Citus addresses those requirements. It also shares benchmark results demonstrating the performance and scalability of Citus in each of the workload patterns and describes how Microsoft uses Citus to address one of its most challenging data problems.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115228379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structure-Aware Machine Learning over Multi-Relational Databases","authors":"Maximilian Schleich","doi":"10.1145/3448016.3461670","DOIUrl":"https://doi.org/10.1145/3448016.3461670","url":null,"abstract":"We consider the problem of computing machine learning models over multi-relational databases. The mainstream approach involves a costly repeated loop that data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and learn the desired model using this tool. In this thesis, we advocate for an alternative approach that avoids this loop and instead tightly integrates the query and learning tasks into one unified solution. By integrating these two tasks, we can exploit structure in the data and the query to optimize the end-to-end learning problem. We provide a framework for structure-aware learning for a variety of commonly used machine learning models that achieves runtime guarantees that can be asymptotically faster than the mainstream approach that first constructs the training dataset. In practice, this asymptotic gap translates into several orders of magnitude performance improvements over state-of-the-art machine learning packages such as TensorFlow, MADlib, scikit-learn, and mlpack. The thesis is composed of three parts. First, we present the methodology and theoretical foundation of structure-aware learning. Then, we report on the design and implementation of LMFAO, an in-memory engine for structure-aware learning over databases. Finally, we present an extensive experimental evaluation. In following, we briefly highlight each of these three parts.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124247227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MxTasks: How to Make Efficient Synchronization and Prefetching Easy","authors":"J. Mühlig, J. Teubner","doi":"10.1145/3448016.3457268","DOIUrl":"https://doi.org/10.1145/3448016.3457268","url":null,"abstract":"The hardware environment has changed rapidly in recent years: Many cores, multiple sockets, and large amounts of main memory have become a commodity. To benefit from these highly parallel systems, the software has to be adapted. Sophisticated latch-free data structures and algorithms are often meant to address the situation. But they are cumbersome to develop and may still not provide the desired scalability. As a remedy, we present MxTasking, a task-based framework that assists the design of latch-free and parallel data structures. MxTasking eases the information exchange between applications and the operating system, resulting in novel opportunities to manage resources in a truly hardware- and application-conscious way.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"200 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124932881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Framework for Differentially Private Data Analysis with Multiple Accuracy Requirements","authors":"K. Knopf","doi":"10.1145/3448016.3450587","DOIUrl":"https://doi.org/10.1145/3448016.3450587","url":null,"abstract":"Organizations who collect sensitive data, such as hospitals or governments, may want to share the data with others. There could be multiple applications or analysts that want to use this data. Directly releasing the data could violate the privacy of individual data contributors. To address this privacy concern, differential privacy [1,2] has arisen as a popular technique for allow for sensitive data analysis. It frequently works through the addition of randomized noise to the output of the analysis, which is controlled through the privacy parameter or budget ε. This noise affects the utility of the analyses, where a smaller budget allocation results in larger noise values, and some applications may set accuracy requirements on the output to restrict the amount of noise added [3,9,10]. The total privacy loss of a sequence of differentially private mechanisms can be composed by summing up the privacy budgets they use, under the property of sequential composition [2]. Hence, if we intend to run multiple applications or analyses on the same dataset, given a total privacy budget, we can support each application by splitting the privacy budget evenly among them. However, if there are many applications, the privacy budget received per application could be very small, resulting in poor overall utility.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125836404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficiently Supporting Adaptive Multi-Level Serializability Models in Distributed Database Systems","authors":"Zhanhao Zhao","doi":"10.1145/3448016.3450579","DOIUrl":"https://doi.org/10.1145/3448016.3450579","url":null,"abstract":"Informally, serializability means that transactions appear to have occurred in some total order. In this paper, we show that only the serializability guarantee with some total order is not enough for many real applications. As a complement, extra partial orders of transactions, like real-time order and program order, need to be introduced. Motivated by this observation, we present a framework that models serializable transactions by adding extra partial orders, namely multi-level serializability models. Following this framework, we propose a novel concurrency control algorithm, called bi-directionally timestamp adjustment (BDTA), to supporting multi-level serializability models in distributed database systems. We integrate the framework and BDTA into Greenplum and Deneva to show the benefits of our work. Our experiments show the performance gaps among serializability levels and confirm BDTA achieves up to 1.7× better than state-of-the-art concurrency control algorithms.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116420615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"P\u0000 2\u0000 B-Trace","authors":"Zhe Peng, Cheng Xu., Haixin Wang, Jinbin Huang, Jianliang Xu, Xiaowen Chu","doi":"10.1145/3448016.3459237","DOIUrl":"https://doi.org/10.1145/3448016.3459237","url":null,"abstract":"The eruption of a pandemic, such as COVID-19, can cause an unprecedented global crisis. Contact tracing, as a pillar of communicable disease control in public health for decades, has shown its effectiveness on pandemic control. Despite intensive research on contact tracing, existing schemes are vulnerable to attacks and can hardly simultaneously meet the requirements of data integrity and user privacy. The design of a privacy-preserving contact tracing framework to ensure the integrity of the tracing procedure has not been sufficiently studied and remains a challenge. In this paper, we propose P2B-Trace, a privacy-preserving contact tracing initiative based on blockchain. First, we design a decentralized architecture with blockchain to record an authenticated data structure of the user's contact records, which prevents the user from intentionally modifying his local records afterward. Second, we develop a zero-knowledge proximity verification scheme to further verify the user's proximity claim while protecting user privacy. We implement P2B-Trace and conduct experiments to evaluate the cost of privacy-preserving tracing integrity verification. The evaluation results demonstrate the effectiveness of our proposed system.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122619095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}