{"title":"Link Local Differential Privacy in GNNs via Bayesian Estimation","authors":"Xiaochen Zhu","doi":"10.1145/3555041.3589398","DOIUrl":"https://doi.org/10.1145/3555041.3589398","url":null,"abstract":"Recent years have witnessed the emergence of graph neural networks (GNNs) and increasing attention to GNNs from the data management community. Yet, training GNNs may raise privacy concerns, as the trained model may reveal sensitive information that must be kept private by law. In this paper, we study GNNs with link local differential privacy over decentralized nodes, where an untrusted server collaborates with node clients to train a GNN model without revealing the existence of any link. We find that by spending the privacy budget independently on the links and degrees of the graph, the server can use Bayesian estimation to better denoise the graph topology. Unlike existing approaches, our mechanism does not aim to preserve graph density; instead, it lets the server estimate fewer links under a lower privacy budget and higher uncertainty. Hence, the server makes fewer false-positive link estimations and trains better models. Finally, we conduct extensive experiments to demonstrate that our method achieves considerably higher accuracy than existing approaches under the same privacy budget.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134084041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Disaggregated Database Systems","authors":"Jianguo Wang, Qizhen Zhang","doi":"10.1145/3555041.3589403","DOIUrl":"https://doi.org/10.1145/3555041.3589403","url":null,"abstract":"Disaggregated database systems achieve unprecedented elasticity and resource utilization at cloud scale and have recently gained great momentum in both industry and academia. Such systems are developed in response to the emerging trend of disaggregated data centers, where resources are physically separated and connected through fast data center networks. Database management systems have traditionally been built on monolithic architectures, so disaggregation fundamentally challenges their designs. On the other hand, disaggregation offers benefits such as independent scaling of compute, memory, and storage. Nonetheless, there is a lack of systematic investigation into the new research challenges and opportunities in recent disaggregated database systems. To provide database researchers and practitioners with insights into different forms of resource disaggregation, we take a snapshot of state-of-the-art disaggregated database systems and related techniques and present an in-depth tutorial. The primary goal is to better understand the enabling techniques and characteristics of resource disaggregation and its implications for next-generation database systems. To that end, we survey recent work on storage disaggregation, which separates secondary storage devices (e.g., SSDs) from compute servers and is widely deployed in current cloud data centers, and memory disaggregation, which further splits compute and memory with Remote Direct Memory Access (RDMA) and is driving the transformation of clouds. In addition, we discuss two techniques that bring novel perspectives to these two paradigms: persistent memory and Compute Express Link (CXL). Finally, we identify several directions that shed light on the future development of disaggregated database systems.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121471158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"International Workshop on Data Management on New Hardware (DaMoN)","authors":"Norman May, Nesime Tatbul","doi":"10.1145/3555041.3590816","DOIUrl":"https://doi.org/10.1145/3555041.3590816","url":null,"abstract":"New hardware, such as multi-core CPUs, GPUs, FPGAs, new memory and storage technologies, and low-power devices, brings new challenges and opportunities for optimizing database system performance. Consequently, exploiting the characteristics of modern hardware has become an important topic of database systems research. Over the last two decades, the DaMoN Workshop has established itself as the primary database venue for presenting ideas on how to exploit new hardware for data management: in particular, how to improve the performance or scalability of databases, how new hardware unlocks new database application scenarios, and how data management could benefit from future hardware.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124281664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aggregation and Exploration of High-Dimensional Data Using the Sudokube Data Cube Engine","authors":"Sachin Basil John, P. Lindner, Zhekai Jiang, Christoph E. Koch","doi":"10.1145/3555041.3589729","DOIUrl":"https://doi.org/10.1145/3555041.3589729","url":null,"abstract":"We present Sudokube, a novel system that supports interactive speed querying on high-dimensional data using partially materialized data cubes. Given a storage budget, it judiciously chooses what projections to precompute and materialize during cube construction time. Then, at query time, it uses whatever information is available from the materialized projections and extrapolates missing information to approximate query results. Thus, Sudokube avoids costly projections at query time while also avoiding the astronomical compute and storage requirements needed for fully materialized high-dimensional data cubes. In this paper, we show the capabilities of the Sudokube system and how it approximates query results using different techniques and materialization strategies.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126216527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Natural Language Based Data Exploration with Samples","authors":"Shubham Agarwal, G. Chan, Shaddy Garg, Tong Yu, Subrata Mitra","doi":"10.1145/3555041.3589724","DOIUrl":"https://doi.org/10.1145/3555041.3589724","url":null,"abstract":"Extracting insights from large amounts of data in a timely manner is a crucial problem. Exploratory Data Analysis (EDA) is commonly used by analysts to uncover insights through a sequence of SQL commands and associated visualizations. However, in many cases this process is carried out by non-programmers working under tight time constraints, such as a marketer who must quickly analyze large amounts of campaign data to reach a target revenue. This paper presents ApproxEDA, a system that combines a natural language processing (NLP) interface for insight discovery with an underlying sample-based EDA engine. The NLP interface converts high-level questions into contextual SQL queries over the dataset, while the backend EDA engine significantly speeds up insight discovery by selecting the optimal sample from among many pre-created samples built with various sampling strategies. We demonstrate that ApproxEDA addresses two key aspects: converting high-level NLP inputs to contextual SQL, and intelligently selecting samples using a reinforcement learning agent. This protects users from diverging from their original intent of analysis, which can occur due to approximation errors in results and visualizations, while still providing optimal latency reduction through the use of samples.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121651707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"49 Years of Queries","authors":"D. Chamberlin","doi":"10.1145/3555041.3589336","DOIUrl":"https://doi.org/10.1145/3555041.3589336","url":null,"abstract":"The relational data model, proposed by Ted Codd in the 1970s, has been the dominant paradigm for storing and accessing business data for several decades. In this talk, I'll share some stories from the early days of relational databases, and examine some reasons for the remarkable resilience of relational database technology. I'll discuss some of the challenges to the relational approach that have arisen over the years. I'll also discuss the evolution of SQL, and offer some thoughts about how the language may continue to evolve in the future.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122290982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BCNF* - From Normalized- to Star-Schemas and Back Again","authors":"Marie Fischer, Paul Roessler, Paul Sieben, Janina Adamcic, Christoph Kirchherr, Tobias Straeubig, Youri Kaminsky, Felix Naumann","doi":"10.1145/3555041.3589712","DOIUrl":"https://doi.org/10.1145/3555041.3589712","url":null,"abstract":"Data warehouses are the core of many data analysis processes. They contain various database schemas, which are designed and created through schema transformation and integration. These processes are complex and require technical knowledge, which makes them costly and prevents business teams from starting new analyses independently. BCNF* is a web application that enables users to safely explore valid schema transformations and to generate transformation scripts automatically. It can be used for any schema transformation, but is optimized for semi-automatic data warehouse creation through features such as a dedicated star-schema mode.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124075504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Joint Shapley Values","authors":"Mihail Stoian","doi":"10.1145/3555041.3589393","DOIUrl":"https://doi.org/10.1145/3555041.3589393","url":null,"abstract":"The Shapley value has recently drawn the attention of the data management community. Briefly, the Shapley value is a well-known numerical measure for the contribution of a player to a coalitional game. In the direct extension of Shapley axioms, the newly introduced joint Shapley value provides a measure for the average contribution of a set of players. However, due to its exponential nature, it is computationally intensive: for an explanation order of k, the original algorithm takes O(min(3^n, 2^n n^k)) time. In this work, we improve it to O(2^n nk).","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133567447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faster FFT-based Wildcard Pattern Matching","authors":"Mihail Stoian","doi":"10.1145/3555041.3589391","DOIUrl":"https://doi.org/10.1145/3555041.3589391","url":null,"abstract":"We study the problem of pattern matching with wildcards, which naturally occurs in the SQL expression LIKE. It consists of finding the occurrences of a pattern P, |P| = m, in a text T, |T| = n, where the pattern may contain wildcards, i.e., special characters that can match any letter of the alphabet. The naive algorithm for this problem takes O(nm) time, since at each position of T we need O(m) to check whether a match is possible. Several faster algorithms have been proposed, the simplest being a deterministic FFT-based algorithm in which pattern matching is interpreted in algebraic form, i.e., P matches T iff (P-T)^2 = 0. This naturally leads to an O(n log n) algorithm via FFT, as we can evaluate this expression and search for zero-valued coefficients. Clifford et al. introduced a trick to achieve O(n log m): instead of matching the entire text to the pattern, the text is divided into n / m overlapping slices of length 2m, each of which is matched to the pattern in O(m log m). The total time complexity is then O((n / m) m log m) = O(n log m). Other works, especially in pattern matching with errors, rely on this trick. However, the O-expression hides a factor of 4 in this case, assuming m = 2^k: FFT-based matching between strings of length m and 2m actually requires 4m log 4m steps, since the result has size 3m - 1 and the FFT requires a power of two as its size. We argue that this trick incurs redundancy and show how the redundancy can be eliminated to achieve a twice-as-fast O(n log m) algorithm without compromise. Furthermore, we show experimentally that the proposed algorithm approaches the theoretical improvement.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132528597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Auto-WLM: Machine Learning Enhanced Workload Management in Amazon Redshift","authors":"Gaurav Saxena, Mohammad Rahman, Naresh Chainani, Chunbin Lin, George C. Caragea, Fahim Chowdhury, Ryan Marcus, Tim Kraska, I. Pandis, Balakrishnan Narayanaswamy","doi":"10.1145/3555041.3589677","DOIUrl":"https://doi.org/10.1145/3555041.3589677","url":null,"abstract":"There has been a lot of excitement around using machine learning to improve the performance and usability of database systems. However, few of these techniques have actually been used in the critical path of customer-facing database services. In this paper, we describe Auto-WLM, a machine-learning-based automatic workload manager currently used in production in Amazon Redshift. Auto-WLM is an example of how machine learning can improve the performance of large data warehouses in practice and at scale. Auto-WLM intelligently schedules workloads to maximize throughput and horizontally scales clusters in response to workload spikes. While traditional heuristic-based workload management requires substantial manual tuning (e.g., of the concurrency level, the memory allocated to queries, etc.) for each specific workload, Auto-WLM performs this tuning automatically and as a result can quickly adapt and react to workload changes and demand spikes. At its core, Auto-WLM uses locally trained query performance models to predict the execution time and memory needs of each query, and uses these predictions to make intelligent scheduling decisions. Currently, Auto-WLM makes millions of decisions every day and constantly optimizes the performance of each individual Amazon Redshift cluster. In this paper, we describe the advantages and challenges of implementing and deploying Auto-WLM, and outline areas of research that may be of interest to those in the \"ML for systems\" community with an eye for practicality.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116045221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}