Making Data Clouds Smarter at Keebo: Automated Warehouse Optimization using Data Learning
Barzan Mozafari, Radu Alexandru Burcuta, Alan Cabrera, A. Constantin, Derek Francis, David Grömling, Alekh Jindal, Maciej Konkolowicz, Valentin Marian Spac, Yongjoo Park, Russell Razo Carranzo, Nicholas M. Richardson, Abhishek Roy, Aayushi Srivastava, Isha Tarte, B. Westphal, Chi Zhang
Companion of the 2023 International Conference on Management of Data. DOI: https://doi.org/10.1145/3555041.3589681

Abstract: Data clouds in general, and cloud data warehouses (CDWs) in particular, have lowered the upfront expertise and infrastructure barriers, making it easy for a wider range of users to query large and diverse sources of data. This has made modern data pipelines more complex, harder to optimize, and therefore less resource efficient. As a result, the ongoing cost of data clouds can easily become prohibitively expensive. Further, since CDWs are general-purpose solutions that must serve a wide range of workloads, their out-of-the-box performance is sub-optimal for any single workload. Data teams therefore spend significant effort manually optimizing their queries and cloud infrastructure to curb costs while achieving reasonable performance. Aside from the opportunity cost of diverting data teams from business goals, manually optimizing millions of constantly changing queries is simply daunting. To the best of our knowledge, Keebo's Warehouse Optimization is the first fully automated solution capable of making real-time optimization decisions that minimize a CDW's overall cost while meeting the users' performance goals. Keebo learns from how users and applications interact with their CDW and uses its trained models to automatically optimize warehouse settings, adjust its resources (e.g., compute, memory), scale it up or down, suspend or resume it, and self-correct in real time based on the impact of its own actions.
Growing and Serving Large Open-domain Knowledge Graphs
I. Ilyas, JP Lacerda, Yunyao Li, U. F. Minhas, A. Mousavi, Jeffrey Pound, Theodoros Rekatsinas, C. Sumanth
Companion of the 2023 International Conference on Management of Data. DOI: https://doi.org/10.1145/3555041.3589672

Abstract: Applications of large open-domain knowledge graphs (KGs) to real-world problems pose many unique challenges. In this paper, we present extensions to Saga, our platform for continuous construction and serving of knowledge at scale. In particular, we describe a pipeline for training knowledge graph embeddings that powers key capabilities such as fact ranking, fact verification, a related-entities service, and support for entity linking. We then describe how our platform, including graph embeddings, can be leveraged to create a Semantic Annotation service that links unstructured Web documents to entities in our KG. Semantic annotation of the Web effectively expands our knowledge graph with edges to open-domain Web content, which can be used in various search and ranking problems. Next, we leverage annotated Web documents to drive Open-domain Knowledge Extraction: this targeted extraction framework identifies important coverage issues in the KG, then finds relevant data sources for target entities on the Web and extracts missing information to enrich the KG. Finally, we describe adaptations to our knowledge platform needed to construct and serve private personal knowledge on-device, including private incremental KG construction, cross-device knowledge sync, and global knowledge enrichment.
{"title":"DIALITE: Discover, Align and Integrate Open Data Tables","authors":"Aamod Khatiwada, Roee Shraga, Renée J. Miller","doi":"10.1145/3555041.3589732","DOIUrl":"https://doi.org/10.1145/3555041.3589732","url":null,"abstract":"We demonstrate a novel table discovery pipeline called DIALITE that allows users to discover, integrate and analyze open data tables. DIALITE has three main stages. First, it allows users to discover tables from open data platforms using state-of-the-art table discovery techniques. Second, DIALITE integrates the discovered tables to produce an integrated table. Finally, it allows users to analyze the integration result by applying different downstreaming tasks over it. Our pipeline is flexible such that the user can easily add and compare additional discovery and integration algorithms.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114578113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dangoron: Network Construction on Large-scale Time Series Data across Sliding Windows","authors":"Yunlong Xu, Peizhen Yang, Zhengbin Tao","doi":"10.1145/3555041.3589399","DOIUrl":"https://doi.org/10.1145/3555041.3589399","url":null,"abstract":"In complex networks, the dynamics of systems are represented through the interactions of a set of anomalous time series. A crucial problem to consider is the computation of correlations between highly correlated pairs of time series across sliding windows. The efficient calculation and updating of the correlation matrix, considering user-defined sliding periods and thresholds, are vital for facilitating large-scale time series network dynamics analysis. We present Dangoron, a framework meticulously designed for the efficient identification of highly correlated pairs of time series over sliding windows and the precise computation of their respective correlations. Dangoron predicts dynamic correlations across sliding windows and prunes unrelated time series, thereby yielding a performance at least an order of magnitude faster than a baseline approach. Additionally, we introduce Tomborg, the first benchmark specifically developed to address the challenge of correlation matrix computation in the context of time series analysis. This benchmark serves as a robust foundation for future research in this domain.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115853679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Processing with FPGAs on Modern Architectures","authors":"Wen Jiang, Dario Korolija, G. Alonso","doi":"10.1145/3555041.3589410","DOIUrl":"https://doi.org/10.1145/3555041.3589410","url":null,"abstract":"Trends in hardware, the prevalence of the cloud, and the rise of highly demanding applications have ushered an era of specialization that is quickly changing the way data is processed at scale. These changes are likely to continue and accelerate in the next years as new technologies are adopted and deployed: smart NICs, smart storage, smart memory, disaggregated storage, disaggregated memory, specialized accelerators (GPUS, TPUs, FPGAs), as well as a wealth of ASICS specifically created to deal with computationally expensive tasks (e.g., cryptography or compression). In this tutorial we focus on data processing on FPGAs, a technology that has received less attention than, e.g., TPUs or GPUs but that is, however, increasingly being deployed in the cloud for data processing tasks due to the architectural flexibility of FPGAs and their ability to process data at line rate, something not possible with other type of processors or accelerators. In the tutorial we will cover what are FPGAs, their characteristics, their advantages and disadvantages over other design options, as well as examples from deployments in industry and how they are used in a variety of data processing tasks. Then we will provide a brief introduction to FPGA programming with High Level Synthesis (HLS) tools as well as briefly describe resources available to researchers in the form of academic clusters and open source systems that simplify the first steps. The tutorial will also include several case studies borrowed from research done in collaboration with companies that illustrate both the potential of FPGAs in data processing but also how software and hardware architectures are evolving to take advantage of the possibilities offered by FPGAs.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115461664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sampling over Union of Joins","authors":"Yurong Liu, Yunlong Xu, F. Nargesian","doi":"10.1145/3555041.3589400","DOIUrl":"https://doi.org/10.1145/3555041.3589400","url":null,"abstract":"Data scientists often draw on multiple relational data sources for analysis. A standard assumption in learning and approximate query answering is that the data is a uniform and independent sample of the underlying distribution. To avoid the cost of join and union, given a set of joins, we study the problem of obtaining a random sample from the union of joins without performing the full join and union. We present a general framework for random sampling over the set union of chain, acyclic, and cyclic joins, with sample uniformity and independence guarantees. We study the novel problem of union of joins size evaluation and propose two approximation methods based on histograms of columns and random walks on data. We propose an online union sampling framework that initializes with cheap-to-calculate parameter approximations and refines them on the fly during sampling. We evaluate our framework on workloads from the TPC-H benchmark and explore the trade-off of the accuracy of union approximation and sampling efficiency.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115932887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Main Memory Database Recovery Strategies
Arlino Magalhães, Angelo Brayner, José Maria S. Monteiro
Companion of the 2023 International Conference on Management of Data. DOI: https://doi.org/10.1145/3555041.3589402

Abstract: Most current application scenarios, such as trading, real-time bidding, advertising, weather forecasting, and social gaming, require massive real-time data processing. Main memory database systems (MMDBs) have proved to be an efficient alternative for such applications. These systems keep the primary copy of the database in main memory to achieve high throughput and low latency. However, because of memory volatility, a database in RAM is more vulnerable to failures than one in a traditional disk-oriented database. DBMSs implement recovery activities (logging, checkpointing, and restart) for recovery purposes. Although the recovery component looks similar in disk- and memory-oriented systems, the two differ dramatically in how they implement their architectural components, such as data storage, indexing, concurrency control, query processing, durability, and recovery. This tutorial aims to provide a thorough review of in-memory database recovery techniques. To this end, we first review the main concepts of database recovery and the architectural choices involved in implementing an in-memory database system. Only then do we present techniques for recovering in-memory databases and discuss the recovery strategies of a representative sample of modern in-memory databases. In addition, the tutorial presents challenges and future research directions for MMDBs, in order to provide guidance for other researchers.
{"title":"Companion of the 2023 International Conference on Management of Data","authors":"","doi":"10.1145/3555041","DOIUrl":"https://doi.org/10.1145/3555041","url":null,"abstract":"","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123630218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}