Proceedings of the Vldb Endowment最新文献_第2页

Progressive Partitioning for Parallelized Query Execution in Google's Napa b谷歌的Napa并行查询执行的渐进式分区

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611541

Junichi Tatemura, Tao Zou, Jagan Sankaranarayanan, Yanlai Huang, Jim Chen, Yupu Zhang, Kevin Lai, Hao Zhang, Gokul Nath Babu Manoharan, Goetz Graefe, Divyakant Agrawal, Brad Adelberg, Shilpa Kolhar, Indrajit Roy

{"title":"Progressive Partitioning for Parallelized Query Execution in Google's Napa","authors":"Junichi Tatemura, Tao Zou, Jagan Sankaranarayanan, Yanlai Huang, Jim Chen, Yupu Zhang, Kevin Lai, Hao Zhang, Gokul Nath Babu Manoharan, Goetz Graefe, Divyakant Agrawal, Brad Adelberg, Shilpa Kolhar, Indrajit Roy","doi":"10.14778/3611540.3611541","DOIUrl":"https://doi.org/10.14778/3611540.3611541","url":null,"abstract":"Napa holds Google's critical data warehouses in log-structured merge trees for real-time data ingestion and sub-second response for billions of queries per day. These queries are often multi-key look-ups in highly skewed tables and indexes. In our production experience, only progressive query-specific partitioning can achieve Napa's strict query latency SLOs. Here we advocate good-enough partitioning that keeps the per-query partitioning time low without risking uneven work distribution. Our design combines pragmatic system choices and algorithmic innovations. For instance, B-trees are augmented with statistics of key distributions, thus serving the dual purpose of aiding lookups and partitioning. Furthermore, progressive partitioning is designed to be \"good enough\" thereby balancing partitioning time with performance. The resulting system is robust and successfully serves day-in-day-out billions of queries with very high quality of service forming a core infrastructure at Google.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135003653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel PyTorch FSDP:扩展完全分片数据并行的经验

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611569

Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-Chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Pritam Damania, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, Ajit Mathews, Shen Li

{"title":"PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel","authors":"Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-Chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Pritam Damania, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, Ajit Mathews, Shen Li","doi":"10.14778/3611540.3611569","DOIUrl":"https://doi.org/10.14778/3611540.3611569","url":null,"abstract":"It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains. Despite the remarkable progress made in the field of machine learning systems research, which has enabled the development and exploration of large models, such abilities remain confined to a small group of advanced users and industry leaders, resulting in an implicit technical barrier for the wider community to access and leverage these technologies. In this paper, we introduce PyTorch Fully Sharded Data Parallel (FSDP) as an industry-grade solution for large model training. FSDP has been closely co-designed with several key PyTorch core components including Tensor implementation, dispatcher system, and CUDA memory caching allocator, to provide non-intrusive user experiences and high training efficiency. Additionally, FSDP natively incorporates a range of techniques and settings to optimize resource utilization across a variety of hardware configurations. The experimental results demonstrate that FSDP is capable of achieving comparable performance to Distributed Data Parallel while providing support for significantly larger models with near-linear scalability in terms of TFLOPS.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135165172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 43

Demonstrating ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Joins via Reinforcement Learning 示范采用:通过强化学习自适应优化最坏情况下最优连接的属性顺序

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611629

Junxiong Wang, Mitchell Gray, Immanuel Trummer, Ahmet Kara, Dan Olteanu

{"title":"Demonstrating ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Joins via Reinforcement Learning","authors":"Junxiong Wang, Mitchell Gray, Immanuel Trummer, Ahmet Kara, Dan Olteanu","doi":"10.14778/3611540.3611629","DOIUrl":"https://doi.org/10.14778/3611540.3611629","url":null,"abstract":"Performance of worst-case optimal join algorithms depends on the order in which the join attributes are processed. It is challenging to identify suitable orders prior to query execution due to the huge search space of possible orders and unreliable execution cost estimates in case of data skew or data correlation. We demonstrate ADOPT, a novel query engine that integrates adaptive query processing with a worst-case optimal join algorithm. ADOPT divides query execution into episodes, during which different attribute orders are invoked. With runtime feedback on performance of different attribute orders, ADOPT rapidly approaches near-optimal orders. Moreover, ADOPT uses a unique data structure which keeps track of the processed input data to prevent redundant work across different episodes. It selects attribute orders to try via reinforcement learning, balancing the need for exploring new orders with the desire to exploit promising orders. In experiments, ADOPT outperforms baselines, including commercial and open-source systems utilizing worst-case optimal join algorithms, particularly for complex queries that are difficult to optimize.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134996879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4 演示GPT-DB:使用GPT-4为SQL处理生成特定于查询和可定制的代码

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611630

Immanuel Trummer

{"title":"Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4","authors":"Immanuel Trummer","doi":"10.14778/3611540.3611630","DOIUrl":"https://doi.org/10.14778/3611540.3611630","url":null,"abstract":"GPT-DB generates code for SQL processing in general-purpose programming languages such as Python. Generated code can be freely customized using user-provided natural language instructions. This enables users, for instance, to try out specific libraries for SQL processing or to generate non-standard output while processing. GPT-DB is based on OpenAI's GPT model series, neural networks capable of translating natural language instructions into code. By default, GPT-DB exploits the most recently released GPT-4 model whereas visitors may also select prior versions for comparison. GPT-DB automatically generates query-specific prompts, instructing GPT on code generation. These prompts include a description of the target database, as well as logical query plans described as natural language text, and instructions for customization. GPT-DB automatically verifies, and possibly re-generates, code using a reference database system for result comparisons. It enables users to select code samples for training, thereby increasing accuracy for future queries. The proposed demonstration showcases code generation for various queries and with varying instructions for code customization.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134996884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

EQUI-VOCAL Demonstration: Synthesizing Video Queries from User Interactions EQUI-VOCAL演示:从用户交互中合成视频查询

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611600

Enhao Zhang, Maureen Daum, Dong He, Manasi Ganti, Brandon Haynes, Ranjay Krishna, Magdalena Balazinska

引用次数: 0

A Demonstration of DLBD: Database Logic Bug Detection System 数据库逻辑错误检测系统DLBD的演示

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611584

Xiu Tang, Sai Wu, Dongxiang Zhang, Ziyue Wang, Gongsheng Yuan, Gang Chen

引用次数: 0

Explaining Differentially Private Query Results with DPXPlain 用DPXPlain解释不同的私有查询结果

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611596

Tingyu Wang, Yuchao Tao, Amir Gilad, Ashwin Machanavajjhala, Sudeepa Roy

引用次数: 0

DataRinse: Semantic Transforms for Data Preparation Based on Code Mining 数据挖掘:基于代码挖掘的数据准备语义转换

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611628

Ibrahim Abdelaziz, Julian Dolby, Udayan Khurana, Horst Samulowitz, Kavitha Srinivas

引用次数: 0

Demonstration of SPARQL ^ML : An Interfacing Language for Supporting Graph Machine Learning for RDF Graphs SPARQL ML的演示:一种支持RDF图的图机器学习的接口语言

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611599

Hussein Abdallah, Waleed Afandi, Essam Mansour

引用次数: 0

FastMosaic in Action: A New Mosaic Operator for Array DBMSs FastMosaic的实际应用:用于数组dbms的一种新的马赛克运算符

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611590

Ramon Antonio Rodriges Zalipynis

引用次数: 0