Proceedings of the Vldb Endowment最新文献

筛选
英文 中文
Angel-PTM: A Scalable and Economical Large-Scale Pre-Training System in Tencent Angel-PTM:一种可扩展且经济的腾讯大规模预训练系统
3区 计算机科学
Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611564
Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui
{"title":"Angel-PTM: A Scalable and Economical Large-Scale Pre-Training System in Tencent","authors":"Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui","doi":"10.14778/3611540.3611564","DOIUrl":"https://doi.org/10.14778/3611540.3611564","url":null,"abstract":"Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially Transformer models. Many products and services in Tencent Inc., such as WeChat, QQ, and Tencent Advertisement, have been opted in to gain the power of pre-trained models. In this work, we present Angel-PTM, a productive deep learning system designed for pre-training and fine-tuning Transformer models. Angel-PTM can train extremely large-scale models with hierarchical memory efficiently. The key designs of Angel-PTM are a fine-grained memory management via the Page abstraction and a unified scheduling method that coordinates computations, data movements, and communications. Furthermore, Angel-PTM supports extreme model scaling with SSD storage and implements a lock-free updating mechanism to address the SSD I/O bottlenecks. Experimental results demonstrate that Angel-PTM outperforms existing systems by up to 114.8% in terms of maximum model scale as well as up to 88.9% in terms of training throughput. Additionally, experiments on GPT3-175B and T5-MoE-1.2T models utilizing hundreds of GPUs verify our strong scalability.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135003930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
FS-Real: A Real-World Cross-Device Federated Learning Platform FS-Real:一个真实世界的跨设备联合学习平台
3区 计算机科学
Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611617
Dawei Gao, Daoyuan Chen, Zitao Li, Yuexiang Xie, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou
{"title":"FS-Real: A Real-World Cross-Device Federated Learning Platform","authors":"Dawei Gao, Daoyuan Chen, Zitao Li, Yuexiang Xie, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou","doi":"10.14778/3611540.3611617","DOIUrl":"https://doi.org/10.14778/3611540.3611617","url":null,"abstract":"Federated learning (FL) is a general distributed machine learning paradigm that provides solutions for tasks where data cannot be shared directly. Due to the difficulties in communication management and heterogeneity of distributed data and devices, initiating and using an FL algorithm for real-world cross-device scenarios requires significant repetitive effort but may not be transferable to similar projects. To reduce the effort required for developing and deploying FL algorithms, we present FS-Real, an open-source FL platform designed to address the need of a general and efficient infrastructure for real-world cross-device FL. In this paper, we introduce the key components of FS-Real and demonstrate that FS-Real has the following capabilities: 1) reducing the programming burden of FL algorithm development with plug-and-play and adaptable runtimes on Android and other Internet of Things (IoT) devices; 2) handling a large number of heterogeneous devices efficiently and robustly with our communication management components; 3) supporting a wide range of advanced FL algorithms with flexible configuration and extension; 4) alleviating the costs and efforts for deployment, evaluation, simulation, and performance optimization of FL algorithms with automatized tool kits.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134996888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
RESCU-SQL: Oblivious Querying for the Zero Trust Cloud sql:零信任云的遗忘查询
3区 计算机科学
Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611627
Xiling Li, Gefei Tan, Xiao Wang, Jennie Rogers, Soamar Homsi
{"title":"RESCU-SQL: Oblivious Querying for the Zero Trust Cloud","authors":"Xiling Li, Gefei Tan, Xiao Wang, Jennie Rogers, Soamar Homsi","doi":"10.14778/3611540.3611627","DOIUrl":"https://doi.org/10.14778/3611540.3611627","url":null,"abstract":"Cloud service providers offer robust infrastructure for rent to organizations of all kinds. High stakes applications, such as the ones in defense and healthcare, are turning to the public cloud for a cost-effective, geographically distributed, always available solution to their hosting needs. Many such users are unwilling or unable to delegate their data to this third-party infrastructure. In this demonstration, we introduce RESCU-SQL, a zero-trust platform for resilient and secure SQL querying outsourced to one or more cloud service providers. RESCU-SQL users can query their DBMS using cloud infrastructure alone without revealing their private records to anyone. It does so by executing the query over secure multiparty computation. We call this system zero trust because it can tolerate any number of malicious servers provided one of them remains honest. Our demo will offer an interactive dashboard with which attendees can observe the performance of RESCU-SQL deployed on several in-cloud nodes for the TPC-H benchmark. Attendees can select a computing party and inject messages from it to explore how quickly it detects and reacts to a malicious party. This is the first SQL system to support all-but-one maliciously secure querying over a semi-honest coordinator for efficiency.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134996889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Odyssey: An Engine Enabling the Time-Series Clustering Journey Odyssey:一个支持时间序列集群旅程的引擎
3区 计算机科学
Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611622
John Paparrizos, Sai Prasanna Teja Reddy
{"title":"Odyssey: An Engine Enabling the Time-Series Clustering Journey","authors":"John Paparrizos, Sai Prasanna Teja Reddy","doi":"10.14778/3611540.3611622","DOIUrl":"https://doi.org/10.14778/3611540.3611622","url":null,"abstract":"Clustering is one of the most popular time-series tasks because it enables unsupervised data exploration and often serves as a subroutine or preprocessing step for other tasks. Despite being the subject of active research across disciplines for decades, only limited efforts focused on benchmarking clustering methods for time series. Unfortunately, these studies have (i) omitted popular methods and entire classes of methods; (ii) considered limited choices for underlying distance measures; (iii) performed evaluations on a small number of datasets; or (iv) avoided rigorous statistical validation of the findings. In addition, the sudden enthusiasm and recent slew of proposed deep learning methods underscore the vital need for a comprehensive study. Motivated by the aforementioned limitations, we present Odyssey, a modular and extensible web engine to comprehensively evaluate 80 time-series clustering methods spanning 9 different classes from the data mining, machine learning, and deep learning literature. Odyssey enables rigorous statistical analysis across 128 diverse time-series datasets. Through its interactive interface, Odyssey (i) reveals the best-performing method per class; (ii) identifies classes performing exceptionally well that were previously omitted; (iii) challenges claims about the use of elastic measures in clustering; (iv) highlights the effects of parameter tuning; and (v) debunks claims of superiority of deep learning methods. Odyssey does not only facilitate the most extensive study ever performed in this area but, importantly, reveals an illusion of progress while, in reality, none of the evaluated methods could outperform a traditional method, namely, k -Shape, with a statistically significant difference. Overall, Odyssey lays the foundations for advancing the state of the art in time-series clustering.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134997923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
mlwhatif: What If You Could Stop Re-Implementing Your Machine Learning Pipeline Analyses over and over? 如果你可以停止一遍又一遍地重新实现你的机器学习管道分析呢?
3区 计算机科学
Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611606
Stefan Grafberger, Shubha Guha, Paul Groth, Sebastian Schelter
{"title":"mlwhatif: What If You Could Stop Re-Implementing Your Machine Learning Pipeline Analyses over and over?","authors":"Stefan Grafberger, Shubha Guha, Paul Groth, Sebastian Schelter","doi":"10.14778/3611540.3611606","DOIUrl":"https://doi.org/10.14778/3611540.3611606","url":null,"abstract":"Software systems that learn from data with machine learning (ML) are used in critical decision-making processes. Unfortunately, real-world experience shows that the pipelines for data preparation, feature encoding and model training in ML systems are often brittle with respect to their input data. As a consequence, data scientists have to run different kinds of data centric what-if analyses to evaluate the robustness and reliability of such pipelines, e.g., with respect to data errors or preprocessing techniques. These what-if analyses follow a common pattern: they take an existing ML pipeline, create a pipeline variant by introducing a small change, and execute this variant to see how the change impacts the pipeline's output score. We recently proposed mlwhatif, a library that enables data scientists to declaratively specify what-if analyses for an ML pipeline, and to automatically generate, optimize and execute the required pipeline variants. We demonstrate how data scientists can leverage mlwhatif for a variety of pipelines and three different what-if analyses focusing on the robustness of a pipeline against data errors, the impact of data cleaning operations, and the impact of data preprocessing operations on fairness. In particular, we demonstrate step-by-step how mlwhatif generates and optimizes the required execution plans for the pipeline analyses. Our library is publicly available at https://github.com/stefan-grafberger/mlwhatif.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"55 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134997924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
TsQuality: Measuring Time Series Data Quality in Apache IoTDB 在Apache IoTDB中测量时间序列数据质量
3区 计算机科学
Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611601
Yuanhui Qiu, Chenguang Fang, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang
{"title":"TsQuality: Measuring Time Series Data Quality in Apache IoTDB","authors":"Yuanhui Qiu, Chenguang Fang, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang","doi":"10.14778/3611540.3611601","DOIUrl":"https://doi.org/10.14778/3611540.3611601","url":null,"abstract":"Time series has been found with various data quality issues, e.g., owing to sensor failure or network transmission errors in the Internet of Things (IoT). It is highly demanded to have an overview of the data quality issues on the millions of time series stored in a database. In this demo, we design and implement TsQuality, a system for measuring the data quality in Apache IoTDB. Four time series data quality measures, completeness, consistency, timeliness, and validity, are implemented as functions in Apache IoTDB or operators in Apache Spark. These data quality measures are also interpreted by navigating dirty points in different granularity. It is also well-integrated with the big data eco-system, connecting to Apache Zeppelin for SQL query, and Apache Superset for an overview of data quality.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134997927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
VisualNeo: Bridging the Gap between Visual Query Interfaces and Graph Query Engines VisualNeo:弥合可视化查询接口和图查询引擎之间的差距
3区 计算机科学
Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611608
Kai Huang, Houdong Liang, Chongchong Yao, Xi Zhao, Yue Cui, Yao Tian, Ruiyuan Zhang, Xiaofang Zhou
{"title":"VisualNeo: Bridging the Gap between Visual Query Interfaces and Graph Query Engines","authors":"Kai Huang, Houdong Liang, Chongchong Yao, Xi Zhao, Yue Cui, Yao Tian, Ruiyuan Zhang, Xiaofang Zhou","doi":"10.14778/3611540.3611608","DOIUrl":"https://doi.org/10.14778/3611540.3611608","url":null,"abstract":"Visual Graph Query Interfaces (VQIs) empower non-programmers to query graph data by constructing visual queries intuitively. Devising efficient technologies in Graph Query Engines (GQEs) for interactive search and exploration has also been studied for years. However, these two vibrant scientific fields are traditionally independent of each other, causing a vast barrier for users who wish to explore the full-stack operations of graph querying. In this demonstration, we propose a novel VQI system built upon Neo4j called VisualNeo that facilities an efficient subgraph query in large graph databases. VisualNeo inherits several advanced features from recent advanced VQIs, which include the data-driven gui design and canned pattern generation. Additionally, it embodies a database manager module in order that users can connect to generic Neo4j databases. It performs query processing through the Neo4j driver and provides an aesthetic query result exploration.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134998131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Web Connector: A Unified API Wrapper to Simplify Web Data Collection Web连接器:简化Web数据收集的统一API包装器
3区 计算机科学
Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611616
Weiyuan Wu, Pei Wang, Yi Xie, Yejia Liu, George Chow, Jiannan Wang
{"title":"Web Connector: A Unified API Wrapper to Simplify Web Data Collection","authors":"Weiyuan Wu, Pei Wang, Yi Xie, Yejia Liu, George Chow, Jiannan Wang","doi":"10.14778/3611540.3611616","DOIUrl":"https://doi.org/10.14778/3611540.3611616","url":null,"abstract":"Collecting structured data from Web APIs, such as the Twitter API, Yelp Fusion API, Spotify API, and DBLP API, is a common task in the data science lifecycle, but it requires advanced programming skills for data scientists. To simplify web data collection and lower the barrier to entry, API wrappers have been developed to wrap API calls into easy-to-use functions. However, existing API wrappers are not standardized, which means that users must download and maintain multiple API wrappers and learn how to use each of them, while developers must spend considerable time creating an API wrapper for any new website. In this demo, we present the Web Connector, which unifies API wrappers to overcome these limitations. First, the Web Connector has an easy-to-use program-ming interface, designed to provide a user experience similar to that of reading data from relational databases. Second, the Web Connector's novel system architecture requires minimal effort to fetch data for end-users with an existing API description file. Third, the Web Connector includes a semi-automatic API description file generator that leverages the concept of generation by example to create new API wrappers without writing code.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134998297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Private Information Retrieval in Large Scale Public Data Repositories 大规模公共数据库中的私有信息检索
3区 计算机科学
Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611572
Ishtiyaque Ahmad, Divyakant Agrawal, Amr El Abbadi, Trinabh Gupta
{"title":"Private Information Retrieval in Large Scale Public Data Repositories","authors":"Ishtiyaque Ahmad, Divyakant Agrawal, Amr El Abbadi, Trinabh Gupta","doi":"10.14778/3611540.3611572","DOIUrl":"https://doi.org/10.14778/3611540.3611572","url":null,"abstract":"The tutorial focuses on Private Information Retrieval (PIR), which allows clients to privately query public or server-owned databases without disclosing their queries. The tutorial covers the basic concepts of PIR such as its types, construction, and critical building blocks, including homomorphic encryption. It also discusses the performance of PIR, existing optimizations for scalability, real-life applications of PIR, and ways to extend its functionalities.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135003656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Visualizing Spreadsheet Formula Graphs Compactly 可视化电子表格公式图形紧凑
3区 计算机科学
Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611613
Fanchao Chen, Dixin Tang, Haotian Li, Aditya G. Parameswaran
{"title":"Visualizing Spreadsheet Formula Graphs Compactly","authors":"Fanchao Chen, Dixin Tang, Haotian Li, Aditya G. Parameswaran","doi":"10.14778/3611540.3611613","DOIUrl":"https://doi.org/10.14778/3611540.3611613","url":null,"abstract":"Spreadsheets are a ubiquitous data analysis tool, empowering non-programmers and programmers alike to easily express their computations by writing formulae alongside data. The dependencies created by formulae are tracked as formula graphs, which play a central role in many spreadsheet applications and are critical to the interactivity and usability of spreadsheet systems. Unfortunately, as formula graphs become large and complex, it becomes harder for end-users to make sense of formula graphs and trace the dependents or precedents of cells to check the accuracy of individual formulae and identify sources of errors. In this paper, we demonstrate a spreadsheet formula graph visualization tool, TACO-Lens, developed as a plugin for Microsoft Excel. Our plugin leverages TACO, our framework for compactly and efficiently representing formula graphs. TACO compresses formula graphs using a key spreadsheet property: tabular locality, which means that cells close to each other are likely to have similar formula structures. This compact representation enables end-users to more easily consume complex dependencies and reduces the response time for tracing dependents and precedents. TACO-Lens, our visualization plugin, depicts the compact representation of TACO and supports users in visually tracing dependents and precedents. In this demonstration, attendees can compare the visualizations of different formula graphs using TACO, Excel's built-in dependency tracing tool, and an approach that does not compress formula graphs, and quantitatively compare the different response time of different approaches.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135003658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信