Proceedings of the Vldb Endowment最新文献_第9页

Lynx: A Graph Query Framework for Multiple Heterogeneous Data Sources Lynx:面向多个异构数据源的图形查询框架

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611587

Zhihong Shen, Chuan Hu, Zihao Zhao

引用次数: 0

MINT: Detecting Fraudulent Behaviors from Time-Series Relational Data MINT:从时间序列关系数据中检测欺诈行为

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611551

Fei Xiao, Yuncheng Wu, Meihui Zhang, Gang Chen, Beng Chin Ooi

{"title":"MINT: Detecting Fraudulent Behaviors from Time-Series Relational Data","authors":"Fei Xiao, Yuncheng Wu, Meihui Zhang, Gang Chen, Beng Chin Ooi","doi":"10.14778/3611540.3611551","DOIUrl":"https://doi.org/10.14778/3611540.3611551","url":null,"abstract":"The e-commerce platforms, such as Shopee, have accumulated a huge volume of time-series relational data, which contains useful information on differentiating fraud users from benign users. Existing fraud behavior detection approaches typically model the time-series data with a vanilla Recurrent Neural Network (RNN) or combine the whole sequence as a single intention without considering the temporal behavioral patterns, row-level interactions, and different view intentions. In this paper, we present MINT, a M ultiview row- IN teractive T ime-aware framework to detect fraudulent behaviors from time-series structured data. The key idea of MINT is to build a time-aware behavior graph for each user's time-series relational data with each row represented as an action node. We utilize the user's temporal information to construct three different graph convolutional matrices for hierarchically learning the user's intentions from different views, that is, short-term, medium-term, and long-term intentions. To capture more meaningful row-level interactions and alleviate the over-smoothing issue in a vanilla time-aware behavior graph, we propose a novel gated neighbor interaction mechanism to calibrate the aggregated information by each action node. Since the receptive fields of the three graph convolutional layers are designed to grow nearly exponentially, our MINT requires many fewer layers than traditional deep graph neural networks (GNNs) to capture multi-hop neighboring information, and avoids recurrent feedforward propagation, thus leading to higher training efficiency and scalability. Our extensive experiments on the large-scale e-commerce datasets from Shopee with up to 4.6 billion records and a public dataset from Amazon show that MINT achieves superior performance over 10 state-of-the-art models and provides better interpretability and scalability.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134998306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Interpretable Clustering of Multivariate Time Series with Time2Feat 基于Time2Feat的多元时间序列可解释聚类

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611604

Angela Bonifati, Francesco Del Buono, Francesco Guerra, Miki Lombardi, Donato Tiano

引用次数: 0

DuckPGQ: Bringing SQL/PGQ to DuckDB DuckPGQ:将SQL/PGQ引入DuckDB

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611614

Daniel ten Wolde, Gábor Szárnyas, Peter Boncz

引用次数: 0

ScalarDB: Universal Transaction Manager for Polystores 用于polystore的通用事务管理器

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611563

Hiroyuki Yamada, Toshihiro Suzuki, Yuji Ito, Jun Nemoto

引用次数: 1

Lindorm TSDB: A Cloud-Native Time-Series Database for Large-Scale Monitoring Systems Lindorm TSDB:用于大规模监控系统的云原生时间序列数据库

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611559

Chunhui Shen, Qianyu Ouyang, Feibo Li, Zhipeng Liu, Longcheng Zhu, Yujie Zou, Qing Su, Tianhuan Yu, Yi Yi, Jianhong Hu, Cen Zheng, Bo Wen, Hanbang Zheng, Lunfan Xu, Sicheng Pan, Bin Wu, Xiao He, Ye Li, Jian Tan, Sheng Wang, Dan Pei, Wei Zhang, Feifei Li

{"title":"Lindorm TSDB: A Cloud-Native Time-Series Database for Large-Scale Monitoring Systems","authors":"Chunhui Shen, Qianyu Ouyang, Feibo Li, Zhipeng Liu, Longcheng Zhu, Yujie Zou, Qing Su, Tianhuan Yu, Yi Yi, Jianhong Hu, Cen Zheng, Bo Wen, Hanbang Zheng, Lunfan Xu, Sicheng Pan, Bin Wu, Xiao He, Ye Li, Jian Tan, Sheng Wang, Dan Pei, Wei Zhang, Feifei Li","doi":"10.14778/3611540.3611559","DOIUrl":"https://doi.org/10.14778/3611540.3611559","url":null,"abstract":"Internet services supported by large-scale distributed systems have become essential for our daily life. To ensure the stability and high quality of services, diverse metric data are constantly collected and managed in a time-series database to monitor the service status. However, when the number of metrics becomes massive, existing time-series databases are inefficient in handling high-rate data ingestion and queries hitting multiple metrics. Besides, they all lack the support of machine learning functions, which are crucial for sophisticated analysis of large-scale time series. In this paper, we present Lindorm TSDB, a distributed time-series database designed for handling monitoring metrics at scale. It sustains high write throughput and low query latency with massive active metrics. It also allows users to analyze data with anomaly detection and time series forecasting algorithms directly through SQL. Furthermore, Lindorm TSDB retains stable performance even during node scaling. We evaluate Lindorm TSDB under different data scales, and the results show that it outperforms two popular open-source time-series databases on both writing and query, while executing time-series machine learning tasks efficiently.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135003303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

PolarDB-SCC: A Cloud-Native Database Ensuring Low Latency for Strongly Consistent Reads PolarDB-SCC:一个云原生数据库，确保低延迟的强一致读取

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611562

Xinjun Yang, Yingqiang Zhang, Hao Chen, Chuan Sun, Feifei Li, Wenchao Zhou

{"title":"PolarDB-SCC: A Cloud-Native Database Ensuring Low Latency for Strongly Consistent Reads","authors":"Xinjun Yang, Yingqiang Zhang, Hao Chen, Chuan Sun, Feifei Li, Wenchao Zhou","doi":"10.14778/3611540.3611562","DOIUrl":"https://doi.org/10.14778/3611540.3611562","url":null,"abstract":"A classic design of cloud-native databases adopts an architecture that consists of one read/write (RW) node and one or more read-only (RO) nodes. In such a design, the propagation of write-ahead logs (WALs) from the RW node to the RO node(s) is typically performed asynchronously. Consequently, system designers either have to accept a loose consistency guarantee, where a read from the RO node may return stale data, or tolerate significant performance degradation in terms of read latency, as it then needs to wait for the log to be propagated and applied. Most commercial cloud-native databases, such as Amazon Aurora, choose performance over strong consistency. As a result, it makes RO nodes useless for many applications requiring read-after-write consistency (a form of strong consistency), and the support for serverless databases (i.e., allowing the RO nodes to be scaled out automatically) is impossible as they require a single endpoint. This paper proposes PolarDB-SCC (PolarDB-Strongly Consistent Cluster), a cloud-native database architecture that guarantees strongly consistent reads with very low latency. The core idea is to eliminate unnecessary waits and reduce the necessary wait time on RO nodes while still supporting strong consistency. To achieve this, it tracks the RW node's modification timestamp at three progressively finer-grained levels. We further design a Linear Lamport timestamp to reduce the RO node's timestamp fetching operations and leverage the RDMA network for all the data transferring ( e.g. , timestamp fetching and log shipment) to minimize network overhead and extra CPU usage. Our evaluation shows that PolarDB-SCC does not incur any noticeable overhead for ensuring strongly consistent reads compared with the eventually consistent (stale) read policy. To the best of our knowledge, PolarDB-SCC is the first \"read-write splitting\" cloud-native database that supports strongly consistent read with negligible overhead. Compared with a straightforward read-wait design, PolarDB-SCC improves throughput by up to 4.51× and reduces median latency by up to 3.66× in SysBench's read-write workload. PolarDB-SCC is already commercially available at Alibaba Cloud.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135003304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

OceanBase Paetica: A Hybrid Shared-Nothing/Shared-Everything Database for Supporting Single Machine and Distributed Cluster OceanBase Paetica:支持单机和分布式集群的无共享/万物共享混合数据库

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611560

Zhifeng Yang, Quanqing Xu, Shanyan Gao, Chuanhui Yang, Guoping Wang, Yuzhong Zhao, Fanyu Kong, Hao Liu, Wanhong Wang, Jinliang Xiao

{"title":"OceanBase Paetica: A Hybrid Shared-Nothing/Shared-Everything Database for Supporting Single Machine and Distributed Cluster","authors":"Zhifeng Yang, Quanqing Xu, Shanyan Gao, Chuanhui Yang, Guoping Wang, Yuzhong Zhao, Fanyu Kong, Hao Liu, Wanhong Wang, Jinliang Xiao","doi":"10.14778/3611540.3611560","DOIUrl":"https://doi.org/10.14778/3611540.3611560","url":null,"abstract":"In the ongoing evolution of the OceanBase database system, it is essential to enhance its adaptability to small-scale enterprises. The OceanBase database system has demonstrated its stability and effectiveness within the Ant Group and other commercial organizations, besides through the TPC-C and TPC-H tests. In this paper, we have designed a stand-alone and distributed integrated architecture named Paetica to address the overhead caused by the distributed components in the stand-alone mode, with respect to the OceanBase system. Paetica enables adaptive configuration of the database that allows OceanBase to support both serial and parallel executions in stand-alone and distributed scenarios, thus providing efficiency and economy. This design has been implemented in version 4.0 of the OceanBase system, and the experiments show that Paetica exhibits notable scalability and outperforms alternative stand-alone or distributed databases. Furthermore, it enables the transition of OceanBase from primarily serving large enterprises to truly catering to small and medium enterprises, by employing a single OceanBase database for the successive stages of enterprise or business development, without the requirement for migration. Our experiments confirm that Paetica has achieved linear scalability with the increasing CPU core number within the stand-alone mode. It also outperforms MySQL and Greenplum in the Sysbench and TPC-H evaluations.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135003654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

The Story of AWS Glue AWS Glue的故事

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611547

Mohit Saxena, Benjamin Sowell, Daiyan Alamgir, Nitin Bahadur, Bijay Bisht, Santosh Chandrachood, Chitti Keswani, G. Krishnamoorthy, Austin Lee, Bohou Li, Zach Mitchell, Vaibhav Porwal, Maheedhar Reddy Chappidi, Brian Ross, Noritaka Sekiyama, Omer Zaki, Linchi Zhang, Mehul A. Shah

{"title":"The Story of AWS Glue","authors":"Mohit Saxena, Benjamin Sowell, Daiyan Alamgir, Nitin Bahadur, Bijay Bisht, Santosh Chandrachood, Chitti Keswani, G. Krishnamoorthy, Austin Lee, Bohou Li, Zach Mitchell, Vaibhav Porwal, Maheedhar Reddy Chappidi, Brian Ross, Noritaka Sekiyama, Omer Zaki, Linchi Zhang, Mehul A. Shah","doi":"10.14778/3611540.3611547","DOIUrl":"https://doi.org/10.14778/3611540.3611547","url":null,"abstract":"AWS Glue is Amazon's serverless data integration cloud service that makes it simple and cost effective to extract, clean, enrich, load, and organize data. Originally launched in August 2017, AWS Glue began as an extract-transform-load (ETL) service designed to relieve developers and data engineers of the undifferentiated heavy lifting needed to load databases, data warehouses, and build data lakes on Amazon S3. Since then, it has evolved to serve a larger audience including ETL specialists and data scientists, and includes a broader suite of data integration capabilities. Today, hundreds of thousands of customers use AWS Glue every month. In this paper, we describe the use cases and challenges cloud customers face in preparing data for analytics and the tenets we chose to drive Glue's design. We chose early on to focus on ease-of-use, scale, and extensibility. At its core, Glue offers serverless Apache Spark and Python engines backed by a purpose-built resource manager for fast startup and auto-scaling. In Spark, it offers a new data structure --- DynamicFrames --- for manipulating messy schema-free semi-structured data such as event logs, a variety of transformations and tooling to simplify data preparation, and a new shuffle plugin to offload to cloud storage. It also includes a Hivemetastore compatible Data Catalog with Glue crawlers to build and manage metadata, e.g. for data lakes on Amazon S3. Finally, Glue Studio is its visual interface for authoring Spark and Python-based ETL jobs. We describe the innovations that differentiate AWS Glue and drive its popularity and how it has evolved over the years.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134996886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

CDSBen: Benchmarking the Performance of Storage Services in Cloud-Native Database System at ByteDance CDSBen:在ByteDance上对云原生数据库系统中的存储服务性能进行基准测试

3区计算机科学

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI: 10.14778/3611540.3611549

Jiashu Zhang, Wen Jiang, Bo Tang, Haoxiang Ma, Lixun Cao, Zhongbin Jiang, Yuanyuan Nie, Fan Wang, Lei Zhang, Yuming Liang

引用次数: 0