2023 IEEE 39th International Conference on Data Engineering (ICDE): Latest Papers, Page 7

A Lightweight Framework for Fast Trajectory Simplification

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00184

Ziquan Fang, Changhao He, Lu Chen, Danlei Hu, Qichen Sun, Linsen Li, Yunjun Gao

Ubiquitous GPS sensors collect massive trajectory data from moving objects, which is useful in data mining applications. However, trajectory data is enormous in volume, so directly storing and processing the raw data is expensive. Trajectory simplification is an efficient remedy: it reduces a trajectory to a set of continuous line segments with acceptable data loss. Although many algorithms have been proposed, they still suffer from three issues: (i) they lack data-driven capability, as most studies rely on human-crafted rules or pre-defined parameters; (ii) they are bound to error measures that incur high computational cost; and (iii) they focus only on preserving local information in trajectories and fail to capture the global mobility patterns needed for trajectory compression. To address these issues, we propose a Seq2Seq2Seq framework, abbreviated S3, which consists of two chained Seq2Seq models. With differentiable reconstruction learning, S3 enables self-supervised trajectory simplification in a lightweight manner. In addition, we deploy S3 over a graph neural architecture to capture context-aware mobility patterns and to enrich the trajectory representation with geographical semantics, and we design a context-aware distance measure for quality evaluation. An online extension of S3 is also developed to support streaming trajectory simplification. Finally, extensive experiments on two real-world datasets in both offline and online scenarios show that S3 achieves much higher efficiency (e.g., up to one order of magnitude speed-up) and comparable compression quality compared with both non-learning and state-of-the-art learning-based methods.
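To make the cost behind issue (ii) concrete, the sketch below computes the synchronized Euclidean distance (SED), one error measure commonly used in the trajectory simplification literature. The abstract does not say which measures the compared methods use, so the choice of SED, the `(x, y, t)` point layout, and the helper names `sed` and `max_sed_error` are assumptions for illustration. This is not part of S3, which is designed to avoid being bound to such measures.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t); layout assumed for illustration

def sed(p: Point, start: Point, end: Point) -> float:
    """Synchronized Euclidean distance of p with respect to the segment start -> end."""
    x, y, t = p
    xs, ys, ts = start
    xe, ye, te = end
    ratio = 0.0 if te == ts else (t - ts) / (te - ts)
    # Where the object would be at time t if it moved along the segment at constant speed.
    xi = xs + ratio * (xe - xs)
    yi = ys + ratio * (ye - ys)
    return math.hypot(x - xi, y - yi)

def max_sed_error(trajectory: List[Point], kept: List[int]) -> float:
    """Largest SED over all dropped points, given sorted indices of kept points
    (the first and last index must be included)."""
    worst = 0.0
    for a, b in zip(kept, kept[1:]):
        for i in range(a + 1, b):  # every point dropped between two kept points
            worst = max(worst, sed(trajectory[i], trajectory[a], trajectory[b]))
    return worst
```

Checking one candidate simplification this way touches every dropped point, and error-bounded algorithms repeat such checks for many candidate segments, which is one plausible source of the computational cost the abstract refers to.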

Citations: 0

BERT-Trip: Effective and Scalable Trip Representation using Attentive Contrast Learning

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00053

Ai-Te Kuo, Haiquan Chen, Wei-Shinn Ku

Trip recommendation has drawn considerable attention over the past decade. In trip recommendation, a sequence of points of interest (POIs) is recommended for a given query that includes an origin and a destination. Recently, the attention mechanism and many attention-based models have achieved great success in various fields. Trip recommendation problems exhibit similar characteristics and could potentially benefit from the attention mechanism. However, applying the attention mechanism to trip recommendation is non-trivial. We are motivated to answer two research questions. (1) How can we learn trip representations effectively without labels? Unlike most natural language processing tasks, trip recommendation has no ground-truth labels available. (2) How can we learn trip representations effectively without handcrafting negative samples? In this paper, we cast trip representation learning as a natural language processing (NLP) task. We propose BERT-Trip, a self-supervised contrastive learning framework, to learn effective and scalable trip representations in support of time-sensitive and user-personalized trip recommendation. BERT-Trip builds on a Siamese network with BERT as the backbone encoder to maximize the similarity between augmentations of trips. We use a masking strategy to generate augmented views (positive sample pairs) of trips in the Siamese network and apply a stop-gradient on one side of the Siamese network, which eliminates the need for negative sample pairs or momentum encoders. Extensive experiments on real-world datasets demonstrate that BERT-Trip consistently outperforms state-of-the-art methods on all effectiveness metrics, yielding up to 24% and 40% increases in F1 score on the Flickr and Weeplaces datasets, respectively. A rigorous evaluation of BERT-Trip's scalability up to 12,800 POIs is also provided.
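The masking step that produces positive pairs can be sketched roughly as follows. This is a hypothetical illustration, not BERT-Trip's code: the `[MASK]` token, the 30% mask ratio, keeping the origin and destination visible, and the function names are all assumptions. The stop-gradient Siamese loss needs an autograd framework and is left out.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # hypothetical BERT-style mask token

def mask_view(trip: List[str], ratio: float, rng: random.Random) -> List[str]:
    """Copy of `trip` with a fraction of interior POIs replaced by MASK.
    Keeping the origin (first) and destination (last) unmasked is an assumption."""
    interior = list(range(1, len(trip) - 1))
    k = min(len(interior), round(ratio * len(interior)))
    masked = set(rng.sample(interior, k))
    return [MASK if i in masked else poi for i, poi in enumerate(trip)]

def positive_pair(trip: List[str], ratio: float = 0.3,
                  seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Two independently masked views of the same trip, used as a positive pair."""
    rng = random.Random(seed)
    return mask_view(trip, ratio, rng), mask_view(trip, ratio, rng)
```

Because both views come from the same trip, the encoder can be trained to pull them together without any explicitly constructed negative trips, which is the property the abstract highlights.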

Citations: 0

Visualization Recommendation Through Visual Relation Learning and Visual Preference Learning

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00145

Daomin Ji, Hui Luo, Z. Bao

Visualization recommendation (VisRec) aims to automatically generate the most relevant visualization for a table of interest to a user. In this paper, we present VisFormer, a novel machine learning-based VisRec method that solves VisRec in three stages. 1) Table representation learning, which learns accurate column-level representations of a table. To achieve this, we resort to the Transformer, a powerful language model that learns accurate word embeddings by modeling context. Specifically, we propose a hierarchical Transformer-based architecture that learns expressive column representations by capturing two types of context: intra-column context and cross-column context. 2) Visual relation learning, which captures column relations. To achieve this, we regard each visualization as a relation tuple with a special relation, the visual relation, between the columns. Then, for each visual relation, we use a neural network to evaluate the corresponding visualizations. 3) Visual preference learning, which extracts from a visualization the visual preference features that can affect users' decisions. To achieve this, we use a convolutional neural network to extract such features and explore how to use them to refine the recommendation results. We conduct experiments comparing VisFormer with three state-of-the-art ML-based methods on a large real-world dataset, the Plotly community feed. The results show that, compared with the most competitive baseline, VisFormer achieves relative improvements of 8.8%, 20.6%, and 21.0% on Recall@1, Recall@2, and Recall@3, respectively.
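The reported figures are relative improvements in Recall@k. A minimal sketch of both quantities follows, assuming the common definition of Recall@k as the fraction of relevant items that appear in the top k, averaged over tables; the paper's exact evaluation protocol is not given in the abstract, and the function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear among the first k ranked items."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def mean_recall_at_k(results: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average Recall@k over (ranked list, relevant set) pairs, one pair per table."""
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in results) / len(results)

def relative_improvement(new: float, baseline: float) -> float:
    """(new - baseline) / baseline; 0.088 corresponds to an 8.8% relative gain."""
    return (new - baseline) / baseline
```

A relative gain does not fix the absolute level: in a made-up example, moving Recall@1 from 0.50 to 0.544 is an 8.8% relative improvement but only 4.4 percentage points.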

Citations: 0

Efficient Deep Ensemble Inference via Query Difficulty-dependent Task Scheduling

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00082

Zichong Li, Lan Zhang, Mu Yuan, Miao-Hui Song, Qianjun Song

Citations: 0

Delivery Time Prediction Using Large-Scale Graph Structure Learning Based on Quantile Regression

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00261

L. Zhang, Xin Zhou, Zhiwei Zeng, Yiming Cao, Yonghui Xu, Mingliang Wang, Xin Wu, Yong Liu, Li-zhen Cui, Zhiqi Shen

Predicting the Estimated Time of Arrival (ETA) of packages is a critical problem in e-commerce. The prediction is often based on spatial (sending and receiving addresses), temporal (payment time), and context (merchant) attributes. Existing methods usually formalize this task as an Origin-Destination (OD) ETA prediction problem and exploit attribute relations with graph learning. However, most existing methods use fixed, manually defined graph structures, which are often suboptimal for the downstream ETA task and hence lead to unsatisfactory predictions. In addition, current ETA models tend to focus on prediction accuracy without considering the fulfillment rate. This may lead to a low fulfillment rate in practice, i.e., the actual delivery time is much longer than the models' estimates, which leads to frustrating experiences for users. To address these issues, we propose a novel Graph Structure Learning-based Quantile Regression (GSL-QR) model for e-commerce ETA prediction. Specifically, we use graph structure learning to dynamically update the spatial and temporal relation graphs of orders and to learn optimal graph structures and graph embeddings guided by the downstream ETA prediction task. To ensure both prediction accuracy and order fulfillment rate, we design a multi-objective quantile regression in GSL-QR that finds the Pareto solution of the problem. To extend GSL to large-scale real-world graphs, we devise a Fast Sampling-based Graph Structure Learning (FS-GSL) method, which significantly reduces the computational complexity of graph structure learning. Finally, we conduct comprehensive experiments on three industrial datasets collected from the Alibaba e-commerce platform. The results demonstrate that the proposed model significantly outperforms baselines in both ETA prediction accuracy and order fulfillment rate.
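Quantile regression is usually trained with the pinball (quantile) loss. The sketch below shows the standard single-quantile loss plus a fulfillment-rate check; the multi-objective formulation and Pareto search in GSL-QR are not described in the abstract and are not reproduced. Counting an order as "fulfilled" when the actual delivery time does not exceed the estimate is an assumption based on the abstract's wording, and the function names are hypothetical.

```python
from typing import Sequence

def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], tau: float) -> float:
    """Mean quantile (pinball) loss for quantile level tau in (0, 1).

    Under-estimates (actual > predicted) cost tau per unit of error;
    over-estimates cost (1 - tau) per unit."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)

def fulfillment_rate(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of orders delivered no later than the predicted time (assumed definition)."""
    return sum(1 for y, q in zip(y_true, y_pred) if y <= q) / len(y_true)
```

With tau = 0.9, predicting 8 hours for a package that actually takes 10 costs 1.8, while predicting 10 for one that takes 8 costs only 0.2. A high quantile therefore pushes estimates later and raises the fulfillment rate at some cost in point accuracy, which is the tension the abstract describes.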

Citations: 2

A Holistic Approach for Answering Logical Queries on Knowledge Graphs

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00181

Yuhan Wu, Yuanyuan Xu, Xuemin Lin, W. Zhang

Answering logical queries on Knowledge Graphs (KGs) is a fundamental sub-task of knowledge graph reasoning. Recently, a promising paradigm for answering logical queries has been proposed based on versatile deep learning techniques. In this line of work, the query is first broken down into a series of first-order logical predicates, and then the query and the knowledge graph entities are jointly encoded in the same embedding space. Some approaches support the full range of traditional First-Order Logic (FOL) operations for complex queries in real-world scenarios, while others create a new combination of FOL operations by replacing the negation operation with the difference operation, owing to the poor performance of negation. Our empirical observations show that the difference operator is more effective for multi-hop reasoning, whereas the negation operator is better suited as the final operation in a query, particularly in single-hop settings. In addition, other fundamental limitations, such as the linear transformation assumption of the negation operator and the fixed-lossy problem of the difference operator, further degrade the performance of these methods. In light of these observations, we propose HaLk, a holistic approach for answering logical queries that, to our knowledge, is the first to support a full set of logical operators in a unified end-to-end framework. In this approach, we design a specific neural model for each operator according to its intrinsic properties; on this basis, HaLk effectively mitigates the cascading errors of the projection and negation operators and provides closed-form solutions for the difference operator. Extensive experiments on three datasets demonstrate that HaLk outperforms all competitors, with up to a 32% improvement in accuracy.
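Embedding methods such as HaLk approximate the exact set semantics of the logical operators. The toy sketch below evaluates those semantics directly on a tiny hypothetical knowledge graph (all entity and relation names are made up) and checks the identity A \ B = A ∩ ¬B, which is what allows some methods to replace negation with difference. It illustrates the operators only and is not HaLk's neural model.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

def projection(kg: Set[Triple], anchors: Set[str], relation: str) -> Set[str]:
    """One-hop projection: tails reachable from any anchor via `relation`."""
    return {t for (h, r, t) in kg if h in anchors and r == relation}

def intersection(a: Set[str], b: Set[str]) -> Set[str]:
    return a & b

def negation(universe: Set[str], a: Set[str]) -> Set[str]:
    """Complement with respect to every entity in the graph."""
    return universe - a

def difference(a: Set[str], b: Set[str]) -> Set[str]:
    return a - b

# Hypothetical toy KG.
KG = {
    ("director_x", "directed", "film_1"),
    ("director_x", "directed", "film_2"),
    ("director_x", "directed", "film_3"),
    ("award_y", "awarded_to", "film_1"),
}
UNIVERSE = {h for h, _, _ in KG} | {t for _, _, t in KG}

# Query: films directed by director_x that did not receive award_y.
films = projection(KG, {"director_x"}, "directed")
awarded = projection(KG, {"award_y"}, "awarded_to")
answers = difference(films, awarded)
```

On its own, `negation(UNIVERSE, awarded)` also contains unrelated entities such as `director_x` and `award_y`; only after intersecting it with `films` does it yield the same answer set as the difference.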

Citations: 0

Reinforcement Learning based Tree Decomposition for Distance Querying in Road Networks

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00132

Bolong Zheng, Yong Ma, J. Wan, Yongyong Gao, Kai Huang, Xiaofang Zhou, Christian S. Jensen

Computing the shortest-path distance between two vertices in a road network is a building block of numerous applications. To do so efficiently, state-of-the-art proposals adopt a tree decomposition process with heuristic strategies to build 2-hop label indexes. However, these indexes suffer from large space overheads caused by either tree imbalance or a large tree height. Independently, reinforcement learning has recently shown impressive performance at sequential decision making in spatial data management tasks. We observe that tree decomposition is naturally a sequential decision-making problem that decides which vertex to process at each step. In this paper, we propose a reinforcement learning based tree decomposition (RLTD) approach that significantly reduces the space overhead. We model tree decomposition as a Markov Decision Process, exploiting features of both the network's topological structure and the tree structure. We further optimize the tree decomposition process by taking network density into account, which enables the model to generalize well to large road networks. Extensive experiments with real-world data offer insights into the performance of the proposals, showing that, compared with state-of-the-art proposals, they reduce the space overhead by about 51% and achieve an average query speedup of about 14% with almost the same preprocessing time.
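Tree decomposition by vertex elimination can be sketched as follows. At each step one vertex is removed and its remaining neighbours are joined into a clique; the vertex together with those neighbours forms one bag. The abstract says existing proposals pick the next vertex with heuristics but does not name them, so the minimum-degree rule below is an assumed, commonly used example. The `min(...)` line is the per-step decision that, per the abstract, RLTD instead makes with a learned policy. Function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as symmetric adjacency sets

def min_degree_elimination(graph: Graph) -> List[Tuple[int, Set[int]]]:
    """Greedily eliminate the vertex of minimum current degree.

    Returns (vertex, bag) pairs in elimination order, where each bag is the
    vertex plus its neighbours at the moment it was eliminated."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}  # work on a copy
    order: List[Tuple[int, Set[int]]] = []
    while g:
        # The decision point: which vertex to process next (ties broken by id).
        v = min(g, key=lambda u: (len(g[u]), u))
        nbrs = g.pop(v)
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}  # fill-in edges make the neighbours a clique
        order.append((v, {v} | nbrs))
    return order

def width(bags: List[Tuple[int, Set[int]]]) -> int:
    """Width of the decomposition: largest bag size minus one."""
    return max(len(bag) for _, bag in bags) - 1
```

The elimination order determines the bag sizes and the shape of the resulting tree, which is how the choice made at each step can affect the size of the 2-hop label index built on top of it.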

Citations: 0

Dataset Similarity Detection for Global Deduplication in the DD File System

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00255

Tony Wong, Smriti Thakkar, Kao-Feng Hsieh, Zachary Tom, Hetaben Saraiya, Philip Shilane

Deduplication has become a widely used technique for reducing the space requirements of storage systems by replacing redundant chunks of data with references. While storage systems continue to grow in size, there remain practical limits to the size of any deduplication node, and enterprise businesses may have dozens to hundreds of nodes. In a multi-node environment, it is important to place datasets on nodes so as to take advantage of deduplication savings globally. For customers of the DD File System (DDFS), we provide the Global Deduplication Service, which advises customers on data placement to maximize deduplication-related space savings. This paper describes our currently shipping approach, which uses a Fingerprint Dictionary to intelligently cluster customer data and generate a plan for relocating datasets to improve global deduplication. We report results from thousands of deployed systems at customer sites. We have also developed a further improvement using MinHashes that lowers resource requirements, and we provide proofs of the similarity estimates. Our results on a real-world dataset show that MinHashes speed up clustering by up to 400X relative to our previous method and reduce memory consumption by up to 260X.
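The MinHash idea can be sketched as follows: each dataset is represented by the set of its chunk fingerprints, and the fraction of hash functions whose minimum values agree across two sets estimates their Jaccard similarity. The abstract does not describe DDFS's signature size or hash family, so the keyed BLAKE2b hashes, the default of 128 hash functions, and the function names are assumptions, not the shipping implementation.

```python
import hashlib
from typing import Iterable, List, Set

def _hash(seed: int, fingerprint: str) -> int:
    """64-bit hash of a fingerprint under the seed-th hash function (keyed BLAKE2b)."""
    digest = hashlib.blake2b(fingerprint.encode(), digest_size=8,
                             key=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")

def minhash_signature(fingerprints: Iterable[str], num_hashes: int = 128) -> List[int]:
    """Minimum hash value per hash function; `fingerprints` must be non-empty."""
    fps = list(fingerprints)
    return [min(_hash(i, fp) for fp in fps) for i in range(num_hashes)]

def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Share of hash functions whose minima agree: an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def exact_jaccard(a: Set[str], b: Set[str]) -> float:
    """Reference value |A ∩ B| / |A ∪ B| computed from the full sets."""
    return len(a & b) / len(a | b)
```

Comparing two fixed-length signatures costs the same no matter how many fingerprints each dataset holds, which illustrates how a MinHash-based approach can lower time and memory requirements compared with working on full fingerprint sets.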

Citations: 0

SVQ-ACT: Querying for Actions over Videos

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00277

Daren Chao, Kaiwen Chen, Nick Koudas

We present SVQ-ACT, a system capable of evaluating declarative action and object queries over input videos. Our approach is independent of the underlying object and action detection models. Users may issue queries involving actions and specific objects (e.g., a human riding a bicycle, close to a traffic light, with a car to the left of the bicycle) and identify video clips that satisfy the query constraints. The system operates in two main settings: online and offline. In the online setting, the user specifies a video source (e.g., a surveillance video) and a declarative query containing action and object predicates, and the system identifies and labels, in real time, all frame sequences that match the query. In the offline setting, the system accepts a video repository as input, preprocesses all videos offline, and extracts suitable metadata. After this step, users can interactively execute any query over the repository (involving actions and objects supported by the underlying detection models) to identify frame sequences from videos that satisfy the query. In this case, to limit the number of results produced, we introduce novel result-ranking algorithms that efficiently produce the k most relevant results. We demonstrate that SVQ-ACT correctly captures the desired query semantics and executes queries efficiently and correctly, delivering a high degree of accuracy.
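A naive baseline for returning the k most relevant clips can be sketched as follows, assuming each frame already has a score indicating how well it satisfies the query predicates. Grouping consecutive frames above a threshold into clips, ranking clips by mean score, and the function names are all assumptions for illustration; the abstract does not describe SVQ-ACT's ranking algorithms, and they are not reproduced here.

```python
import heapq
from typing import List, Tuple

Clip = Tuple[int, int, float]  # (start frame, end frame exclusive, mean score)

def matching_clips(frame_scores: List[float], threshold: float) -> List[Clip]:
    """Maximal runs of consecutive frames whose score is at least `threshold`."""
    clips: List[Clip] = []
    start = None
    # A trailing -inf sentinel closes a run that reaches the last frame.
    for i, s in enumerate(frame_scores + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            run = frame_scores[start:i]
            clips.append((start, i, sum(run) / len(run)))
            start = None
    return clips

def top_k_clips(frame_scores: List[float], threshold: float, k: int) -> List[Clip]:
    """The k clips with the highest mean score, best first."""
    return heapq.nlargest(k, matching_clips(frame_scores, threshold), key=lambda c: c[2])
```

This baseline scores every matching clip before selecting the top k; `heapq.nlargest` keeps the selection itself cheap, but the scan over all frames remains.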

Citations: 1

A Deep Multi-View Framework for Anomaly Detection on Attributed Networks (Extended Abstract)

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00326

Zhen Peng, Minnan Luo, Jundong Li, Luguo Xue, Qinghua Zheng

Many existing anomaly detection methods on attributed networks do not seriously tackle the inherent multi-view property of the attribute space; instead, they concatenate multiple views into a single feature vector, which inevitably ignores the incompatibility between heterogeneous views caused by their distinct statistical properties. In practice, the distinct but complementary information provided by multi-view data promises more effective anomaly detection than approaches based only on single-view data. Furthermore, abnormal patterns naturally behave differently in different views, which coincides with users' desire to discover specific abnormalities according to their preferences for views (attributes). Most existing methods cannot adapt to such requirements because they fail to consider the idiosyncrasy of user preferences. Thus, in this paper, we propose ALARM, a multi-view framework that incorporates user preferences into anomaly detection and simultaneously tackles heterogeneous attribute characteristics through multiple graph encoders and a well-designed aggregator that supports both self-learning and user-guided learning. Experiments on synthetic and real-world datasets corroborate the strong performance of ALARM and its effectiveness in supporting user-oriented anomaly detection.

Citations: 0