2020 IEEE 36th International Conference on Data Engineering (ICDE): Latest Papers

Automatic Calibration of Road Intersection Topology using Trajectories

2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00145

Lisheng Zhao, Jiali Mao, Min Pu, Guoping Liu, Cheqing Jin, Weining Qian, Aoying Zhou, Xiang Wen, Runbo Hu, Hua Chai

Abstract: Inaccurate road intersections in digital road maps can seriously affect mobile navigation and other applications. Massive travel trajectories from thousands of vehicles enable frequent updating of road intersection topology. In this paper, we first expand the road intersection detection issue into a topology calibration problem for the road intersection influence zone. Distinct from existing road intersection update methods, we not only determine the location and coverage of a road intersection, but also identify incorrect or missing turning paths within the whole influence zone, based on trajectories that do not match the existing map. The main challenges of calibration include that trajectories are mixed with exceptional data and that road intersections vary in size and shape. To address these challenges, we propose a three-phase calibration framework called CITT. It is composed of trajectory quality improvement, core zone detection, and topology calibration within the road intersection influence zone. From these components it can automatically obtain a high-quality topology of the road intersection influence zone. Extensive experiments against state-of-the-art methods, using trajectory data obtained from Didi Chuxing and Chicago campus shuttles, demonstrate that CITT has strong stability and robustness and significantly outperforms the existing methods.

Pages: 1633-1644

Citations: 9

Fela: Incorporating Flexible Parallelism and Elastic Tuning to Accelerate Large-Scale DML

2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00124

Jinkun Geng, Dan Li, Shuai Wang

Abstract: Distributed machine learning (DML) has become common practice in industry because of the explosive volume of training data and the growing complexity of training models. Traditional DML follows data parallelism but incurs significant communication cost due to the huge amount of parameter transmission. The recently emerging model-parallel solutions can reduce the communication workload, but lead to load imbalance and serious straggler problems. More importantly, the existing solutions, either data-parallel or model-parallel, ignore the nature of flexible parallelism for most DML tasks, thus failing to fully exploit the GPU computation power. Targeting these drawbacks, we propose Fela, which incorporates both flexible parallelism and an elastic tuning mechanism to accelerate DML. In order to fully leverage GPU power and reduce communication cost, Fela adopts hybrid parallelism and uses flexible parallel degrees to train different parts of the model. Meanwhile, Fela designs a token-based scheduling policy to elastically tune the workload among different workers, thus mitigating the straggler effect and achieving better load balance. Our comparative experiments show that Fela can significantly improve the training throughput and outperforms the three main baselines (i.e., data-parallel, model-parallel, and hybrid-parallel) by up to 3.23×, 12.22×, and 1.85× respectively.

Pages: 1393-1404

Citations: 2
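
The Fela abstract describes token-based scheduling that shifts work away from stragglers but gives no algorithmic detail. The following is only a minimal sketch of the general pull-based idea, not Fela's actual policy: work is cut into tokens, and whichever worker becomes idle first takes the next token. The function names and the per-token timings are hypothetical.

```python
import heapq


def simulate_token_scheduling(num_tokens, seconds_per_token):
    """Simulate workers pulling work tokens from a shared pool.

    seconds_per_token[i] is the (hypothetical) time worker i needs per token.
    Returns (tokens processed per worker, total makespan).
    """
    # Min-heap of (time at which the worker becomes idle, worker id).
    idle_at = [(0.0, w) for w in range(len(seconds_per_token))]
    heapq.heapify(idle_at)
    processed = [0] * len(seconds_per_token)
    makespan = 0.0
    for _ in range(num_tokens):
        t, w = heapq.heappop(idle_at)  # earliest idle worker takes the next token
        finish = t + seconds_per_token[w]
        processed[w] += 1
        makespan = max(makespan, finish)
        heapq.heappush(idle_at, (finish, w))
    return processed, makespan


def static_makespan(num_tokens, seconds_per_token):
    """Baseline: split tokens evenly up front (assumes an even division)."""
    share = num_tokens // len(seconds_per_token)
    return max(share * s for s in seconds_per_token)


if __name__ == "__main__":
    speeds = [1.0, 1.0, 3.0]  # the third worker is a straggler
    print(simulate_token_scheduling(12, speeds))
    print(static_makespan(12, speeds))
```

In this toy setting the slow worker ends up with fewer tokens than the fast ones, and the pull-based makespan is shorter than the even split, which is the load-balancing effect the abstract refers to.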

Reinforcement Learning with Tree-LSTM for Join Order Selection

2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00116

Xiang Yu, Guoliang Li, Chengliang Chai, N. Tang

Abstract: Join order selection (JOS), the problem of finding the optimal join order for an SQL query, is a primary focus of database query optimizers. The problem is hard due to its large solution space. Exhaustively traversing the solution space is prohibitively expensive, so it is often combined with heuristic pruning. Despite decades-long effort, traditional optimizers still suffer from low scalability or low accuracy when handling complicated SQL queries. Recent attempts using deep reinforcement learning (DRL), by encoding join trees with fixed-length hand-tuned feature vectors, have shed some light on JOS. However, fixed-length feature vectors cannot capture the structural information of a join tree, which may produce poor join plans. Moreover, they may also require retraining the neural network when handling schema changes (e.g., adding tables/columns) or multi-alias table names that are common in SQL queries. In this paper, we present RTOS, a novel learned optimizer that uses Reinforcement learning with Tree-structured long short-term memory (LSTM) for join Order Selection. RTOS improves existing DRL-based approaches in two main aspects: (1) it adopts graph neural networks to capture the structures of join trees; and (2) it supports the modification of database schema and multi-alias table names. Extensive experiments on Join Order Benchmark (JOB) and TPC-H show that RTOS outperforms traditional optimizers and existing DRL-based learned optimizers. In particular, the plan RTOS generated for JOB is 101% on (estimated) cost and 67% on latency (i.e., execution time) on average, compared with dynamic programming that is known to produce the state-of-the-art results on join plans.

Pages: 1297-1308

Citations: 85
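
The RTOS abstract uses dynamic programming as its reference point for join plans. As a rough illustration of that classical baseline (not of RTOS itself), the sketch below enumerates left-deep join orders with a subset DP. The cost model (sum of intermediate result sizes), the table cardinalities, and the selectivities are all assumptions made up for this example.

```python
from itertools import combinations


def card(subset, cards, sel):
    """Estimated size of joining all tables in `subset` (toy model)."""
    size = 1.0
    for t in sorted(subset):
        size *= cards[t]
    # Selectivity keys are (a, b) pairs with a < b; missing pairs are cross products.
    for a, b in combinations(sorted(subset), 2):
        size *= sel.get((a, b), 1.0)
    return size


def best_left_deep_order(cards, sel):
    """Subset DP: best[S] = cheapest left-deep plan that joins exactly S."""
    tables = sorted(cards)
    best = {frozenset([t]): (0.0, [t]) for t in tables}
    for k in range(2, len(tables) + 1):
        for combo in combinations(tables, k):
            s = frozenset(combo)
            out = card(s, cards, sel)  # size of the result produced at this step
            options = []
            for t in combo:  # t is the table joined last
                prev_cost, prev_order = best[s - {t}]
                options.append((prev_cost + out, prev_order + [t]))
            best[s] = min(options)
    return best[frozenset(tables)]


if __name__ == "__main__":
    cards = {"A": 1000, "B": 10, "C": 100}      # hypothetical row counts
    sel = {("A", "B"): 0.01, ("B", "C"): 0.5}   # hypothetical selectivities
    print(best_left_deep_order(cards, sel))
```

The table of subsets grows as 2^n, which is one way to see why the abstract calls exhaustive search prohibitively expensive for complicated queries.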

Group Recommendation with Latent Voting Mechanism

2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00018

Lei Guo, Hongzhi Yin, Qinyong Wang, B. Cui, Zi Huang, Li-zhen Cui

Abstract: Group Recommendation (GR) is the task of suggesting relevant items/events for a group of users in online systems, whose major challenge is to aggregate the preferences of group members to infer the decision of a group. Prior group recommendation methods applied predefined static strategies for preference aggregation. However, these static strategies are insufficient to model the complicated decision-making process of a group, especially for occasional groups which are formed ad hoc. Compared to the conventional individual recommendation task, GR is rather dynamic and each group member may contribute differently to the final group decision. Recent works argue that group members should have non-uniform weights in forming the decision of a group, and try to utilize a standard attention mechanism to aggregate the preferences of group members, but they do not model the interaction behavior among group members, and the decision-making process is largely unexplored. In this work, we study GR in a more general scenario, that is, Occasional Group Recommendation (OGR), and focus on solving the preference aggregation problem and the data sparsity issue of group-item interactions. Instead of exploring a new heuristic or vanilla attention-based mechanism, we propose a new social self-attention based aggregation strategy that directly models the interactions among group members, namely Group Self-Attention (GroupSA). In GroupSA, we treat the group decision-making process as multiple voting processes, and develop a stacked social self-attention network to simulate how a group consensus is reached. To overcome the data sparsity issue, we resort to the relatively abundant user-item and user-user interaction data, and enhance the representation of users by two types of aggregation methods. In the training process, we further propose a joint training method to learn the user/item embeddings in the group-item recommendation task and the user-item recommendation task simultaneously. Finally, we conduct extensive experiments on two real-world datasets. The experimental results demonstrate the superiority of our proposed GroupSA method compared to several state-of-the-art methods in terms of HR and NDCG.

Pages: 121-132

Citations: 29
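
The GroupSA abstract builds on self-attention among group members. The sketch below shows only a plain scaled dot-product self-attention layer followed by mean pooling, to make the aggregation idea concrete. It is not the GroupSA network: it omits learned projection matrices, social masking, and stacking, and the member embeddings are invented inputs.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention_aggregate(members):
    """members: equal-length embedding vectors, one per group member.

    Returns (group_vector, attention_matrix). Queries, keys, and values are
    the raw embeddings themselves (an assumption made to keep this short).
    """
    d = len(members[0])
    scale = math.sqrt(d)
    # attn[i][j]: how much member i attends to member j.
    attn = [softmax([dot(q, k) / scale for k in members]) for q in members]
    # Each member's updated vector is an attention-weighted mix of all members.
    updated = [
        [sum(w * m[j] for w, m in zip(row, members)) for j in range(d)]
        for row in attn
    ]
    # Mean-pool the updated member vectors into one group representation.
    group = [sum(u[j] for u in updated) / len(updated) for j in range(d)]
    return group, attn


if __name__ == "__main__":
    group, attn = self_attention_aggregate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(group)
    print(attn)
```

Each row of the attention matrix sums to one, so it can be read as one member's distribution of "votes" over the group, which is the intuition the abstract appeals to.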

SAD: An Unsupervised System for Subsequence Anomaly Detection

2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00168

Paul Boniol, Michele Linardi, Federico Roncallo, Themis Palpanas

Citations: 14

Query-driven Repair of Functional Dependency Violations

2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00195

Stella Giannakopoulou, M. Karpathiotakis, A. Ailamaki

Citations: 3

Fast Query Decomposition for Batch Shortest Path Processing in Road Networks

2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00107

Lei Li, Mengxuan Zhang, Wen Hua, Xiaofang Zhou

Abstract: Shortest path query is a fundamental operation in various location-based services (LBS), and most of them process queries on the server side. As the business expands, scalability becomes a severe issue. Instead of simply deploying more servers to cope with the quickly increasing query number, batch shortest path algorithms have been proposed recently to answer a set of queries together using shareable computation. Besides, they can also work in a highly dynamic environment as no index is needed. However, the existing batch algorithms either assume the batch queries are finely decomposed or just process them without differentiation, resulting in poor query efficiency. In this paper, we aim to improve the performance of batch shortest path algorithms by revisiting the problem of query clustering. Specifically, we first propose three query decomposition methods to cluster queries: Zigzag, which considers the 1-N shared computation; Search-Space Estimation, which further incorporates search space estimation; and Co-Clustering, which considers the spatial locality of sources and targets. After that, we propose two batch algorithms that take advantage of the previously decomposed query sets for efficient query answering: Local Cache, which improves the existing Global Cache with a higher cache hit ratio, and R2R, which finds a set of approximate shortest paths from one region to another with bounded error. Experiments on large real-world query sets verify the effectiveness and efficiency of our decomposition methods compared with the state-of-the-art batch algorithms.

Pages: 1189-1200

Citations: 32
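
The batch shortest path abstract relies on "1-N shared computation", where several queries reuse one search. The sketch below shows the simplest form of that idea: queries with the same source are grouped and answered from a single Dijkstra run. It is not the paper's Zigzag, Co-Clustering, Local Cache, or R2R method, and the toy graph and function names are assumptions.

```python
import heapq
from collections import defaultdict


def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]}. Returns {node: distance}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def answer_batch(graph, queries):
    """Answer (source, target) queries, sharing one search per distinct source."""
    by_source = defaultdict(list)
    for s, t in queries:
        by_source[s].append(t)
    answers = {}
    for s, targets in by_source.items():
        dist = dijkstra(graph, s)  # one search shared by all targets of s
        for t in targets:
            answers[(s, t)] = dist.get(t)  # None if t is unreachable
    return answers


if __name__ == "__main__":
    graph = {
        "a": [("b", 1), ("c", 4)],
        "b": [("c", 2), ("d", 5)],
        "c": [("d", 1)],
        "d": [],
    }
    print(answer_batch(graph, [("a", "d"), ("a", "c"), ("b", "d")]))
```

Here three queries cost two searches instead of three. Grouping by exact source is the degenerate case; clustering by spatial proximity, as the abstract describes, generalizes the same sharing to nearby but distinct endpoints.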

Neighbor Profile: Bagging Nearest Neighbors for Unsupervised Time Series Mining

2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00039

Yuanduo He, Xu Chu, Yasha Wang

Abstract: Unsupervised time series mining has been attracting great interest from both academic and industrial communities. As the two most basic data mining tasks, the discoveries of frequent/rare subsequences have been extensively studied in the literature. Specifically, frequent/rare subsequences are defined as the ones with the smallest/largest 1-nearest neighbor distance, which are also known as motif/discord. However, discord fails to identify rare subsequences when they occur more than once in the time series, which is widely known as the twin freak problem. This problem is just the "tip of the iceberg" due to the 1-nearest neighbor distance based definitions. In this work, we for the first time provide a clear theoretical analysis of motif/discord as the 1-nearest neighbor based nonparametric density estimation of subsequence. Particularly, we focus on matrix profile, a recently proposed mining framework, which unifies the discovery of motif and discord under the same computing model. Thereafter, we point out three inherent issues: low-quality density estimation, gravity defiant behavior, and lack of a reusable model, which deteriorate the performance of matrix profile in both efficiency and subsequence quality. To overcome these issues, we propose Neighbor Profile to robustly model the subsequence density by bagging nearest neighbors for the discovery of frequent/rare subsequences. Specifically, we leverage multiple subsamples and average the density estimations from subsamples using adjusted nearest neighbor distances, which not only enhances the estimation robustness but also realizes a reusable model for efficient learning. We check the sanity of neighbor profile on synthetic data and further evaluate it on real-world datasets. The experimental results demonstrate that neighbor profile can correctly model the subsequences of different densities and significantly outperforms matrix profile on the real-world arrhythmia dataset. Also, it is shown that neighbor profile is efficient for massive datasets.

Pages: 373-384

Citations: 8
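
The Neighbor Profile abstract defines motif and discord through the 1-nearest neighbor distance and names the twin freak problem. The sketch below is a brute-force illustration of those definitions only. It is not the Neighbor Profile method (no subsampling or bagging) and, unlike the usual matrix profile, it skips z-normalization to stay short. The exclusion rule and the toy series are assumptions.

```python
import math


def nn_distance_profile(series, m, exclusion=None):
    """Distance from each length-m subsequence to its nearest
    non-overlapping neighbor (brute force, un-normalized)."""
    if exclusion is None:
        exclusion = m  # ignore neighbors that overlap the query (trivial matches)
    n = len(series) - m + 1
    subs = [series[i:i + m] for i in range(n)]
    profile = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) < exclusion:
                continue
            best = min(best, math.dist(subs[i], subs[j]))
        profile.append(best)
    return profile


def motif_and_discord(series, m):
    """Motif = smallest 1-NN distance, discord = largest 1-NN distance."""
    profile = nn_distance_profile(series, m)
    motif = min(range(len(profile)), key=profile.__getitem__)
    discord = max(range(len(profile)), key=profile.__getitem__)
    return motif, discord, profile


if __name__ == "__main__":
    single = [0, 1] * 6
    single[6] = 5              # one isolated spike
    print(motif_and_discord(single, 2))

    twin = [0, 1] * 12
    twin[6] = 5
    twin[16] = 5               # the same spike occurs twice
    print(max(nn_distance_profile(twin, 2)))
```

With a single spike, the discord lands on a subsequence containing the spike. With two identical spikes, each anomalous subsequence has a perfect twin, so its 1-NN distance collapses and the anomaly no longer stands out, which is the twin freak problem described in the abstract.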

Statistical Estimation of Diffusion Network Topologies

2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00060

Ke‐qi Han, Yuan Tian, Yunjia Zhang, Ling Han, H. Huang, Yunjun Gao

Abstract: Reconstructing the topology of a diffusion network based on observed diffusion results is an open challenge in data mining. Existing approaches mostly assume that the observed diffusion results are available and consist of not only the final infection statuses of nodes, but also the exact timestamps that pinpoint when infections occur. Nonetheless, the exact infection timestamps are often unavailable in practice, due to a high cost and uncertainties in the monitoring of node infections. In this work, we investigate the problem of how to infer the topology of a diffusion network from only the final infection statuses of nodes. To this end, we propose a new scoring criterion for diffusion network reconstruction, which is able to estimate the likelihood of potential topologies of the objective diffusion network based on infection status results with a relatively low statistical error. As the proposed scoring criterion is decomposable, our problem is transformed into finding for each node in the network a set of most probable parent nodes that maximizes the value of a local score. Furthermore, to eliminate redundant computations during the search of most probable parent nodes, we identify insignificant candidate parent nodes by checking whether their infections have negative or extremely low positive correlations with the infections of a corresponding child node, and exclude them from the search space. Extensive experiments on both synthetic and real-world networks are conducted, and the results verify the effectiveness and efficiency of our approach.

Pages: 625-636

Citations: 6
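
The diffusion network abstract prunes candidate parents whose infections correlate negatively or only weakly with a child node's infections. The sketch below illustrates that pruning step alone, using the Pearson correlation of binary final infection statuses. The abstract does not state which correlation measure or cutoff the authors use, so both the measure and the `threshold` value here are assumptions, and the scoring criterion itself is not implemented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant status vector carries no correlation signal
    return cov / math.sqrt(vx * vy)


def candidate_parents(statuses, child, threshold=0.1):
    """statuses: {node: [0/1 final infection status per diffusion process]}.

    Keeps only nodes whose infections correlate with the child's above a
    (hypothetical) threshold; the rest are dropped from the search space.
    """
    return [
        node
        for node, s in statuses.items()
        if node != child and pearson(s, statuses[child]) > threshold
    ]


if __name__ == "__main__":
    statuses = {
        "a": [1, 1, 0, 0, 1, 0],  # infected together with c
        "b": [0, 0, 1, 1, 0, 1],  # infected exactly when c is not
        "c": [1, 1, 0, 0, 1, 0],
        "d": [1, 0, 1, 0, 0, 1],  # negatively correlated with c
    }
    print(candidate_parents(statuses, "c"))
```

Only the surviving candidates would then be passed to the more expensive parent-set search that the abstract describes.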

A Transformation-based Framework for KNN Set Similarity Search (Extended Abstract)

2020 IEEE 36th International Conference on Data Engineering (ICDE) Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00239

Yong Zhang, Jiacheng Wu, Jin Wang, Chunxiao Xing

Citations: 1