2023 IEEE 39th International Conference on Data Engineering (ICDE): Latest Papers, Page 4

Extracting Graphs Properties with Semantic Joins

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00175

Yang Cao, W. Fan, Wenzhi Fu, Ruochun Jin, Weijie Ou, Wenliang Yi

Citations: 0

Exploiting Reuse for GPU Subgraph Enumeration (Extended Abstract)

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00309

Wentian Guo, Yuchen Li, K. Tan

Abstract: Subgraph enumeration is important for many applications such as network motif discovery, community detection, and frequent subgraph mining. To accelerate execution, recent works use graphics processing units (GPUs) to parallelize subgraph enumeration. The performance of these parallel schemes is dominated by set intersection operations, which account for up to 95% of the total processing time. (Un)surprisingly, a significant portion (as high as 99%) of these operations is actually redundant, i.e., the same set of vertices is repeatedly encountered and evaluated. Therefore, in this paper, we seek to salvage and recycle the results of such operations to avoid repeated computation. Our solution consists of two phases. In the first phase, we generate a reusable plan that determines the opportunity for reuse. The plan is based on a novel reuse discovery mechanism that can identify available results to prevent redundant computation. In the second phase, the plan is executed to produce the subgraph enumeration results. This processing is based on a newly designed reusable parallel search strategy that can efficiently maintain and retrieve the results of set intersection operations. Our implementation on GPUs shows that our approach can achieve speedups of up to 5 times compared with state-of-the-art GPU solutions.
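The redundancy described in this abstract can be illustrated with a small CPU-side sketch. It is not the paper's GPU method or its reuse plan; it only assumes a plain backtracking matcher in which the candidate set for each pattern vertex is the intersection of the adjacency sets of its already matched neighbors, and it caches those intersections in a dictionary. The names `enumerate_matches`, `backward`, and `stats` are hypothetical.

```python
def enumerate_matches(adj, backward):
    """Enumerate embeddings of a pattern in a data graph by backtracking.

    adj:      data graph, dict mapping each vertex to a set of neighbors.
    backward: backward[i] lists the positions (< i) of pattern vertices
              adjacent to pattern vertex i under a fixed matching order.
    Intersection results are cached so identical requests are not recomputed.
    """
    cache = {}
    stats = {"computed": 0, "reused": 0}
    matches = []

    def candidates(match, i):
        if not backward[i]:
            return set(adj)  # first pattern vertex: any data vertex
        # The intersection depends only on which data vertices are involved.
        key = frozenset(match[j] for j in backward[i])
        if key in cache:
            stats["reused"] += 1
        else:
            stats["computed"] += 1
            cache[key] = set.intersection(*(adj[v] for v in key))
        return cache[key]

    def extend(match):
        i = len(match)
        if i == len(backward):
            matches.append(tuple(match))
            return
        for v in candidates(match, i):
            if v not in match:  # keep the embedding injective
                extend(match + [v])

    extend([])
    return matches, stats


# Data graph: triangle 0-1-2 with an extra vertex 3 attached to vertex 1.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
# Pattern ("tailed triangle"): u0-u1, u1-u2 (tail), u3 adjacent to u0 and u1.
backward = [[], [0], [1], [0, 1]]
matches, stats = enumerate_matches(adj, backward)
```

In this pattern the candidate set for the last vertex depends only on the first two matched vertices, so every choice of the tail vertex asks for the same intersection again; the `reused` counter records how often the cache answers instead.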

Citations: 0

Keyword-based Socially Tenuous Group Queries

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00079

Huaijie Zhu, Wei Liu, Jian Yin, Ningning Cui, Jianliang Xu, Xinfeng Huang, Wang-Chien Lee

Abstract: Socially tenuous groups (or simply tenuous groups) in a social network/graph refer to subgraphs with few social interactions and weak relationships among members. However, existing studies on tenuous group queries do not consider the user profiles (keywords) of the members, whereas in many social network applications, e.g., finding reviewers for paper selection and recommending seed users in social advertising, keywords also need to be considered. Thus, in this paper, we investigate the problem of keyword-based socially tenuous group (KTG) queries. A KTG query finds the top N tenuous groups in which the members of each group jointly cover the largest number of query keywords. To address the KTG problem, we first propose two exact algorithms, namely KTG-VKC and KTG-VKC-DEG, which give priority to the valid keyword coverage and to the combination of valid keyword coverage and degree, respectively, to select members to form a feasible group by adopting a branch and bound (BB) strategy. Moreover, we propose keyword pruning and k-line filtering to accelerate the algorithms. To yield diversified KTG results, we also study the problem of diversified keyword-based socially tenuous group (DKTG) queries. To deal with the DKTG problem, we propose a DKTG-Greedy algorithm by exploiting a greedy heuristic in combination with KTG-VKC-DEG. Furthermore, we design two alternative indexes, namely NL and NLRNL, to efficiently check whether the social distance of any two members is greater than the social constraint k in the above algorithms. We conduct extensive experiments using real datasets to validate our ideas and evaluate the proposed algorithms. Experimental results show that the NLRNL index achieves better performance than the NL index.
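To make the query semantics concrete, the following is a brute-force sketch, not KTG-VKC, KTG-VKC-DEG, or either index from the paper. It assumes that "social distance" means hop distance in an unweighted graph, that a group is feasible when every pair of members is more than k hops apart, and that groups have a fixed size; `group_size`, `hop_distance`, `is_tenuous`, and `top_groups` are hypothetical names introduced for illustration.

```python
from collections import deque
from itertools import combinations


def hop_distance(adj, src, dst, limit):
    """BFS hop distance from src to dst, or limit + 1 if dst is farther than limit."""
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        v, d = frontier.popleft()
        if d == limit:
            continue  # do not explore beyond the limit
        for w in adj[v]:
            if w == dst:
                return d + 1
            if w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return limit + 1


def is_tenuous(adj, group, k):
    """True if every pair of members is more than k hops apart."""
    return all(hop_distance(adj, a, b, k) > k for a, b in combinations(group, 2))


def top_groups(adj, keywords, query, group_size, k, n):
    """Rank all tenuous groups of the given size by covered query keywords."""
    scored = []
    for group in combinations(sorted(adj), group_size):
        if is_tenuous(adj, group, k):
            covered = set().union(*(keywords[v] for v in group)) & query
            scored.append((len(covered), group))
    # Highest coverage first; ties broken by the group tuple for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:n]


# Path graph 0-1-2-3-4 with per-user keyword profiles.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
keywords = {0: {"db"}, 1: {"graph"}, 2: {"ml"}, 3: {"db", "ml"}, 4: {"graph"}}
best = top_groups(adj, keywords, {"db", "ml", "graph"}, group_size=2, k=1, n=2)
```

Exhaustive enumeration is exponential in the group size, which is the cost that the branch and bound strategy and the pruning rules named in the abstract are meant to avoid.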

Citations: 0

A Survey on Spark Ecosystem: Big Data Processing Infrastructure, Machine Learning, and Applications (Extended abstract)

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00316

Shanjian Tang, Bin He, Ce Yu, Yusen Li, Kun Li

Abstract: With the explosive increase of big data in industry and academic fields, it is important to apply large-scale data processing systems to analyze Big Data. Arguably, Spark is the state of the art in large-scale data computing systems nowadays, due to its good properties including generality, fault tolerance, high performance of in-memory data processing, and scalability. Spark adopts a flexible Resilient Distributed Dataset (RDD) programming model with a set of provided transformation and action operators whose operating functions can be customized by users according to their applications. It was originally positioned as a fast and general data processing system. Since its introduction, a large body of research has sought to make it more efficient (faster) and more general by considering various circumstances. In this survey, we aim to provide a thorough review of various kinds of optimization techniques for improving the generality and performance of Spark. We introduce various data management and processing systems, machine learning algorithms, and applications supported by Spark. Additionally, we discuss the open issues and challenges for large-scale in-memory data processing with Spark.

Citations: 0

ROI-demand Traffic Prediction: A Pre-train, Query and Fine-tune Framework

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00107

Yue Cui, Shuhao Li, W. Deng, Zhaokun Zhang, Jing Zhao, Kai Zheng, Xiaofang Zhou

Abstract: Traffic prediction has drawn increasing attention due to its essential role in smart city applications. To achieve precise predictions, a large number of approaches have been proposed to model spatial dependencies and temporal dynamics. Despite their superior performance, most existing studies focus on datasets that usually cover large geographic scales, e.g., citywide, while ignoring the results on specific regions. However, in many scenarios, for example, route planning on time-dependent road networks, only small regions are of interest. We name the task of answering forecasting requests from any query region of interest (ROI) as ROI-demand traffic prediction (RTP). In this paper, we make a primary observation that existing methods fail to jointly achieve effectiveness and efficiency for RTP. To address this issue, a novel model-agnostic framework based on pre-Training, Querying and fine-Tuning, named TQT, is proposed, which first customizes input data given an ROI, and then makes fast adaptation from pre-trained traffic prediction backbone models by fine-tuning. We evaluate TQT on two real-world traffic datasets, performing both flow and speed prediction tasks. Extensive experimental results demonstrate the effectiveness and efficiency of the proposed method.

Citations: 0

Self-Supervised Spatial-Temporal Bottleneck Attentive Network for Efficient Long-term Traffic Forecasting

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00125

S. Guo, Youfang Lin, Letian Gong, Chenyu Wang, Zeyu Zhou, Zekai Shen, Yiheng Huang, Huaiyu Wan

Abstract: In intelligent transportation systems, accurate long-term traffic forecasting helps administrators and travelers make wise decisions in advance. Recently proposed spatial-temporal forecasting models perform well for short-term traffic forecasting, but two challenges hinder their application to long-term forecasting in practice. Firstly, existing traffic forecasting models do not scale satisfactorily in effectiveness and efficiency, i.e., as the prediction time span extends, existing models either cannot capture the long-term spatial-temporal dynamics of traffic data or are equipped with global receptive fields at the cost of quadratic computational complexity. Secondly, the dilemma between the models’ strong appetite for high-quality training data and their generalization ability is also a challenge we have to face. Thus, how to improve data utilization efficiency deserves careful thought. Aiming at solving the long-term traffic forecasting problem and facilitating the deployment of traffic forecasting models in practice, this paper proposes an efficient and effective Self-supervised Spatial-Temporal Bottleneck Attentive Network (SSTBAN). Specifically, SSTBAN follows a multi-task framework by incorporating a self-supervised learner to produce robust latent representations for historical traffic data, so as to improve its generalization performance and robustness for forecasting. Besides, we design a spatial-temporal bottleneck attention mechanism, which reduces the computational complexity while encoding global spatial-temporal dynamics. Extensive experiments on real-world long-term traffic forecasting tasks, including traffic speed forecasting and traffic flow forecasting under nine scenarios, demonstrate that SSTBAN not only achieves the overall best performance but also has good computation efficiency and data utilization efficiency.

Citations: 1

Distributed (α, β)-Core Decomposition over Bipartite Graphs

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00075

Qing Liu, Xuankun Liao, Xinfeng Huang, Jianliang Xu, Yunjun Gao

Abstract: (α, β)-core is an important cohesive subgraph model for bipartite graphs. Given a bipartite graph G, the problem of (α, β)-core decomposition is to compute non-empty (α, β)-cores for all possible values of α and β. The state-of-the-art (α, β)-core decomposition algorithm is a peeling-based algorithm, which iteratively deletes the vertex from high degree to low degree. However, as the peeling-based algorithm is designed for centralized environments, it cannot be applied to distributed environments, where graphs are partitioned and stored in different machines. Motivated by this, in this paper, we study the distributed (α, β)-core decomposition problem, aiming to develop new algorithms to support (α, β)-core decomposition in distributed environments. To this end, first, we analyze the local properties of (α, β)-core, and devise n-order Bi-indexes for the vertex, which are iteratively defined using the vertex neighbors’ (n − 1)-order Bi-indexes. Next, we propose an algorithm for (α, β)-core decomposition through iteratively calculating n-order Bi-indexes for every vertex. To further improve the efficiency of the algorithm, we propose two optimizations. Then, we extend our proposed algorithms to different distributed graph processing frameworks to make them run in distributed environments. Finally, extensive experimental results on both real and synthetic bipartite graphs demonstrate the efficiency of our proposed algorithms.
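As background for the peeling idea mentioned in this abstract, here is a minimal centralized sketch that computes one (α, β)-core. It is not the paper's distributed algorithm and does not compute Bi-indexes. It assumes the commonly used definition, which the abstract does not spell out: the (α, β)-core is the maximal subgraph in which every upper-side vertex has degree at least α and every lower-side vertex has degree at least β. The function name `alpha_beta_core` and the edge-list input format are hypothetical.

```python
def alpha_beta_core(edges, alpha, beta):
    """Return (upper_vertices, lower_vertices) of the (alpha, beta)-core.

    edges: iterable of (u, v) pairs, u on the upper side, v on the lower side.
    Vertices that violate their degree threshold are peeled repeatedly.
    """
    up, low = {}, {}
    for u, v in edges:
        up.setdefault(u, set()).add(v)
        low.setdefault(v, set()).add(u)

    # Start with every vertex that already violates its threshold.
    pending = [("U", u) for u in up if len(up[u]) < alpha]
    pending += [("L", v) for v in low if len(low[v]) < beta]

    while pending:
        side, x = pending.pop()
        if side == "U":
            own, other, bound, other_side = up, low, beta, "L"
        else:
            own, other, bound, other_side = low, up, alpha, "U"
        if x not in own:
            continue  # already peeled via an earlier entry
        for y in own.pop(x):
            other[y].discard(x)
            if len(other[y]) < bound:
                pending.append((other_side, y))

    return set(up), set(low)


edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v2")]
core_2_2 = alpha_beta_core(edges, 2, 2)
core_1_3 = alpha_beta_core(edges, 1, 3)
core_3_1 = alpha_beta_core(edges, 3, 1)
```

A full decomposition would repeat this for all feasible (α, β) pairs, and the routine needs the whole graph in one place, which is the limitation that motivates the distributed setting studied in the paper.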

Citations: 0

Discovery of Cross Joins (Extended Abstract)

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00353

Miika Hannula, Zhuoxing Zhang, Bor-Kuan Song, S. Link

Citations: 0

Batch-Based Cooperative Task Assignment in Spatial Crowdsourcing

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00095

Yezhou Yang, Yurong Cheng, Yeru Yang, Ye Yuan, Guoren Wang

Abstract: The rapid development of spatial crowdsourcing platforms in the fields of express delivery, food delivery, and intelligent transportation has attracted widespread attention. As a typical problem in spatial crowdsourcing, the online task matching problem has been widely studied. Most existing research addresses task allocation with different optimization goals on a single platform. Recently, to address the non-uniform distribution of tasks and crowd workers on a single platform, cross online task assignment has been proposed, aiming to increase mutual benefit through cooperation. However, existing methods lead to the situation where the local platform lends workers to other platforms and is then left short of workers itself. In this paper, we propose a Batch-Based Cooperative Task Assignment (BCTA) problem, which enables multi-platform task assignment to be completed within a tolerable time. We design a BCTA model and propose a fixed-t BCTA (FT-BCTA) algorithm and an adaptive BCTA (Adt-BCTA) algorithm to solve the BCTA problem. FT-BCTA focuses on a fixed batching strategy, while Adt-BCTA adapts the batching strategy according to the supply and demand of multiple platforms. Extensive experiments on both real and synthetic datasets show the effectiveness and efficiency of our algorithms.

Citations: 0

Efficient Public Transport Planning on Roads

2023 IEEE 39th International Conference on Data Engineering (ICDE) Pub Date : 2023-04-01 DOI: 10.1109/ICDE55515.2023.00188

Libin Wang, R. C. Wong

Abstract: Public transport contributes significantly to addressing city issues such as air pollution and traffic congestion. As public transport demand changes with urban development, new routes need to be planned to match the demand. Existing methods of planning new bus routes either are inefficient in using the path’s cost or use other inaccurate cost measurements. This paper focuses on finding a new bus route efficiently on road networks. Specifically, we first propose the Bus Routing on Roads (BRR) problem, which combines two common goals of minimizing the walking costs of passengers and maximizing the connectivity of the new route to the existing transit network. They are consistent with matching the demand and facilitating transfers. We then show the NP-hardness of BRR and design an approximation algorithm called Efficient Bus Routing on Roads (EBRR). We theoretically analyze its approximation ratio and time complexity. Extensive evaluations against state-of-the-art solutions on three real-world datasets validate the effectiveness and efficiency of EBRR. It can recommend a high-quality new bus route in around 10 seconds, 60x faster than the baselines.

Citations: 0