{"title":"Unsupervised Graph Representation Learning Beyond Aggregated View","authors":"Jian Zhou;Jiasheng Li;Li Kuang;Ning Gui","doi":"10.1109/TKDE.2024.3418576","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3418576","url":null,"abstract":"Unsupervised graph representation learning (UGRL) aims to condense graph information into dense vector embeddings to support various downstream tasks. To achieve this goal, existing UGRL approaches mainly adopt the message-passing mechanism to simultaneously incorporate graph topology and node attributes with an aggregated view. However, recent research points out that this direct aggregation may lead to issues such as over-smoothing and/or topology distortion, as topology and node attributes have totally different semantics. To address this issue, this paper proposes a novel Graph Dual-view AutoEncoder framework (GDAE), which introduces a node-wise view for each individual node beyond the traditional aggregated view for aggregation of connected nodes. Specifically, the node-wise view captures the unique characteristics of individual nodes through a decoupling design, i.e., topology encoding by multi-step random walks while preserving node-wise individual attributes. Meanwhile, the aggregated view aims to better capture the collective commonality among long-range nodes through an enhanced strategy, i.e., topology masking followed by attribute aggregation. 
Extensive experiments on 5 synthetic and 11 real-world benchmark datasets demonstrate that GDAE achieves the best results with up to 49.5% and 21.4% relative improvement in node degree prediction and cut-vertex detection tasks and remains top in node classification and link prediction tasks.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9504-9516"},"PeriodicalIF":8.9,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information Cascade Popularity Prediction via Probabilistic Diffusion","authors":"Zhangtao Cheng;Fan Zhou;Xovee Xu;Kunpeng Zhang;Goce Trajcevski;Ting Zhong;Philip S. Yu","doi":"10.1109/TKDE.2024.3465241","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3465241","url":null,"abstract":"Information cascade popularity prediction is an important problem in social network content diffusion analysis. Various facets have been investigated (e.g., diffusion structures and patterns, user influence) and, recently, deep learning models based on sequential architecture and graph neural network (GNN) have been leveraged. However, despite the improvements attained in predicting the future popularity, these methodologies fail to capture two essential aspects inherent to information diffusion: (1) the temporal irregularity of cascade event – i.e., users’ re-tweetings at random and non-periodic time instants; and (2) the inherent uncertainty of the information diffusion. To address these challenges, in this work, we present CasDO – a novel framework for information cascade popularity prediction with probabilistic diffusion models and neural ordinary differential equations (ODEs). We devise a temporal ODE network to generalize the discrete state transitions in RNNs to continuous-time dynamics. CasDO introduces a probabilistic diffusion model to consider the uncertainties in information diffusion by injecting noises in the forwarding process and reconstructing cascade embedding in the reversing process. 
Extensive experiments that we conducted on three large-scale datasets demonstrate the advantages of the CasDO model over baselines.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8541-8555"},"PeriodicalIF":8.9,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DREAM: Domain-Agnostic Reverse Engineering Attributes of Black-Box Model","authors":"Rongqing Li;Jiaqi Yu;Changsheng Li;Wenhan Luo;Ye Yuan;Guoren Wang","doi":"10.1109/TKDE.2024.3460806","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3460806","url":null,"abstract":"Deep learning models are usually black boxes when deployed on machine learning platforms. Prior works have shown that the attributes (e.g., the number of convolutional layers) of a target black-box model can be exposed through a sequence of queries. There is a crucial limitation: these works assume the training dataset of the target model is known beforehand and leverage this dataset for model attribute attack. However, it is difficult to access the training dataset of the target black-box model in reality. Therefore, whether the attributes of a target black-box model could be still revealed in this case is doubtful. In this paper, we investigate a new problem of black-box reverse engineering, without requiring the availability of the target model’s training dataset. We put forward a general and principled framework DREAM, by casting this problem as out-of-distribution (OOD) generalization. In this way, we can learn a domain-agnostic meta-model to infer the attributes of the target black-box model with unknown training data. This makes our method one of the kinds that can gracefully apply to an arbitrary domain for model attribute reverse engineering with strong generalization ability. 
Extensive experimental results demonstrate the superiority of our proposed method over the baselines.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8009-8022"},"PeriodicalIF":8.9,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Optimization of Pricing, Dispatching and Repositioning in Ride-Hailing With Multiple Models Interplayed Reinforcement Learning","authors":"Zhongyun Zhang;Lei Yang;Jiajun Yao;Chao Ma;Jianguo Wang","doi":"10.1109/TKDE.2024.3464563","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3464563","url":null,"abstract":"Popular ride-hailing products, such as DiDi, Uber and Lyft, provide people with transportation convenience. Pricing, order dispatching and vehicle repositioning are three tasks with tight correlation and complex interactions in ride-hailing platforms, significantly impacting each other’s decisions and demand distribution or supply distribution. However, no past work considered combining the three tasks to improve platform efficiency. In this paper, we explore optimizing pricing, dispatching and repositioning strategies simultaneously. Such a new multi-stage decision-making problem is quite challenging because it involves complex coordination and lacks a unified problem model. To address this problem, we propose a novel Joint optimization framework of Pricing, Dispatching and Repositioning (JPDR) integrating contextual bandit and multi-agent deep reinforcement learning. JPDR consists of two components: a Soft Actor-Critic (SAC)-based centralized policy for dispatching and repositioning, and a pricing strategy learned by a multi-armed contextual bandit algorithm based on the feedback from the former. The two components learn in a mutually guided way to achieve joint optimization because their updates are highly interdependent. Based on real-world data, we implement a realistic environment simulator. 
Extensive experiments conducted on it show our method outperforms state-of-the-art baselines in terms of both gross merchandise volume and success rate.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8593-8606"},"PeriodicalIF":8.9,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Data-Driven Three-Stage Adaptive Pattern Mining Approach for Multi-Energy Loads","authors":"Yixiu Guo;Yong Li;Sisi Zhou;Zhenyu Zhang;Zuyi Li;Mohammad Shahidehpour","doi":"10.1109/TKDE.2024.3462770","DOIUrl":"10.1109/TKDE.2024.3462770","url":null,"abstract":"An in-depth understanding of multi-energy consumption behavior patterns is essential to improving the management of multi-energy systems (MES). This paper proposes a data-driven three-stage adaptive pattern mining approach for multi-energy loads, which addresses the issues of complex multi-dimensional time-series mining, uncommon daily load discovery, typical load classification and parameter setting requiring user intervention. In the first stage, the relative state changes over time between different energy loads are mined based on Autoplait, which realizes time pattern discovery, segmentation and matching for multi-dimensional loads. In the second stage, adaptive affinity propagation (AAP) considering trend similarity distance (TSD) is proposed to classify loads into common and uncommon clusters, where uncommon loads are eliminated and the daily pattern is obtained by averaging the common loads. In the third stage, AAP with windows dynamic time warping (WDTW) identifies various profiles to obtain the typical patterns of daily loads. Specifically, pattern mining provides the key information of multi-energy loads, which is significant for demand-side applications such as load scene compression, load forecasting and demand response analysis. 
A case study uses MES data from Arizona State University to verify the effectiveness and practicality of the proposed approach.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7455-7467"},"PeriodicalIF":8.9,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142265832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Progressive Skeleton Learning for Effective Local-to-Global Causal Structure Learning","authors":"Xianjie Guo;Kui Yu;Lin Liu;Jiuyong Li;Jiye Liang;Fuyuan Cao;Xindong Wu","doi":"10.1109/TKDE.2024.3461832","DOIUrl":"10.1109/TKDE.2024.3461832","url":null,"abstract":"Causal structure learning (CSL) from observational data is a crucial objective in various machine learning applications. Recent advances in CSL have focused on local-to-global learning, which offers improved efficiency and accuracy. The local-to-global CSL algorithms first learn the local skeleton of each variable in a dataset, then construct the global skeleton by combining these local skeletons, and finally orient edges to infer causality. However, data quality issues such as noise and small samples often result in the presence of problematic asymmetric edges during global skeleton construction, hindering the creation of a high-quality global skeleton. To address this challenge, we propose a novel local-to-global CSL algorithm with a progressive enhancement strategy and make the following novel contributions: 1) To construct an accurate global skeleton, we design a novel strategy to iteratively correct asymmetric edges and progressively improve the accuracy of the global skeleton. 2) Based on the learned accurate global skeleton, we design an integrated global skeleton orientation strategy to infer the correct directions of edges for obtaining an accurate and reliable causal structure. 
Extensive experiments demonstrate that our method achieves better performance than the existing CSL methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9065-9079"},"PeriodicalIF":8.9,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142265833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Multi-Task Learning for Spatio-Temporal Incomplete Qualitative Event Forecasting","authors":"Tanmoy Chowdhury;Yuyang Gao;Liang Zhao","doi":"10.1109/TKDE.2024.3460539","DOIUrl":"10.1109/TKDE.2024.3460539","url":null,"abstract":"Forecasting spatiotemporal social events has significant benefits for society, helping to provide the proper amounts and types of resources to manage catastrophes and any accompanying societal risks. Nevertheless, forecasting event subtypes is far more complex than merely extending binary prediction to cover multiple subtypes because of spatial heterogeneity, experiencing a partial set of event subtypes, subtle discrepancy among different event subtypes, the nature of the event subtype, and spatial correlation of event subtypes. We present the Deep multi-task learning for spatio-temporal incomplete qualitative event forecasting (DETECTIVE) framework to effectively forecast the subtypes of future events by addressing all these issues. The framework formulates spatial locations into tasks to handle spatial heterogeneity in event subtypes and learns a joint deep representation of subtypes across tasks. It can be adapted to the different types of problem formulation required by the nature of the events. Furthermore, based on the “first law of geography”, spatially close tasks share similar event subtypes or scale patterns so that adjacent tasks can share knowledge effectively. To optimize the non-convex and strongly coupled problem of the proposed model, we also propose algorithms based on the Alternating Direction Method of Multipliers (ADMM). 
Extensive experiments on real-world datasets demonstrate the model’s usefulness and efficiency.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7913-7926"},"PeriodicalIF":8.9,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142265835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamical Targeted Ensemble Learning for Streaming Data With Concept Drift","authors":"Husheng Guo;Yang Zhang;Wenjian Wang","doi":"10.1109/TKDE.2024.3460404","DOIUrl":"10.1109/TKDE.2024.3460404","url":null,"abstract":"Concept drift is an important characteristic of streaming data and an unavoidable, difficult problem in streaming data mining. Ensemble learning is commonly used to deal with concept drift. However, most ensemble methods cannot balance the accuracy and diversity of base learners after drift occurs, and cannot adjust adaptively according to the drift type. To solve these problems, this paper proposes a targeted ensemble learning (Targeted EL) method to improve the accuracy and diversity of ensemble learning for streaming data with abrupt and gradual concept drift. First, to improve the accuracy of the base learners, the method adopts different sample weighting strategies for different types of drift to realize bidirectional transfer of new and old distributed samples. Second, a difference matrix is constructed from the prediction results of the base learners on the current samples. According to the drift type, the submatrix with appropriate size and maximum difference sum is extracted adaptively to select appropriate, accurate and diverse base learners for the ensemble. 
The experimental results show that the proposed method can achieve good generalization performance when dealing with the streaming data with abrupt and gradual concept drift.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8023-8036"},"PeriodicalIF":8.9,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142265834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LogoRA: Local-Global Representation Alignment for Robust Time Series Classification","authors":"Huanyu Zhang;Yi-Fan Zhang;Zhang Zhang;Qingsong Wen;Liang Wang","doi":"10.1109/TKDE.2024.3459908","DOIUrl":"10.1109/TKDE.2024.3459908","url":null,"abstract":"Unsupervised domain adaptation (UDA) of time series aims to teach models to identify consistent patterns across various temporal scenarios, disregarding domain-specific differences, which can maintain their predictive accuracy and effectively adapt to new domains. However, existing UDA methods struggle to adequately extract and align both global and local features in time series data. To address this issue, we propose the Local-Global Representation Alignment framework (LogoRA), which employs a two-branch encoder comprising a multi-scale convolutional branch and a patching transformer branch. The encoder enables the extraction of both local and global representations from time series. A fusion module is then introduced to integrate these representations, enhancing domain-invariant feature alignment from multi-scale perspectives. To achieve effective alignment, LogoRA employs strategies like invariant feature learning on the source domain, utilizing triplet loss for fine alignment and dynamic time warping-based feature alignment. Additionally, it reduces source-target domain gaps through adversarial training and per-class prototype alignment. 
Our evaluations on four time-series datasets demonstrate that LogoRA outperforms strong baselines by up to 12.52%, showcasing its superiority in time series UDA tasks.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8718-8729"},"PeriodicalIF":8.9,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diff-RNTraj: A Structure-Aware Diffusion Model for Road Network-Constrained Trajectory Generation","authors":"Tonglong Wei;Youfang Lin;Shengnan Guo;Yan Lin;Yiheng Huang;Chenyang Xiang;Yuqing Bai;Huaiyu Wan","doi":"10.1109/TKDE.2024.3460051","DOIUrl":"10.1109/TKDE.2024.3460051","url":null,"abstract":"Trajectory data is essential for various applications. However, publicly available trajectory datasets remain limited in scale due to privacy concerns, which hinders the development of trajectory mining and applications. Although some trajectory generation methods have been proposed to expand dataset scale, they generate trajectories in the geographical coordinate system, posing two limitations for practical applications: 1) failing to ensure that the generated trajectories are road-constrained; and 2) lacking road-related information. In this paper, we propose a new problem, road network-constrained trajectory (RNTraj) generation, which can directly generate trajectories on the road network with road-related information. Specifically, RNTraj is a hybrid type of data, in which each point is represented by a discrete road segment and a continuous moving rate. To generate RNTraj, we design a diffusion model called Diff-RNTraj, which can effectively handle the hybrid RNTraj using a continuous diffusion framework by incorporating a pre-training strategy to embed hybrid RNTraj into continuous representations. During the sampling stage, an RNTraj decoder is designed to map the continuous representation generated by the diffusion model back to the hybrid RNTraj format. Furthermore, Diff-RNTraj introduces a novel loss function to enhance the spatial validity of the generated trajectories. 
Extensive experiments conducted on two datasets demonstrate the effectiveness of Diff-RNTraj.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7940-7953"},"PeriodicalIF":8.9,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}