{"title":"Hierarchical Denoising for Robust Social Recommendation","authors":"Zheng Hu;Satoshi Nakagawa;Yan Zhuang;Jiawen Deng;Shimin Cai;Tao Zhou;Fuji Ren","doi":"10.1109/TKDE.2024.3508778","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3508778","url":null,"abstract":"Social recommendations leverage social networks to augment the performance of recommender systems. However, the critical task of denoising social information has not been thoroughly investigated in prior research. In this study, we introduce a hierarchical denoising robust social recommendation model to tackle noise at two levels: 1) intra-domain noise, resulting from user multi-faceted social trust relationships, and 2) inter-domain noise, stemming from the entanglement of the latent factors over heterogeneous relations (e.g., user-item interactions, user-user trust relationships). Specifically, our model advances a preference and social psychology-aware methodology for the fine-grained and multi-perspective estimation of tie strength within social networks. This serves as a precursor to an edge weight-guided edge pruning strategy that refines the model's diversity and robustness by dynamically filtering social ties. Additionally, we propose a user interest-aware cross-domain denoising gate, which not only filters noise during the knowledge transfer process but also captures the high-dimensional, nonlinear information prevalent in social domains. We conduct extensive experiments on three real-world datasets to validate the effectiveness of our proposed model against state-of-the-art baselines. 
We perform empirical studies on synthetic datasets to validate the strong robustness of our proposed model.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 2","pages":"739-753"},"PeriodicalIF":8.9,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142940804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Percolation Embeddings for Efficient Knowledge Graph Inductive Reasoning","authors":"Kai Wang;Dan Lin;Siqiang Luo","doi":"10.1109/TKDE.2024.3508064","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3508064","url":null,"abstract":"We study Graph Neural Networks (GNNs)-based embedding techniques for knowledge graph (KG) reasoning. For the first time, we link the path redundancy issue in the state-of-the-art path encoding-based models to the transformation error in model training, which brings us new theoretical insights into KG reasoning, as well as high efficacy in practice. On the theoretical side, we analyze the entropy of transformation error in KG paths and point out query-specific redundant paths causing entropy increases. These findings guide us to maintain the shortest paths and remove redundant paths for minimized-entropy message passing. To achieve this goal, on the practical side, we propose an efficient Graph Percolation process motivated by the percolation phenomenon in Fluid Mechanics, and design a lightweight GNN-based KG reasoning framework called Graph Percolation Embeddings (GraPE). GraPE outperforms state-of-the-art methods in both transductive and inductive reasoning tasks, while requiring fewer training parameters and less inference time.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 3","pages":"1198-1212"},"PeriodicalIF":8.9,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Abstracting Graph Kernel","authors":"Runze Yang;Hao Peng;Angsheng Li;Peng Li;Chunyang Liu;Philip S. Yu","doi":"10.1109/TKDE.2024.3509028","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3509028","url":null,"abstract":"Graph kernels have been regarded as a successful tool for handling a variety of graph applications since they were proposed. However, most of the proposed graph kernels are based on the R-convolution framework, which decomposes graphs into a set of substructures at the same abstraction level and compares all substructure pairs equally; these methods inherently overlook the utility of the hierarchical structural information embedded in graphs. In this paper, we propose Hierarchical Abstracting Graph Kernels (HAGK), a novel set of graph kernels that compare graphs’ hierarchical substructures to capture and utilize the latent hierarchical structural information fully. Instead of generating non-structural substructures, we reveal each graph’s hierarchical substructures by constructing its hierarchical abstracting, specifically, the hierarchically organized nested node sets adhering to the principle of structural entropy minimization. To compare a pair of hierarchical abstractings, we propose two novel substructure matching approaches, Local Optimal Matching (LOM) and Priority Ordering Matching (POM), to find appropriate matching between the substructures by different strategies recursively. 
Extensive experiments demonstrate that the proposed kernels are highly competitive with the existing state-of-the-art graph kernels, and verify that the hierarchical abstracting plays a significant role in the improvement of the kernel performance.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 2","pages":"724-738"},"PeriodicalIF":8.9,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142940944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FWCEC: An Enhanced Feature Weighting Method via Causal Effect for Clustering","authors":"Fuyuan Cao;Xuechun Jing;Kui Yu;Jiye Liang","doi":"10.1109/TKDE.2024.3508057","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3508057","url":null,"abstract":"Feature weighting aims to assign different weights to features based on their importance in machine learning tasks. In clustering tasks, the existing methods learn feature importance based on the clustering results derived from the collaborative contribution of all features, which overlooks the independent effect of each feature. In fact, there are underlying causal relationships between features and the clustering results, and the features with high causal effects are always more crucial for clustering. Therefore, we propose an enhanced Feature Weighting method via Causal Effect for Clustering (FWCEC), calculating the causal effect of each feature on the clustering results for obtaining the independent contribution of each feature. Specifically, we start by identifying the causal relationships among the features and utilizing the causal relationships to generate a reasonable treatment group. Next, we compare the changes in the data distribution between the treatment and control groups to determine the causal effect of each feature. Finally, the causal effects of features are used for enhancing the clustering-driven weight learning. Moreover, we present a theory of relative order consistency in causal effect. 
Experimental results demonstrate that utilizing causal effect in weight learning facilitates efficient convergence and achieves superior accuracy compared to state-of-the-art clustering algorithms.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 2","pages":"685-697"},"PeriodicalIF":8.9,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142940768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Cross-Correlated Network for Recommendation","authors":"Hao Chen;Yuanchen Bei;Wenbing Huang;Shengyuan Chen;Feiran Huang;Xiao Huang","doi":"10.1109/TKDE.2024.3491778","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3491778","url":null,"abstract":"Collaborative filtering (CF) models have demonstrated remarkable performance in recommender systems, which represent users and items as embedding vectors. Recently, due to the powerful modeling capability of graph neural networks for user-item interaction graphs, graph-based CF models have gained increasing attention. They encode each user/item and its subgraph into a single super vector by combining graph embeddings after each graph convolution. However, each hop of the neighbor in the user-item subgraphs carries a specific semantic meaning. Encoding all subgraph information into single vectors and inferring user-item relations with dot products can weaken the semantic information between user and item subgraphs, thus leaving untapped potential. Exploiting this untapped potential provides insight into improving performance for existing recommendation models. To this end, we propose the Graph Cross-correlated Network for Recommendation (GCR), which serves as a general recommendation paradigm that explicitly considers correlations between user/item subgraphs. GCR first introduces the Plain Graph Representation (PGR) to extract information directly from each hop of neighbors into corresponding PGR vectors. Then, GCR develops Cross-Correlated Aggregation (CCA) to construct possible cross-correlated terms between PGR vectors of user/item subgraphs. Finally, GCR comprehensively incorporates the cross-correlated terms for recommendations. 
Experimental results show that GCR outperforms state-of-the-art models on both interaction prediction and click-through rate prediction tasks.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 2","pages":"710-723"},"PeriodicalIF":8.9,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142940719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Individual Irregular Mobility via Web Search-Driven Bipartite Graph Neural Networks","authors":"Jiawei Xue;Takahiro Yabe;Kota Tsubouchi;Jianzhu Ma;Satish V. Ukkusuri","doi":"10.1109/TKDE.2024.3487549","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3487549","url":null,"abstract":"Individual mobility prediction holds significant importance in urban computing, supporting various applications such as place recommendations. Current studies primarily focus on frequent mobility patterns including commuting trips to residential and workplaces. However, such studies do not accurately forecast irregular trips, which incorporate journeys that end at locations other than residences and workplaces. Despite their usefulness in recommendations and advertising, the stochastic, infrequent, and spontaneous nature of irregular trips makes them challenging to predict. To address the difficulty, this study proposes a web search-driven bipartite graph neural network, namely WS-BiGNN, for the individual irregular mobility prediction (IIMP) problem. Specifically, we construct bipartite graphs to represent mobility and web search records, formulating the IIMP problem as a link prediction task. First, WS-BiGNN employs user-user edges and POI-POI edges (POI: point-of-interest) to bolster information propagation within sparse bipartite graphs. Second, the temporal weighting module is created to discern the influence of past mobility and web searches on future mobility. Lastly, WS-BiGNN incorporates the search-mobility memory module, which classifies four interpretable web search-mobility patterns and harnesses them to improve prediction accuracy. We perform experiments utilizing real-world data in Tokyo from October 2019 to March 2020. The results showcase the superior performance of WS-BiGNN compared to baseline models, as supported by higher scores in Recall and NDCG. 
The exceptional performance and additional analysis reveal that infrequent behavior may be effectively predicted by learning search-mobility patterns at the individual level.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 2","pages":"851-864"},"PeriodicalIF":8.9,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142940769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking Unsupervised Graph Anomaly Detection With Deep Learning: Residuals and Objectives","authors":"Xiaoxiao Ma;Fanzhen Liu;Jia Wu;Jian Yang;Shan Xue;Quan Z. Sheng","doi":"10.1109/TKDE.2024.3501307","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3501307","url":null,"abstract":"Anomalies often occur in real-world information networks/graphs, such as malevolent users in online review networks and fake news in social media. When representing such structured network data as graphs, anomalies usually appear as anomalous nodes that exhibit significantly deviated structure patterns, or different attributes, or both. To date, numerous unsupervised methods have been developed to detect anomalies based on residual analysis, which assumes that anomalies will introduce larger residual errors (i.e., graph reconstruction loss). While these existing works achieved encouraging performance, in this paper, we formally prove that their employed learning objectives, i.e., MSE and cross-entropy losses, encounter significant limitations in learning the major data distributions, particularly for anomaly detection, and through our preliminary study, we reveal that the vanilla residual analysis-based methods cannot effectively investigate the rich graph structure. Building on these discoveries, we propose a novel structure-biased graph anomaly detection framework (SALAD) to attain anomalies’ divergent patterns with the assistance of a specially designed node representation augmentation approach. We further present two effective training objectives to empower SALAD to effectively capture the major structure and attribute distributions by placing less emphasis on anomalies that introduce higher reconstruction errors under the encoder-decoder framework. The detection performance on eight widely-used datasets demonstrates SALAD's superiority over twelve state-of-the-art baselines. 
Additional ablation and case studies validate that our data augmentation method and training objectives result in the impressive performance.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 2","pages":"881-895"},"PeriodicalIF":8.9,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142940943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Dynamic Hybrid Broad Learning System for Real-Time Safety Assessment of Dynamic Systems","authors":"Zeyi Liu;Xiao He","doi":"10.1109/TKDE.2024.3475028","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3475028","url":null,"abstract":"Real-time safety assessment of dynamic systems is of paramount importance in industrial processes since it provides continuous monitoring and evaluation to prevent potential harm to the environment and individuals. However, there are still several challenges to be resolved due to the requirements of time consumption and the non-stationary nature of real-world environments. In this paper, a novel online dynamic hybrid broad learning system, termed ODH-BLS, is proposed to more fully utilize the co-design advantages of active adaptation and passive adaptation. It makes effective use of limited annotations with the proposed sample value function. Simultaneously, anchor points can be dynamically adjusted to accommodate changes of the underlying distribution, thereby leveraging the value of unlabeled samples. An iterative update rule is also derived to ensure adaptation of the assessment model to real-time data at low computational costs. We also provide theoretical analyses to illustrate its practicality. Several experiments regarding the JiaoLong deep-sea manned submersible are carried out. 
The results demonstrate that the proposed ODH-BLS method achieves a performance improvement of approximately 8% over the baseline method on the benchmark dataset, showing its effectiveness in solving real-time safety assessment tasks for dynamic systems.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8928-8938"},"PeriodicalIF":8.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SE Factual Knowledge in Frozen Giant Code Model: A Study on FQN and Its Retrieval","authors":"Qing Huang;Dianshu Liao;Zhenchang Xing;Zhiqiang Yuan;Qinghua Lu;Xiwei Xu;Jiaxing Lu","doi":"10.1109/TKDE.2024.3436883","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3436883","url":null,"abstract":"Giant pre-trained code models (PCMs) are starting to enter developers’ daily practice. Understanding the type and amount of software knowledge in PCMs is essential for integrating PCMs into software engineering (SE) tasks and unlocking their potential. In this work, we conduct the first systematic study on the SE factual knowledge in the state-of-the-art PCM Copilot, focusing on APIs’ Fully Qualified Names (FQNs), the fundamental knowledge for effective code analysis, search and reuse. Driven by FQNs’ data distribution properties, we design a novel lightweight in-context learning method on Copilot for FQN inference, which does not require code compilation, as traditional methods do, or gradient updates, as recent FQN prompt-tuning does. We systematically experiment with five in-context learning design factors to identify the best configuration for practical use. With this best configuration, we investigate the impact of example prompts and FQN data properties on Copilot's FQN inference capability. Our results confirm that Copilot stores diverse FQN knowledge and can be applied for FQN inference due to its high accuracy and non-reliance on code analysis. Additionally, our extended study shows that the in-context learning method can be generalized to retrieve other SE factual knowledge embedded in giant PCMs. Furthermore, we find that the advanced general model GPT-4 also stores substantial SE knowledge. Comparing FQN inference between Copilot and GPT-4, we observe that as model capabilities improve, the same prompts yield better results. 
Based on our experience interacting with Copilot, we discuss various opportunities to improve human-Copilot interaction in the FQN inference task.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9220-9234"},"PeriodicalIF":8.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early Detection of Multimodal Fake News via Reinforced Propagation Path Generation","authors":"Litian Zhang;Xiaoming Zhang;Ziyi Zhou;Xi Zhang;Senzhang Wang;Philip S. Yu;Chaozhuo Li","doi":"10.1109/TKDE.2024.3496701","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3496701","url":null,"abstract":"Amidst the rapid propagation of multimodal fake news across social media platforms, the detection of fake news has emerged as a prime research pursuit. To detect a heightened level of meticulous fabrications, propagation paths are introduced to provide nuanced social context that enhances the basic semantic analysis of the news content. However, existing propagation-enhanced models encounter a dilemma between detection efficacy and social hazard. In this paper, we explore the innovative problem of early fake news detection through the generation of propagation paths, capable of benefiting from the extensive social context within propagation paths while mitigating potential social hazards. To address these challenges, we propose a novel Reinforced Propagation Path Generation Fake News Detection model, RPPG-Fake. Departing from conventional discriminative approaches, RPPG-Fake captures the propagation topology pattern from a heterogeneous social graph and generates the propagation paths to detect fake news effectively under a reinforcement learning paradigm. 
Our proposal is extensively evaluated over three popular datasets, and experimental results demonstrate the superiority of our proposal.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 2","pages":"613-625"},"PeriodicalIF":8.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142940766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}