{"title":"Towards Accurate Truth Discovery With Privacy-Preserving Over Crowdsourced Data Streams","authors":"Zhimao Gong;Zhibang Yang;Shenghong Yang;Siyang Yu;Kenli Li;Mingxing Duan","doi":"10.1109/TKDE.2025.3536180","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536180","url":null,"abstract":"Truth discovery endeavors to extract valuable information from multi-source data through weighted aggregation. Some studies have integrated differential privacy techniques into traditional truth discovery algorithms to protect data privacy. However, because they neglect outliers and have limitations in budget allocation, these schemes still leave room for improvement in the accuracy of their discovery results. To solve these challenges, we propose a privacy-preserving scheme called PriPTD to achieve secure and accurate truth discovery services over crowdsourced data streams. Instead of assuming that worker weights are always stable between two neighboring timestamps, we also consider outliers, where worker weights change rapidly. Accordingly, we develop an outlier-aware weight estimation method with a time series model to capture and handle these outliers. Furthermore, to ensure data utility under a limited budget, we devise a weight-aware budget allocation algorithm. Its core idea is that timestamps with higher importance consume a larger proportion of the remaining budget. Additionally, we design a noise-aware error adjustment approach to mitigate the adverse effects of introduced noise on accuracy. Theoretical analysis and extensive experiments validate our scheme. Final comparative experiments against existing works confirm that our scheme achieves more accurate truth discovery while preserving privacy.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"2155-2168"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intent-Guided Heterogeneous Graph Contrastive Learning for Recommendation","authors":"Lei Sang;Yu Wang;Yi Zhang;Yiwen Zhang;Xindong Wu","doi":"10.1109/TKDE.2025.3536096","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536096","url":null,"abstract":"Contrastive Learning (CL)-based recommender systems have gained prominence in the context of Heterogeneous Graphs (HGs) due to their capacity to enhance the consistency of representations across different views. However, existing frameworks often neglect the fact that user-item interactions within HGs are governed by diverse latent intents (e.g., brand preferences or demographic characteristics of item audiences), which are pivotal in capturing fine-grained relations. The exploration of these underlying intents, particularly through the lens of meta-paths in HGs, presents two principal challenges: i) how to integrate CL with intents; ii) how to mitigate noise from meta-path-driven intents. To address these challenges, we propose an innovative framework termed Intent-guided Heterogeneous Graph Contrastive Learning (IHGCL), which is designed to enhance CL-based recommendation by capturing the intents contained within meta-paths. Specifically, the IHGCL framework includes: i) a meta-path-based Dual Contrastive Learning (DCL) approach that effectively integrates intents into the recommendation by constructing intent-intent and intent-interaction contrasts; ii) a Bottlenecked AutoEncoder (BAE) that combines mask propagation with the information bottleneck principle to significantly reduce noise perturbations introduced by meta-paths. Empirical evaluations conducted across six distinct datasets demonstrate the superior performance of our IHGCL framework relative to conventional baseline methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1915-1929"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network-to-Network: Self-Supervised Network Representation Learning via Position Prediction","authors":"Jie Liu;Chunhai Zhang;Zhicheng He;Wenzheng Zhang;Na Li","doi":"10.1109/TKDE.2024.3493391","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3493391","url":null,"abstract":"Network Representation Learning (NRL) has achieved remarkable success in learning low-dimensional representations for network nodes. However, most NRL methods, including Graph Neural Networks (GNNs) and their variants, face critical challenges. First, labeled network data, which are required for training most GNNs, are expensive to obtain. Second, existing methods are sub-optimal in preserving comprehensive topological information, including structural and positional information. Finally, most GNN approaches ignore the rich node content information. To address these challenges, we propose a self-supervised Network-to-Network framework (Net2Net) to learn semantically meaningful node representations. Our framework employs a pretext task of node position prediction (PosPredict) to effectively fuse the topological and content knowledge into low-dimensional embeddings for every node in a semi-supervised manner. Specifically, we view a network as consisting of a node content network and a node position network, and Net2Net aims to learn the mapping between them. We utilize a multi-layer recursively composable encoder to integrate the content and topological knowledge into the egocentric network node embeddings. Furthermore, we design a cross-modal decoder to map the egocentric node embeddings to their node position identities (PosIDs) in the node position network. Extensive experiments on eight diverse networks demonstrate the superiority of Net2Net over comparable methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 3","pages":"1354-1365"},"PeriodicalIF":8.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web-FTP: A Feature Transferring-Based Pre-Trained Model for Web Attack Detection","authors":"Zhenyu Guo;Qinghua Shang;Xin Li;Chengyi Li;Zijian Zhang;Zhuo Zhang;Jingjing Hu;Jincheng An;Chuanming Huang;Yang Chen;Yuguang Cai","doi":"10.1109/TKDE.2024.3512793","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3512793","url":null,"abstract":"Web attacks are a major threat to cyberspace security, so building web attack detection models has become a critical task. Traditional supervised learning methods learn the features of web attacks from large amounts of high-confidence labeled data, which are extremely expensive to obtain in the real world. Pre-trained models offer a novel solution through their ability to learn generic features from large unlabeled datasets. However, designing and deploying a pre-trained model for real-world web attack detection remains challenging. In this paper, we present a pre-trained model for web attack detection, including a pre-processing module, a pre-training module, and a deployment scheme. Our model significantly improves classification performance on several web attack detection datasets. Moreover, we deploy the model in real-world systems and show its potential for industrial applications.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 3","pages":"1495-1507"},"PeriodicalIF":8.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2024 Reviewers List","authors":"","doi":"10.1109/TKDE.2025.3527173","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3527173","url":null,"abstract":"","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 3","pages":"1018-1029"},"PeriodicalIF":8.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10855178","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPST-GT: Full-Link Delivery Time Estimation Via Heterogeneous Periodic Spatial-Temporal Graph Transformer","authors":"Shuai Wang;Hai Wang;Li Lin;Xiaohui Zhao;Tian He;Dian Shen;Wei Xi","doi":"10.1109/TKDE.2025.3533610","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3533610","url":null,"abstract":"A warehouse-distribution integration (WDI) e-commerce platform combines warehousing and distribution processes, an approach increasingly adopted in industry to enhance business efficiency. In WDI e-commerce, one of the most important problems is estimating the full-link delivery time for decision-making. Traditional methods designed for separate warehouse and distribution models struggle to address the challenges of integrated systems. The difficulties stem from two main factors: (i) the contextual influence exerted by neighboring units within heterogeneous delivery networks, and (ii) the uncertainty in delivery times caused by dynamic and periodic temporal factors, such as fluctuations in online sales volumes and the varying characteristics of different delivery units (e.g., warehouses and sorting centers). To address these challenges, we propose a novel full-link delivery time estimation framework called Heterogeneous Periodic Spatial-Temporal Graph Transformer (HPST-GT). First, we develop heterogeneous graph transformers to capture the hierarchical and diverse information of the warehouse-distribution network. Next, we design spatial-temporal transformers based on heterogeneous features to analyze the correlation between spatial and temporal information. Finally, we create a heterogeneous spatial-temporal graph prediction module to estimate full-link delivery time. Evaluated on a one-month dataset from a leading e-commerce platform, our method outperforms current baselines across multiple performance metrics.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1885-1901"},"PeriodicalIF":8.9,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Learning in Imbalanced Data Streams With Unpredictable Feature Evolution","authors":"Jiahang Tu;Xijia Tang;Shilin Gu;Yucong Dai;Ruidong Fan;Chenping Hou","doi":"10.1109/TKDE.2025.3531431","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3531431","url":null,"abstract":"Learning from data streams collected sequentially over time is widespread in real-world applications. Previous methods typically assume that the data stream has a feature space that is fixed or evolves according to a clearly defined pattern, as well as a balanced class distribution. However, in many practical scenarios, such as environmental monitoring systems, anomalous events occur far less frequently than normal ones, and the feature space changes dynamically due to ecological evolution and sensor lifespan. To alleviate this important but rarely studied problem, we propose the Adaptive Learning in Imbalanced data streams with Unpredictable feature evolution (ALIU) algorithm. As data streams with imbalanced class distributions arrive, ALIU first mitigates the model's bias toward the majority class by reweighting the adaptive gradient descent magnitudes between different classes. Then, a new loss function is proposed that simultaneously focuses on misclassifications and maintains model robustness. Further, when imbalanced data streams arrive with feature evolutions, we reuse the previously learned model and update the incomplete and augmented features by adopting the adaptive gradient strategy and ensemble method, respectively. Finally, we use a projection technique to build a sparse yet efficient model. Based on a few common and mild assumptions, we theoretically show that ALIU satisfies a sub-linear regret bound under both convex and strongly convex loss functions, and that the performance of the model can be improved with the assistance of old features. In addition, extensive experimental results further demonstrate the effectiveness of our proposed algorithm.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1527-1541"},"PeriodicalIF":8.9,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Few-Shot Causal Representation Learning for Out-of-Distribution Generalization on Heterogeneous Graphs","authors":"Pengfei Ding;Yan Wang;Guanfeng Liu;Nan Wang;Xiaofang Zhou","doi":"10.1109/TKDE.2025.3531469","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3531469","url":null,"abstract":"To address the issue of label sparsity in heterogeneous graphs (HGs), heterogeneous graph few-shot learning (HGFL) has recently emerged. HGFL aims to extract meta-knowledge from source HGs with abundant labeled data and transfer it to a target HG, facilitating the learning of new classes from few labeled training examples and improving predictions on unlabeled testing data. Existing methods typically assume the same distribution across the source HG, training data, and testing data. However, in practice, distribution shifts in HGFL are inevitable due to (1) the scarcity of source HGs that match the target HG's distribution, and (2) the unpredictable data generation mechanism of the target HG. Such distribution shifts can degrade the performance of existing methods, leading to a novel problem of out-of-distribution (OOD) generalization in HGFL. To address this challenging problem, we propose COHF, a Causal OOD Heterogeneous graph Few-shot learning model. In COHF, we first adopt a bottom-up data generative perspective to identify the invariance principle for OOD generalization. Then, based on this principle, we design a novel variational autoencoder-based heterogeneous graph neural network (VAE-HGNN) to mitigate the impact of distribution shifts. Finally, we propose a novel meta-learning framework that incorporates VAE-HGNN to effectively transfer meta-knowledge in OOD environments. Extensive experiments on seven real-world datasets demonstrate the superior performance of COHF over state-of-the-art methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1804-1818"},"PeriodicalIF":8.9,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Hyperedge Prediction With Context-Aware Self-Supervised Learning","authors":"Yunyong Ko;Hanghang Tong;Sang-Wook Kim","doi":"10.1109/TKDE.2025.3532263","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3532263","url":null,"abstract":"Hypergraphs can naturally model group-wise relations (e.g., a group of users who co-purchase an item) as hyperedges. Hyperedge prediction, the task of predicting future or unobserved hyperedges, is fundamental to many real-world applications (e.g., group recommendation). Despite recent breakthroughs in hyperedge prediction methods, the following challenges have rarely been studied: (C1) how to aggregate the nodes in each hyperedge candidate for accurate hyperedge prediction, and (C2) how to mitigate the inherent data sparsity problem in hyperedge prediction. To tackle both challenges together, in this paper, we propose a novel hyperedge prediction framework (CASH) that employs (1) context-aware node aggregation to precisely capture complex relations among nodes in each hyperedge for (C1) and (2) self-supervised contrastive learning in the context of hyperedge prediction to enhance hypergraph representations for (C2). Furthermore, for (C2), we propose a hyperedge-aware augmentation method to fully exploit the latent semantics behind the original hypergraph and consider both node-level and group-level contrasts (i.e., dual contrasts) for better node and hyperedge representations. Extensive experiments on six real-world hypergraphs reveal that CASH consistently outperforms all competing methods in terms of hyperedge prediction accuracy, and that each of the proposed strategies is effective in improving the accuracy of CASH.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1772-1784"},"PeriodicalIF":8.9,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HyCubE: Efficient Knowledge Hypergraph 3D Circular Convolutional Embedding","authors":"Zhao Li;Xin Wang;Jun Zhao;Wenbin Guo;Jianxin Li","doi":"10.1109/TKDE.2025.3531372","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3531372","url":null,"abstract":"Knowledge hypergraph embedding models are usually computationally expensive due to their inherently complex semantic information. However, existing works mainly focus on improving the effectiveness of knowledge hypergraph embedding, making the model architecture more complex and redundant. Reaching a trade-off between model effectiveness and efficiency is both desirable and challenging for knowledge hypergraph embedding. In this paper, we propose an end-to-end efficient knowledge hypergraph embedding model, HyCubE, which uses a novel 3D circular convolutional neural network and an alternate mask stack strategy to comprehensively enhance the interaction and extraction of feature information. Furthermore, our proposed model achieves a better trade-off between effectiveness and efficiency by adaptively adjusting the 3D circular convolutional layer structure to handle n-ary knowledge tuples of different arities with fewer parameters. In addition, we adopt a 1-N multilinear scoring scheme for knowledge hypergraphs to further accelerate model training. Finally, extensive experimental results on all datasets demonstrate that our proposed model consistently outperforms state-of-the-art baselines, with an average improvement of 8.22% and a maximum improvement of 33.82% across all metrics. Meanwhile, compared with the average of the latest state-of-the-art baselines, HyCubE is 6.12x faster, uses 52.67% less GPU memory, and has 85.21% fewer parameters.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1902-1914"},"PeriodicalIF":8.9,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}