{"title":"An Adaptive Entire-Space Multi-Scenario Multi-Task Transfer Learning Model for Recommendations","authors":"Qingqing Yi;Jingjing Tang;Xiangyu Zhao;Yujian Zeng;Zengchun Song;Jia Wu","doi":"10.1109/TKDE.2025.3536334","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536334","url":null,"abstract":"Multi-scenario and multi-task recommendation systems efficiently facilitate knowledge transfer across different scenarios and tasks. However, many existing approaches inadequately incorporate personalized information across users and scenarios. Moreover, the conversion rate (CVR) task in multi-task learning often encounters challenges like sample selection bias, resulting from systematic differences between the training and inference sample spaces, and data sparsity due to infrequent clicks. To address these issues, we propose the Adaptive Entire-space Multi-scenario Multi-task Transfer Learning model (AEM$^{2}$TL) with four key modules: 1) Scenario-CGC (Scenario-Customized Gate Control), 2) Task-CGC (Task-Customized Gate Control), 3) Personalized Gating Network, and 4) Entire-space Supervised Multi-Task Module. AEM$^{2}$TL employs a multi-gate mechanism to effectively integrate shared and specific information across scenarios and tasks, enhancing prediction adaptability. To further improve task-specific personalization, it incorporates personalized prior features and applies a gating mechanism that dynamically scales the top-layer neural units. A novel post-impression behavior decomposition technique is designed to leverage all impression samples across the entire space, mitigating sample selection bias and data sparsity. Furthermore, an adaptive weighting mechanism dynamically allocates attention to tasks based on their relative importance, ensuring optimal task prioritization. Extensive experiments on one industrial and two real-world public datasets indicate the superiority of AEM$^{2}$TL over state-of-the-art methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1585-1598"},"PeriodicalIF":8.9,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Label Feature Selection With Missing Features via Implicit Label Replenishment and Positive Correlation Feature Recovery","authors":"Jianhua Dai;Wenxiang Chen;Yuhua Qian","doi":"10.1109/TKDE.2025.3536080","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536080","url":null,"abstract":"Multi-label feature selection can effectively solve the curse of dimensionality problem in multi-label learning. Existing multi-label feature selection methods mostly handle multi-label data without missing features. However, in practical applications, multi-label data with missing features exist widely, and most existing multi-label feature selection methods are not directly applicable. Therefore, we propose a feature selection method for multi-label data with missing features. First, we propose a method to extract implicit label information from the feature space to replenish the binary label information. Second, we learn the positive correlation between features to construct a feature correlation recovery matrix to recover missing features. Finally, we design a sparse model-based multi-label feature selection method for processing multi-label data with missing features and prove the convergence of this method. Comparative experiments with existing feature selection methods demonstrate the effectiveness of our method.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"2042-2055"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SMPCache: Towards More Efficient SQL Queries in Multi-Party Collaborative Data Analysis","authors":"Junjian Shi;Ye Han;Xiaojie Guo;Zekun Fei;Zheli Liu;Siyi Lv;Tong Li;Xiaotao Liu","doi":"10.1109/TKDE.2025.3535944","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3535944","url":null,"abstract":"Privacy-preserving collaborative data analysis has been a popular research direction in recent years. Among all such analysis tasks, privacy-preserving SQL queries on multi-party databases are of particular industrial interest. Although the privacy concern can be addressed by many cryptographic tools, such as secure multi-party computation (MPC), the efficiency of executing such SQL queries is far from satisfactory, especially for high-volume databases. In particular, existing MPC-based solutions treat each SQL query as an isolated task and launch it from scratch, even though many SQL queries are run regularly and overlap somewhat in functionality. In this work, we exploit this property to improve the efficiency of MPC-based, privacy-preserving SQL queries. We introduce a cache-like optimization mechanism. To ensure a higher cache hit rate and reduce redundant MPC operators, we present a cache structure different from that of plain databases and design a set of cache strategies. Our optimization mechanism, SMPCache, can be built upon secret-sharing-based MPC frameworks, which attract much attention from industry. To demonstrate the utility of SMPCache, we implement it on Rosetta, an open-source MPC library, and use real-world datasets to launch extensive experiments on some basic SQL operators (e.g., Filter, Order-by, Aggregation, and Inner-Join) and some representative composite SQL queries. To give a data point, we note that SMPCache can achieve up to 3536× efficiency improvement on the TPC-DS dataset and 562× on the TPC-H dataset at a moderate storage cost. We also apply SMPCache to the basic SQL operators (Filter, Order-by, Group-by, Aggregation, and Inner-join) of the Secrecy framework, achieving up to 127.3× efficiency improvement.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"2111-2125"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Latent and Changing Dynamics in Real Non-Stationary Environments","authors":"Zihe Liu;Jie Lu;Junyu Xuan;Guangquan Zhang","doi":"10.1109/TKDE.2025.3535961","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3535961","url":null,"abstract":"Model-based reinforcement learning (RL) aims to learn the underlying dynamics of a given environment. The success of most existing works is built on the critical assumption that the dynamic is fixed, which is unrealistic in many open-world scenarios, such as drone delivery and online chatting, where agents may need to deal with environments with unpredictable changing dynamics (hereafter, <i>real non-stationary environment</i>). Therefore, learning changing dynamics in a real non-stationary environment offers both significant benefits and challenges. This paper proposes a new model-based reinforcement learning algorithm that proactively and dynamically detects possible changes and Learns these Latent and Changing Dynamics (LLCD) in a latent Markovian space for real non-stationary environments. To ensure the Markovian property of the RL model and improve computational efficiency, we employ a latent space model to learn the environment’s transition dynamics. Furthermore, we perform online change detection in the latent space to promptly identify change points in non-stationary environments. Then, we utilize the detected information to help the agent adapt to new conditions. Experiments indicate that the rewards of the proposed algorithm accumulate for the most rapid adaptations to environmental change, among other benefits. This work has a strong potential to enhance environmentally suitable model-based reinforcement learning capabilities.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1930-1942"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Condensation: A Survey","authors":"Xinyi Gao;Junliang Yu;Tong Chen;Guanhua Ye;Wentao Zhang;Hongzhi Yin","doi":"10.1109/TKDE.2025.3535877","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3535877","url":null,"abstract":"The rapid growth of graph data poses significant challenges in storage, transmission, and particularly the training of graph neural networks (GNNs). To address these challenges, graph condensation (GC) has emerged as an innovative solution. GC focuses on synthesizing a compact yet highly representative graph, enabling GNNs trained on it to achieve performance comparable to those trained on the original large graph. The notable efficacy of GC and its broad prospects have garnered significant attention and spurred extensive research. This survey paper provides an up-to-date and systematic overview of GC, organizing existing research into five categories aligned with critical GC evaluation criteria: effectiveness, generalization, efficiency, fairness, and robustness. To facilitate an in-depth and comprehensive understanding of GC, this paper examines various methods under each category and thoroughly discusses two essential components within GC: optimization strategies and condensed graph generation. We also empirically compare and analyze representative GC methods with diverse optimization strategies based on the five proposed GC evaluation criteria. Finally, we explore the applications of GC in various fields, outline the related open-source libraries, and highlight the present challenges and novel insights, with the aim of promoting advancements in future research.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1819-1837"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Accurate Truth Discovery With Privacy-Preserving Over Crowdsourced Data Streams","authors":"Zhimao Gong;Zhibang Yang;Shenghong Yang;Siyang Yu;Kenli Li;Mingxing Duan","doi":"10.1109/TKDE.2025.3536180","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536180","url":null,"abstract":"Truth discovery endeavors to extract valuable information from multi-source data through weighted aggregation. Some studies have integrated differential privacy techniques into traditional truth discovery algorithms to protect data privacy. However, due to the neglect of outliers and limitations in budget allocation, these schemes still need improvement in the accuracy of discovery results. To solve these challenges, we propose a privacy-preserving scheme called PriPTD to achieve secure and accurate truth discovery services over crowdsourced data streams. Instead of assuming that worker weights are always stable between two neighboring timestamps, we delve deeper to consider outliers where worker weights change rapidly. Accordingly, we develop an outlier-aware weight estimation method with a time series model to capture and handle these outliers. Furthermore, to ensure data utility under a limited budget, we devise a weight-aware budget allocation algorithm. Its core idea is that timestamps with higher importance consume a larger proportion of the remaining budget. Additionally, we design a noise-aware error adjustment approach to mitigate the adverse effects of introduced noise on accuracy. Theoretical analysis and extensive experiments validate our scheme. Final comparative experiments against existing works confirm that our scheme achieves more accurate truth discovery while preserving privacy.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"2155-2168"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intent-Guided Heterogeneous Graph Contrastive Learning for Recommendation","authors":"Lei Sang;Yu Wang;Yi Zhang;Yiwen Zhang;Xindong Wu","doi":"10.1109/TKDE.2025.3536096","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536096","url":null,"abstract":"Contrastive Learning (CL)-based recommender systems have gained prominence in the context of Heterogeneous Graphs (HGs) due to their capacity to enhance the consistency of representations across different views. However, existing frameworks often neglect the fact that user-item interactions within HGs are governed by diverse latent intents (e.g., brand preferences or demographic characteristics of item audiences), which are pivotal in capturing fine-grained relations. The exploration of these underlying intents, particularly through the lens of meta-paths in HGs, presents us with two principal challenges: i) How to integrate CL with intents; ii) How to mitigate noise from meta-path-driven intents. To address these challenges, we propose an innovative framework termed <i>Intent-guided Heterogeneous Graph Contrastive Learning</i> (IHGCL), which is designed to enhance CL-based recommendation by capturing the intents contained within meta-paths. Specifically, the IHGCL framework includes: i) a meta-path-based Dual Contrastive Learning (DCL) approach to effectively integrate intents into the recommendation, constructing intent-intent contrast and intent-interaction contrast; ii) a Bottlenecked AutoEncoder (BAE) that combines mask propagation with the information bottleneck principle to significantly reduce noise perturbations introduced by meta-paths. Empirical evaluations conducted across six distinct datasets demonstrate the superior performance of our IHGCL framework relative to conventional baseline methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1915-1929"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network-to-Network: Self-Supervised Network Representation Learning via Position Prediction","authors":"Jie Liu;Chunhai Zhang;Zhicheng He;Wenzheng Zhang;Na Li","doi":"10.1109/TKDE.2024.3493391","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3493391","url":null,"abstract":"Network Representation Learning (NRL) has achieved remarkable success in learning low-dimensional representations for network nodes. However, most NRL methods, including Graph Neural Networks (GNNs) and their variants, face critical challenges. First, labeled network data, which are required for training most GNNs, are expensive to obtain. Second, existing methods are sub-optimal in preserving comprehensive topological information, including structural and positional information. Finally, most GNN approaches ignore the rich node content information. To address these challenges, we propose a self-supervised Network-to-Network framework (Net2Net) to learn semantically meaningful node representations. Our framework employs a pretext task of node position prediction (PosPredict) to effectively fuse the topological and content knowledge into low-dimensional embeddings for every node in a semi-supervised manner. Specifically, we regard a network as node content and position networks, where Net2Net aims to learn the mapping between them. We utilize a multi-layer recursively composable encoder to integrate the content and topological knowledge into the egocentric network node embeddings. Furthermore, we design a cross-modal decoder to map the egocentric node embeddings into their node position identities (PosIDs) in the node position network. Extensive experiments on eight diverse networks demonstrate the superiority of Net2Net over comparable methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 3","pages":"1354-1365"},"PeriodicalIF":8.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web-FTP: A Feature Transferring-Based Pre-Trained Model for Web Attack Detection","authors":"Zhenyu Guo;Qinghua Shang;Xin Li;Chengyi Li;Zijian Zhang;Zhuo Zhang;Jingjing Hu;Jincheng An;Chuanming Huang;Yang Chen;Yuguang Cai","doi":"10.1109/TKDE.2024.3512793","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3512793","url":null,"abstract":"Web attacks are a major threat to cyberspace security, so web attack detection has become a critical task. Traditional supervised learning methods learn features of web attacks with large amounts of high-confidence labeled data, which are extremely expensive in the real world. Pre-trained models offer a novel solution with their ability to learn generic features on large unlabeled datasets. However, designing and deploying a pre-trained model for real-world web attack detection remains challenging. In this paper, we present a pre-trained model for web attack detection, including a pre-processing module, a pre-training module, and a deployment scheme. Our model significantly improves classification performance on several web attack detection datasets. Moreover, we deploy the model in real-world systems and show its potential for industrial applications.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 3","pages":"1495-1507"},"PeriodicalIF":8.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2024 Reviewers List","authors":"","doi":"10.1109/TKDE.2025.3527173","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3527173","url":null,"abstract":"","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 3","pages":"1018-1029"},"PeriodicalIF":8.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10855178","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}