{"title":"Camouflaged Variational Graph AutoEncoder Against Attribute Inference Attacks for Cross-Domain Recommendation","authors":"Yudi Xiong;Yongxin Guo;Weike Pan;Qiang Yang;Zhong Ming;Xiaojin Zhang;Han Yu;Tao Lin;Xiaoying Tang","doi":"10.1109/TKDE.2025.3565793","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3565793","url":null,"abstract":"Cross-domain recommendation (CDR) aims to alleviate the data sparsity problem by leveraging the benefits of modeling two domains. However, existing research often focuses on the recommendation performance while ignores the privacy leakage issue. We find that an attacker can infer user attribute information from the knowledge (e.g., user preferences) transferred between the source and target domains. For example, in our experiments, the average inference accuracies of attack models on gender and age attributes are 0.8323 and 0.3897. The best-performing attack model achieves accuracies of 0.8847 and 0.4634, exceeding a random inference by 25.10% and 64.04%. We can see that the leakage of user attribute information may significantly exceed what would be expected from random inference. In this paper, we propose a novel recommendation framework named CVGAE (short for camouflaged variational graph autoencoder), which effectively models user behaviors and mitigates the risk of user attribute information leakage at the same time. Specifically, our CVGAE combines the strengths of VAEs in capturing latent features and variability with the ability of GCNs in exploiting high-order relational information. Moreover, to ensure against attribute inference attacks without sacrificing the recommendation performance, we design a user attribute protection module that fuses user attribute-camouflaged information with knowledge transfer during cross-domain processes. 
We then conduct extensive experiments on three real-world datasets, and find our CVGAE is able to achieve strong privacy protection while making little sacrifices in recommendation accuracy.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"3916-3932"},"PeriodicalIF":8.9,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144219687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Charging-Aware Task Assignment for Urban Logistics With Electric Vehicles","authors":"Yafei Li;Yuke Pan;Guanglei Zhu;Shuo He;Mingliang Xu;Jianliang Xu","doi":"10.1109/TKDE.2025.3565858","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3565858","url":null,"abstract":"The rapid growth of e-commerce has intensified the demand for efficient urban logistics. Electric Vehicles (EVs), with their eco-friendly and high-efficiency features, have emerged as a promising solution for improving urban logistics efficiency. However, due to their limited battery capacity, EVs often require recharging during operations, and improper charging decisions may lead to delivery delays, resulting in a loss of platform revenue. In this paper, we explore a novel EV Charging-Aware Task Assignment (ECTA) problem in urban logistics scenarios, where the objective is to maximize platform revenue by ensuring timely task completion while meeting the charging needs of EVs. To address this challenge, we present e-Charge, an efficient two-stage framework that enables real-time optimization of two continuous processes: task assignment and charging decision. For task assignment, which focuses on matching tasks to suitable EVs, we construct a hybrid weight model that incorporates charging penalties to calculate matching weights for EVs in both active and charging states, thus improving task assignment quality. Additionally, we implement an effective vehicle selection strategy to expedite the matching process, ensuring the efficiency of task assignment. For charging decision, which focuses on determining when and where EVs should be charged, we propose a multi-agent reinforcement learning (MARL) approach to dynamically select the charging timing for EVs. To further enhance decision-making quality, we devise a hierarchical communication graph that enables better collaboration between EVs and facilitates adaptive charging decisions. 
Finally, extensive experiments demonstrate that e-Charge significantly outperforms compared methods, achieving higher revenue and task completion ratio across a wide range of parameter settings.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"3947-3961"},"PeriodicalIF":8.9,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144219592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting Edge Perturbation for Graph Neural Network in Graph Data Augmentation and Attack","authors":"Xin Liu;Yuxiang Zhang;Meng Wu;Mingyu Yan;Kun He;Wei Yan;Shirui Pan;Xiaochun Ye;Dongrui Fan","doi":"10.1109/TKDE.2025.3565306","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3565306","url":null,"abstract":"Edge perturbation is a basic method to modify graph structures. It can be categorized into two veins based on their effects on the performance of graph neural networks (GNNs), i.e., graph data augmentation and attack. Surprisingly, both veins of edge perturbation methods employ the same operations, yet yield opposite effects on GNNs’ accuracy. A distinct boundary between these methods in using edge perturbation has never been clearly defined. Consequently, inappropriate perturbations may lead to undesirable outcomes, necessitating precise adjustments to achieve desired effects. Therefore, questions of “why edge perturbation has a two-faced effect?” and “what makes edge perturbation flexible and effective?” still remain unanswered. In this paper, we will answer these questions by proposing a unified formulation and establishing a quantizable boundary between two categories of edge perturbation methods. Specifically, we conduct experiments to elucidate the differences and similarities between these methods and theoretically unify the workflow of these methods by casting it to one optimization problem. Then, we devise Edge Priority Detector (EPD) to generate a novel priority metric, bridging these methods up in the workflow. 
Experiments show that EPD can make augmentation or attack flexibly and achieve comparable or superior performance to other counterparts with less time overhead.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"4225-4238"},"PeriodicalIF":8.9,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144232036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Adaptation via Learning Using Statistical Invariant","authors":"Chunna Li;Yiwei Song;Yuan-Hai Shao","doi":"10.1109/TKDE.2025.3565780","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3565780","url":null,"abstract":"Domain adaptation has found widespread applications in real-life scenarios, especially when the target domain has limited labeled samples. However, most of the domain adaptation models only utilize one type of knowledge from the source domain, which is usually achieved by strong mode of convergence. To fully incorporate multiple knowledge from the source domain, for binary classification, this paper studies a novel learning paradigm for Domain Adaptation via Learning Using Statistical Invariant by simultaneously combining the strong and weak modes of convergence in a Hilbert space. The strong mode of convergence undertakes the mission of learning a least squares probability output binary classification task in a general hypothesis space, while the weak mode of convergence integrates diverse knowledge by constructing meaningful statistical invariants that embody the concept of intelligence. The utilization of weak convergence shrinks the admissible set of approximation functions, and subsequently accelerates the learning process. In this paper, several statistical invariants that represent sample, feature and parameter information from the source domain are constructed. By taking an appropriate statistical invariant, DLUSI realizes some existing methods. 
Experimental results on synthetic data as well as the widely used Amazon Reviews and 20 News data demonstrate the superiority of the proposed method.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"4023-4034"},"PeriodicalIF":8.9,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144219815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TELEX: Two-Level Learned Index for Rich Queries on Enclave-Based Blockchain Systems","authors":"Haotian Wu;Yuzhe Tang;Zhaoyan Shen;Jun Tao;Chenhao Lin;Zhe Peng","doi":"10.1109/TKDE.2025.3564905","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3564905","url":null,"abstract":"Blockchain has become a popular paradigm for secure and immutable data storage. Despite its numerous applications across various fields, concerns regarding the user privacy and result integrity during data queries persist. Additionally, the need for rich query functionalities to harness the full potential of blockchain data remains an area ripe for exploration. In order to address these challenges, our paper first utilizes a framework based on the Trusted Execution Environment (TEE) and oblivious RAM technique to achieve both privacy and data integrity. To enhance the query efficiency over the entire blockchain, we then devise a two-level learned indexing methodology named TELEX within the TEE for both integer and string keys. We also propose different query processing algorithms for versatile query types, including exact queries, aggregate queries, Boolean queries, and range queries. By implementing the prototype and conducting extensive evaluation, we demonstrate the feasibility and remarkable improvement in efficiency compared to existing solutions.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"4299-4313"},"PeriodicalIF":8.9,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144232028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Turn Waste Into Wealth: On Efficient Clustering and Cleaning Over Dirty Data","authors":"Kenny Ye Liang;Yunxiang Su;Shaoxu Song;Chunping Li","doi":"10.1109/TKDE.2025.3564313","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3564313","url":null,"abstract":"Dirty data commonly exist. Simply discarding a large number of inaccurate points (as noises) could greatly affect clustering results. We argue that dirty data can be repaired and utilized as strong supports in clustering. To this end, we study a novel problem of clustering and repairing over dirty data at the same time. Referring to the minimum change principle in data repairing, the objective is to find a minimum modification of inaccurate points such that the large amount of dirty data can enhance clustering. We show that the problem is NP-hard and can be formulated as an integer linear programming (ILP) problem. A constant factor approximation algorithm GDORC is devised based on grid, with high efficiency. In experiments, GDORC has great repairing and clustering results with low time consumption. Empirical results demonstrate that both the clustering and cleaning accuracies can be improved by our approach of repairing and utilizing the dirty data in clustering.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"4361-4372"},"PeriodicalIF":8.9,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144232133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Scalable Algorithm for Fair Influence Maximization With Unbiased Estimator","authors":"Xiaobin Rui;Zhixiao Wang;Hao Peng;Wei Chen;Philip S. Yu","doi":"10.1109/TKDE.2025.3564283","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3564283","url":null,"abstract":"This paper studies the fair influence maximization problem with efficient algorithms. In particular, given a graph $G$, a community structure $\\mathcal{C}$ consisting of disjoint communities, and a budget $k$, the problem asks to select a seed set $S$ ($|S|=k$) that maximizes the influence spread while narrowing the influence gap between different communities. This problem derives from some significant social scenarios, such as health interventions (e.g. suicide/HIV prevention) where individuals from underrepresented groups or LGBTQ communities may be disproportionately excluded from the benefits of the intervention. To depict the concept of fairness in the context of influence maximization, researchers have proposed various notions of fairness, where the welfare fairness notion that better balances fairness level and influence spread has shown promising effectiveness. However, the lack of efficient algorithms for optimizing the objective function under welfare fairness restricts its application to networks of only a few hundred nodes. In this paper, we modify the objective function of welfare fairness to maximize the exponentially weighted sum and the logarithmically weighted sum over all communities’ influenced fractions (utility). 
To achieve efficient algorithms with theoretical guarantees, we first introduce two unbiased estimators: one for the fractional power of the arithmetic mean and the other for the logarithm of the arithmetic mean. Then, by adapting the Reverse Influence Sampling (RIS) approach, we convert the optimization problem to a weighted maximum coverage problem. We also analyze the number of reverse reachable sets needed to approximate the fair influence at a high probability. Finally, we present an efficient algorithm that guarantees $1-1/e - \\varepsilon$ (positive objective function) or $1+1/e + \\varepsilon$ (negative objective function) approximation for any small $\\varepsilon > 0$. Experiments demonstrate that our proposed algorithm could efficiently handle large-scale networks with good performance.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"3881-3895"},"PeriodicalIF":8.9,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144219759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Synthesis Reinvented: Preserving Missing Patterns for Enhanced Analysis","authors":"Xinyue Wang;Hafiz Asif;Shashank Gupta;Jaideep Vaidya","doi":"10.1109/TKDE.2025.3563319","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3563319","url":null,"abstract":"Synthetic data is being widely used as a replacement or enhancement for real data in fields as diverse as healthcare, telecommunications, and finance. Unlike real data, which represents actual people and objects, synthetic data is generated from an estimated distribution that retains key statistical properties of the real data. This makes synthetic data attractive for sharing while addressing privacy, confidentiality, and autonomy concerns. Real data often contains missing values that hold important information about individual, system, or organizational behavior. Standard synthetic data generation methods eliminate missing values as part of their pre-processing steps and thus completely ignore this valuable source of information. Instead, we propose methods to generate synthetic data that preserve both the observable and missing data distributions; consequently, retaining the valuable information encoded in the missing patterns of the real data. Our approach handles various missing data scenarios and can easily integrate with existing data generation methods. 
Extensive empirical evaluations on diverse datasets demonstrate the effectiveness of our approach as well as the value of preserving missing data distribution in synthetic data.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"3962-3975"},"PeriodicalIF":8.9,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144219817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Local Community Detection Method Based on Folded Subgraph","authors":"Mengting Zhang;Weihong Bi","doi":"10.1109/TKDE.2025.3563100","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3563100","url":null,"abstract":"Community structure refers to the “small groups” in the network. Detecting community structure in networks has significant application value. With the continuous expansion and complexity of the network, the global information of the network is often difficult to obtain. On the other hand, in some cases, we pay more attention to the local community where the given node is located. Local community detection methods detect local community structure by using local information from a given node. However, many local community detection methods encounter the problem of precision limitation. Therefore, in order to alleviate such problems, we propose the FG-based method in this paper. Based on the characteristics of complex networks, a folded subgraph method is designed to consider some similar nodes as single nodes, reducing the impact of noise in the network. Furthermore, based on the folded subgraph, FG-based method designs a three-stage local expansion strategy, in which nodes with different characteristics are added to the local community in each stage. 
We conduct experiments on datasets and find that the FG-based method can improve the recall and precision of local community structures.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"3869-3880"},"PeriodicalIF":8.9,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144219686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Labeling and Self-Knowledge Distillation Unsupervised Feature Selection","authors":"Yunzhi Ling;Feiping Nie;Weizhong Yu;Xuelong Li","doi":"10.1109/TKDE.2025.3561046","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3561046","url":null,"abstract":"This paper proposes a deep pseudo-label method for unsupervised feature selection, which learns non-linear representations to generate pseudo-labels and trains a Neural Network (NN) to select informative features via self-Knowledge Distillation (KD). Specifically, the proposed method divides a standard NN into two sub-components: an encoder and a predictor, and introduces a dependency subnet. It works by self-supervised pre-training the encoder to produce informative representations and then alternating between two steps: (1) learning pseudo-labels by combining the clustering results of the encoder's outputs with the NN's prediction outputs, and (2) updating the NN's parameters by globally selecting a subset of features to predict the pseudo-labels while updating the subnet's parameters through self-KD. Self-KD is achieved by encouraging the subnet to locally capture a subset of the NN features to produce class probabilities that match those produced by the NN. This allows the model to self-absorb the learned inter-class knowledge and evaluate feature diversity, removing redundant features without sacrificing performance. Meanwhile, the potential discriminative capability of a NN can also be self-excavated without the assistance of other NNs. The two alternate steps reinforce each other: in step (2), by predicting the learned pseudo-labels and conducting self-KD, the discrimination of the outputs of both the NN and the encoder is gradually enhanced, while the self-labeling method in step (1) leverages these two improvements to further refine the pseudo-labels for step (2), resulting in the superior performance. 
Extensive experiments show the proposed method significantly outperforms state-of-the-art methods across various datasets.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"4270-4284"},"PeriodicalIF":8.9,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144232027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}