{"title":"DFL-Net: Disentangled Feature Learning Network for Multi-View Clustering","authors":"Zhe Chen;Xiao-Jun Wu;Tianyang Xu;Josef Kittler","doi":"10.1109/TKDE.2025.3574150","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3574150","url":null,"abstract":"Multi-view clustering aims at partitioning data into their underlying categories by mining shared and complementary information conveyed by different views. Although the integration of deep learning and disentanglement learning has markedly improved clustering performance, our analysis reveals two fundamental limitations in existing approaches: inadequate separation between view-shared and view-exclusive features; and the negative effects of clustering-irrelevant information on feature decoupling. To tackle these issues, we present a novel Disentangled Feature Learning Network (DFL-Net), which utilizes a progressive learning framework to systematically disentangle features. DFL-Net initially establishes view-shared representations through semantic disparity minimization, followed by the construction of orthogonal feature subspaces using cross-view and intra-view independence constraints to isolate view-specific features. Subsequently, DFL-Net enforces clustering consistency across views to adaptively eliminate irrelevant information, thus enhancing the overall effectiveness of disentanglement learning. The framework introduces two significant innovations: a comprehensive feature independence criterion that concurrently reduces intra-view and cross-view feature dependencies, and an irrelevance filtering mechanism that ensures cross-view clustering consistency. 
Extensive experiments on benchmark datasets demonstrate the superior performance of DFL-Net compared to state-of-the-art methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 8","pages":"4537-4547"},"PeriodicalIF":8.9,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144572954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaussian Process Latent Variable Modeling for Few-Shot Time Series Forecasting","authors":"Yunyao Cheng;Chenjuan Guo;Kaixuan Chen;Kai Zhao;Bin Yang;Jiandong Xie;Christian S. Jensen;Feiteng Huang;Kai Zheng","doi":"10.1109/TKDE.2025.3573673","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3573673","url":null,"abstract":"Accurate time series forecasting is crucial for optimizing resource allocation, industrial production, and urban management, particularly with the growth of cyber-physical and IoT systems. However, limited training sample availability in fields like physics and biology poses significant challenges. Existing models struggle to capture long-term dependencies and to model diverse meta-knowledge explicitly in few-shot scenarios. To address these issues, we propose MetaGP, a meta-learning-based Gaussian process latent variable model that uses a Gaussian process kernel function to capture long-term dependencies and to maintain strong correlations in time series. We also introduce Kernel Association Search (KAS) as a novel meta-learning component to explicitly model meta-knowledge, thereby enhancing both interpretability and prediction accuracy. We study MetaGP on simulated and real-world few-shot datasets, showing that it is capable of state-of-the-art prediction accuracy. 
We also find that MetaGP can capture long-term dependencies and can model meta-knowledge, thereby providing valuable insights into complex time series patterns.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 8","pages":"4604-4619"},"PeriodicalIF":8.9,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144581694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DASCE: Long-Tailed Data Augmentation Based Sparse Class-Correlation Exploitation","authors":"Mengnan Qi;Shasha Mao;Yimeng Zhang;Jing Gu;Shuiping Gou;Licheng Jiao;Yuming Zhang","doi":"10.1109/TKDE.2025.3573899","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3573899","url":null,"abstract":"The long-tailed data distribution frequently occurs in real-world scenarios, whereas deep learning is not effective enough for such distributions. In order to improve the effectiveness for long-tailed data, data augmentation is widely used to balance the distribution of classes by generating new samples. However, most existing studies are designed from the perspective of the class-independence assumption by default, ignoring the effect of interrelation among classes for data augmentation, so some generated samples may be unrepresentative and useless for balancing the class-distribution. Inspired by this, we propose a new data augmentation method based on sparse class-correlation exploitation in this paper, which can generate more representative samples by utilizing the class-correlation, to effectively balance the class-distribution for long-tailed data. In the proposed method, a sparse class-correlation exploration module is first proposed to explore the potential correlations among multiple classes for boosting the classification performance. Based on the class-correlations, the pivotal seed-samples are generated by maximizing the sparse representation of challenging samples. Meanwhile, an ambiguity-filtered translation module is designed to generate more representative new samples for the target classes based on the obtained seed-samples by enhancing the class-consistency and suppressing the deviation from the target classes. In addition, we introduce the self-supervised feature and fuse it with the discriminative feature to explore more accurate class-correlations. 
Experimental results illustrate that the proposed method obtains better performance only with a small number of generated samples than the state-of-the-art methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 8","pages":"4497-4511"},"PeriodicalIF":8.9,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144573014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Camouflaged Variational Graph AutoEncoder Against Attribute Inference Attacks for Cross-Domain Recommendation","authors":"Yudi Xiong;Yongxin Guo;Weike Pan;Qiang Yang;Zhong Ming;Xiaojin Zhang;Han Yu;Tao Lin;Xiaoying Tang","doi":"10.1109/TKDE.2025.3565793","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3565793","url":null,"abstract":"Cross-domain recommendation (CDR) aims to alleviate the data sparsity problem by leveraging the benefits of modeling two domains. However, existing research often focuses on the recommendation performance while ignoring the privacy leakage issue. We find that an attacker can infer user attribute information from the knowledge (e.g., user preferences) transferred between the source and target domains. For example, in our experiments, the average inference accuracies of attack models on gender and age attributes are 0.8323 and 0.3897. The best-performing attack model achieves accuracies of 0.8847 and 0.4634, exceeding a random inference by 25.10% and 64.04%. We can see that the leakage of user attribute information may significantly exceed what would be expected from random inference. In this paper, we propose a novel recommendation framework named CVGAE (short for camouflaged variational graph autoencoder), which effectively models user behaviors and mitigates the risk of user attribute information leakage at the same time. Specifically, our CVGAE combines the strengths of VAEs in capturing latent features and variability with the ability of GCNs in exploiting high-order relational information. Moreover, to ensure against attribute inference attacks without sacrificing the recommendation performance, we design a user attribute protection module that fuses user attribute-camouflaged information with knowledge transfer during cross-domain processes. 
We then conduct extensive experiments on three real-world datasets, and find our CVGAE is able to achieve strong privacy protection while making little sacrifices in recommendation accuracy.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"3916-3932"},"PeriodicalIF":8.9,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144219687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Charging-Aware Task Assignment for Urban Logistics With Electric Vehicles","authors":"Yafei Li;Yuke Pan;Guanglei Zhu;Shuo He;Mingliang Xu;Jianliang Xu","doi":"10.1109/TKDE.2025.3565858","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3565858","url":null,"abstract":"The rapid growth of e-commerce has intensified the demand for efficient urban logistics. Electric Vehicles (EVs), with their eco-friendly and high-efficiency features, have emerged as a promising solution for improving urban logistics efficiency. However, due to their limited battery capacity, EVs often require recharging during operations, and improper charging decisions may lead to delivery delays, resulting in a loss of platform revenue. In this paper, we explore a novel EV Charging-Aware Task Assignment (ECTA) problem in urban logistics scenarios, where the objective is to maximize platform revenue by ensuring timely task completion while meeting the charging needs of EVs. To address this challenge, we present e-Charge, an efficient two-stage framework that enables real-time optimization of two continuous processes: task assignment and charging decision. For task assignment, which focuses on matching tasks to suitable EVs, we construct a hybrid weight model that incorporates charging penalties to calculate matching weights for EVs in both active and charging states, thus improving task assignment quality. Additionally, we implement an effective vehicle selection strategy to expedite the matching process, ensuring the efficiency of task assignment. For charging decision, which focuses on determining when and where EVs should be charged, we propose a multi-agent reinforcement learning (MARL) approach to dynamically select the charging timing for EVs. To further enhance decision-making quality, we devise a hierarchical communication graph that enables better collaboration between EVs and facilitates adaptive charging decisions. 
Finally, extensive experiments demonstrate that e-Charge significantly outperforms compared methods, achieving higher revenue and task completion ratio across a wide range of parameter settings.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"3947-3961"},"PeriodicalIF":8.9,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144219592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting Edge Perturbation for Graph Neural Network in Graph Data Augmentation and Attack","authors":"Xin Liu;Yuxiang Zhang;Meng Wu;Mingyu Yan;Kun He;Wei Yan;Shirui Pan;Xiaochun Ye;Dongrui Fan","doi":"10.1109/TKDE.2025.3565306","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3565306","url":null,"abstract":"Edge perturbation is a basic method to modify graph structures. It can be categorized into two veins based on their effects on the performance of graph neural networks (GNNs), i.e., graph data augmentation and attack. Surprisingly, both veins of edge perturbation methods employ the same operations, yet yield opposite effects on GNNs’ accuracy. A distinct boundary between these methods in using edge perturbation has never been clearly defined. Consequently, inappropriate perturbations may lead to undesirable outcomes, necessitating precise adjustments to achieve desired effects. Therefore, questions of “why edge perturbation has a two-faced effect?” and “what makes edge perturbation flexible and effective?” still remain unanswered. In this paper, we will answer these questions by proposing a unified formulation and establishing a quantizable boundary between two categories of edge perturbation methods. Specifically, we conduct experiments to elucidate the differences and similarities between these methods and theoretically unify the workflow of these methods by casting it to one optimization problem. Then, we devise Edge Priority Detector (EPD) to generate a novel priority metric, bridging these methods up in the workflow. 
Experiments show that EPD can make augmentation or attack flexibly and achieve comparable or superior performance to other counterparts with less time overhead.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"4225-4238"},"PeriodicalIF":8.9,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144232036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Adaptation via Learning Using Statistical Invariant","authors":"Chunna Li;Yiwei Song;Yuan-Hai Shao","doi":"10.1109/TKDE.2025.3565780","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3565780","url":null,"abstract":"Domain adaptation has found widespread applications in real-life scenarios, especially when the target domain has limited labeled samples. However, most of the domain adaptation models only utilize one type of knowledge from the source domain, which is usually achieved by strong mode of convergence. To fully incorporate multiple knowledge from the source domain, for binary classification, this paper studies a novel learning paradigm for Domain Adaptation via Learning Using Statistical Invariant by simultaneously combining the strong and weak modes of convergence in a Hilbert space. The strong mode of convergence undertakes the mission of learning a least squares probability output binary classification task in a general hypothesis space, while the weak mode of convergence integrates diverse knowledge by constructing meaningful statistical invariants that embody the concept of intelligence. The utilization of weak convergence shrinks the admissible set of approximation functions, and subsequently accelerates the learning process. In this paper, several statistical invariants that represent sample, feature and parameter information from the source domain are constructed. By taking an appropriate statistical invariant, DLUSI realizes some existing methods. 
Experimental results on synthetic data as well as the widely used Amazon Reviews and 20 News data demonstrate the superiority of the proposed method.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"4023-4034"},"PeriodicalIF":8.9,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144219815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STDA: Spatio-Temporal Deviation Alignment Learning for Cross-City Fine-Grained Urban Flow Inference","authors":"Min Yang;Xiaoyu Li;Bin Xu;Xiushan Nie;Muming Zhao;Chengqi Zhang;Yu Zheng;Yongshun Gong","doi":"10.1109/TKDE.2025.3565504","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3565504","url":null,"abstract":"Fine-grained urban flow inference (FUFI) is crucial for traffic management, as it infers high-resolution urban flow maps from coarse-grained observations. Existing FUFI methods typically focus on a single city and rely on comprehensive training with large-scale datasets to achieve precise inferences. However, data availability in developing cities may be limited, posing challenges to the development of well-performing models. To address this issue, we propose cross-city fine-grained urban flow inference, which aims to transfer spatio-temporal knowledge from data-rich cities to data-scarce areas using meta-transfer learning. This paper devises a Spatio-Temporal Deviation Alignment (STDA) framework to mitigate spatio-temporal distribution deviations and urban structural deviations between multiple source cities and the target city. Furthermore, STDA presents a cross-city normalization method that adaptively combines batch and instance normalization to maintain consistency between city-variant and city-invariant features. Besides, we design an urban structure alignment module to align spatial topological differences across cities. STDA effectively reduces distribution and structural deviations among different datasets while avoiding negative transfer. 
Extensive experiments conducted on three real-world datasets demonstrate that STDA consistently outperforms state-of-the-art baselines.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 8","pages":"4833-4845"},"PeriodicalIF":8.9,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144573002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TELEX: Two-Level Learned Index for Rich Queries on Enclave-Based Blockchain Systems","authors":"Haotian Wu;Yuzhe Tang;Zhaoyan Shen;Jun Tao;Chenhao Lin;Zhe Peng","doi":"10.1109/TKDE.2025.3564905","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3564905","url":null,"abstract":"Blockchain has become a popular paradigm for secure and immutable data storage. Despite its numerous applications across various fields, concerns regarding the user privacy and result integrity during data queries persist. Additionally, the need for rich query functionalities to harness the full potential of blockchain data remains an area ripe for exploration. In order to address these challenges, our paper first utilizes a framework based on the Trusted Execution Environment (TEE) and oblivious RAM technique to achieve both privacy and data integrity. To enhance the query efficiency over the entire blockchain, we then devise a two-level learned indexing methodology named TELEX within the TEE for both integer and string keys. We also propose different query processing algorithms for versatile query types, including exact queries, aggregate queries, Boolean queries, and range queries. By implementing the prototype and conducting extensive evaluation, we demonstrate the feasibility and remarkable improvement in efficiency compared to existing solutions.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"4299-4313"},"PeriodicalIF":8.9,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144232028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Turn Waste Into Wealth: On Efficient Clustering and Cleaning Over Dirty Data","authors":"Kenny Ye Liang;Yunxiang Su;Shaoxu Song;Chunping Li","doi":"10.1109/TKDE.2025.3564313","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3564313","url":null,"abstract":"Dirty data commonly exist. Simply discarding a large number of inaccurate points (as noises) could greatly affect clustering results. We argue that dirty data can be repaired and utilized as strong supports in clustering. To this end, we study a novel problem of clustering and repairing over dirty data at the same time. Referring to the minimum change principle in data repairing, the objective is to find a minimum modification of inaccurate points such that the large amount of dirty data can enhance clustering. We show that the problem is NP-hard and can be formulated as an integer linear programming (ILP) problem. A constant factor approximation algorithm GDORC is devised based on grid, with high efficiency. In experiments, GDORC has great repairing and clustering results with low time consumption. Empirical results demonstrate that both the clustering and cleaning accuracies can be improved by our approach of repairing and utilizing the dirty data in clustering.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"4361-4372"},"PeriodicalIF":8.9,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144232133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}