IEEE Transactions on Knowledge and Data Engineering: Latest Articles, Page 6

Domain Adaptation via Learning Using Statistical Invariant

IF 8.9, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-29 DOI: 10.1109/TKDE.2025.3565780

Chunna Li;Yiwei Song;Yuan-Hai Shao

Abstract: Domain adaptation has found widespread applications in real-life scenarios, especially when the target domain has limited labeled samples. However, most domain adaptation models utilize only one type of knowledge from the source domain, which is usually achieved by the strong mode of convergence. To fully incorporate multiple types of knowledge from the source domain, for binary classification, this paper studies a novel learning paradigm, Domain Adaptation via Learning Using Statistical Invariant (DLUSI), which simultaneously combines the strong and weak modes of convergence in a Hilbert space. The strong mode of convergence learns a least squares probability output binary classification task in a general hypothesis space, while the weak mode of convergence integrates diverse knowledge by constructing meaningful statistical invariants that embody the concept of intelligence. The use of weak convergence shrinks the admissible set of approximation functions and subsequently accelerates the learning process. Several statistical invariants that represent sample, feature and parameter information from the source domain are constructed. By taking an appropriate statistical invariant, DLUSI realizes some existing methods. Experimental results on synthetic data as well as the widely used Amazon Reviews and 20 News data demonstrate the superiority of the proposed method.

Volume 37, Issue 7, pp. 4023-4034

Citations: 0

STDA: Spatio-Temporal Deviation Alignment Learning for Cross-City Fine-Grained Urban Flow Inference

IF 8.9, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-29 DOI: 10.1109/TKDE.2025.3565504

Min Yang;Xiaoyu Li;Bin Xu;Xiushan Nie;Muming Zhao;Chengqi Zhang;Yu Zheng;Yongshun Gong

Abstract: Fine-grained urban flow inference (FUFI) is crucial for traffic management, as it infers high-resolution urban flow maps from coarse-grained observations. Existing FUFI methods typically focus on a single city and rely on comprehensive training with large-scale datasets to achieve precise inferences. However, data availability in developing cities may be limited, posing challenges to the development of well-performing models. To address this issue, we propose cross-city fine-grained urban flow inference, which aims to transfer spatio-temporal knowledge from data-rich cities to data-scarce areas using meta-transfer learning. This paper devises a Spatio-Temporal Deviation Alignment (STDA) framework to mitigate spatio-temporal distribution deviations and urban structural deviations between multiple source cities and the target city. Furthermore, STDA presents a cross-city normalization method that adaptively combines batch and instance normalization to maintain consistency between city-variant and city-invariant features. In addition, we design an urban structure alignment module to align spatial topological differences across cities. STDA effectively reduces distribution and structural deviations among different datasets while avoiding negative transfer. Extensive experiments conducted on three real-world datasets demonstrate that STDA consistently outperforms state-of-the-art baselines.

Volume 37, Issue 8, pp. 4833-4845

Citations: 0
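The STDA abstract mentions a normalization that adaptively combines batch and instance normalization but does not give its form. As a rough illustration only, the sketch below uses one well-known way to blend the two: a gate `rho` that interpolates between batch-normalized and instance-normalized values. The function name `blended_norm`, the gate `rho`, and the single-channel list layout are all assumptions made for this example, not details taken from the paper.

```python
import math


def _mean_var(values):
    """Return the mean and population variance of a list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def _normalize(values, mean, var, eps=1e-5):
    """Standardize values with the given statistics."""
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def blended_norm(batch, rho):
    """Blend batch and instance normalization for a single-channel batch.

    batch: list of samples, each a list of floats (e.g. a flattened flow map).
    rho:   hypothetical gate in [0, 1]; 1.0 is pure batch normalization,
           0.0 is pure instance normalization. A real model would learn it.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    # Batch statistics are pooled over every value of every sample.
    pooled = [v for sample in batch for v in sample]
    b_mean, b_var = _mean_var(pooled)
    out = []
    for sample in batch:
        # Instance statistics are computed separately for each sample.
        i_mean, i_var = _mean_var(sample)
        bn = _normalize(sample, b_mean, b_var)
        inn = _normalize(sample, i_mean, i_var)
        out.append([rho * b + (1.0 - rho) * i for b, i in zip(bn, inn)])
    return out
```

With `rho` close to 0 each sample is centered on its own statistics, which removes per-sample (here, per-city) scale; with `rho` close to 1 the shared batch statistics are kept, which preserves differences between samples.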

TELEX: Two-Level Learned Index for Rich Queries on Enclave-Based Blockchain Systems

IF 8.9, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-28 DOI: 10.1109/TKDE.2025.3564905

Haotian Wu;Yuzhe Tang;Zhaoyan Shen;Jun Tao;Chenhao Lin;Zhe Peng

Citations: 0

Turn Waste Into Wealth: On Efficient Clustering and Cleaning Over Dirty Data

IF 8.9, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-28 DOI: 10.1109/TKDE.2025.3564313

Kenny Ye Liang;Yunxiang Su;Shaoxu Song;Chunping Li

Citations: 0

A Scalable Algorithm for Fair Influence Maximization With Unbiased Estimator

IF 8.9, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-24 DOI: 10.1109/TKDE.2025.3564283

Xiaobin Rui;Zhixiao Wang;Hao Peng;Wei Chen;Philip S. Yu

Abstract: This paper studies the fair influence maximization problem with efficient algorithms. In particular, given a graph $G$, a community structure $\mathcal{C}$ consisting of disjoint communities, and a budget $k$, the problem asks to select a seed set $S$ ($|S|=k$) that maximizes the influence spread while narrowing the influence gap between different communities. This problem derives from some significant social scenarios, such as health interventions (e.g. suicide/HIV prevention) where individuals from underrepresented groups or LGBTQ communities may be disproportionately excluded from the benefits of the intervention. To depict the concept of fairness in the context of influence maximization, researchers have proposed various notions of fairness, among which the welfare fairness notion, which better balances fairness level and influence spread, has shown promising effectiveness. However, the lack of efficient algorithms for optimizing the objective function under welfare fairness restricts its application to networks of only a few hundred nodes. In this paper, we modify the objective function of welfare fairness to maximize the exponentially weighted sum and the logarithmically weighted sum over all communities’ influenced fractions (utility). To achieve efficient algorithms with theoretical guarantees, we first introduce two unbiased estimators: one for the fractional power of the arithmetic mean and the other for the logarithm of the arithmetic mean. Then, by adapting the Reverse Influence Sampling (RIS) approach, we convert the optimization problem to a weighted maximum coverage problem. We also analyze the number of reverse reachable sets needed to approximate the fair influence with high probability. Finally, we present an efficient algorithm that guarantees a $1 - 1/e - \varepsilon$ (positive objective function) or $1 + 1/e + \varepsilon$ (negative objective function) approximation for any small $\varepsilon > 0$. Experiments demonstrate that our proposed algorithm can efficiently handle large-scale networks with good performance.

Volume 37, Issue 7, pp. 3881-3895

Citations: 0
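The abstract above reduces fair influence maximization to a weighted maximum coverage problem over reverse reachable (RR) sets. The sketch below shows only the textbook greedy routine for weighted maximum coverage, which attains a $1 - 1/e$ approximation when weights are non-negative. How the paper derives the weights from its unbiased estimators is not stated in the abstract, so the `weights` argument, the function name, and the data layout here are hypothetical.

```python
def greedy_weighted_coverage(candidates, weights, k):
    """Pick up to k candidates that greedily maximize covered weight.

    candidates: dict mapping a node to the set of RR-set ids it covers.
    weights:    dict mapping each RR-set id to a non-negative weight
                (hypothetical; stands in for whatever the estimator assigns).
    k:          seed budget.
    Returns (chosen nodes in pick order, total covered weight).
    """
    covered = set()
    chosen = []
    for _ in range(min(k, len(candidates))):
        best, best_gain = None, 0.0
        for node, rr_ids in candidates.items():
            if node in chosen:
                continue
            # Marginal gain counts only RR sets not covered so far.
            gain = sum(weights[r] for r in rr_ids - covered)
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:
            # No remaining candidate adds positive weight.
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, sum(weights[r] for r in covered)
```

The marginal-gain step is what makes the greedy guarantee work: coverage with non-negative weights is monotone and submodular, so each pick can only add weight that earlier picks left uncovered.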

Data Synthesis Reinvented: Preserving Missing Patterns for Enhanced Analysis

IF 8.9, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-22 DOI: 10.1109/TKDE.2025.3563319

Xinyue Wang;Hafiz Asif;Shashank Gupta;Jaideep Vaidya

Abstract: Synthetic data is being widely used as a replacement or enhancement for real data in fields as diverse as healthcare, telecommunications, and finance. Unlike real data, which represents actual people and objects, synthetic data is generated from an estimated distribution that retains key statistical properties of the real data. This makes synthetic data attractive for sharing while addressing privacy, confidentiality, and autonomy concerns. Real data often contains missing values that hold important information about individual, system, or organizational behavior. Standard synthetic data generation methods eliminate missing values as part of their pre-processing steps and thus completely ignore this valuable source of information. Instead, we propose methods to generate synthetic data that preserve both the observable and missing data distributions, thereby retaining the valuable information encoded in the missing patterns of the real data. Our approach handles various missing data scenarios and can easily integrate with existing data generation methods. Extensive empirical evaluations on diverse datasets demonstrate the effectiveness of our approach as well as the value of preserving missing data distribution in synthetic data.

Volume 37, Issue 7, pp. 3962-3975

Citations: 0
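To make the idea of "preserving missing patterns" in the abstract above concrete, here is a deliberately simple sketch: it records which cells are missing in each real row, then re-applies those row-level patterns to fully observed synthetic rows in proportion to their real frequency. This is an illustration under strong assumptions and not the authors' method. In particular, it draws patterns independently of the synthetic values, so it covers only the simplest missingness scenario; the function names and the use of `None` as the missing marker are choices made for this example.

```python
import random
from collections import Counter


def missing_pattern(row):
    """Return a tuple of booleans marking which cells of a row are missing."""
    return tuple(value is None for value in row)


def apply_missing_patterns(real_rows, synthetic_rows, seed=0):
    """Mask synthetic rows with missing patterns sampled from the real data.

    real_rows:      rows from the real dataset, with None for missing cells.
    synthetic_rows: fully observed rows from any generator, same width.
    seed:           seed for the local random generator (reproducibility).
    """
    rng = random.Random(seed)
    # Empirical distribution over row-level missing patterns.
    counts = Counter(missing_pattern(row) for row in real_rows)
    patterns = list(counts)
    freqs = [counts[p] for p in patterns]
    masked = []
    for row in synthetic_rows:
        # Sample one pattern in proportion to how often it occurs in real data.
        mask = rng.choices(patterns, weights=freqs, k=1)[0]
        masked.append([None if miss else v for v, miss in zip(row, mask)])
    return masked
```

Sampling whole-row patterns rather than per-column missing rates keeps co-occurrence information, for example two fields that tend to be missing together.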

A Local Community Detection Method Based on Folded Subgraph

IF 8.9, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-21 DOI: 10.1109/TKDE.2025.3563100

Mengting Zhang;Weihong Bi

Abstract: Community structure refers to the “small groups” in a network. Detecting community structure in networks has significant application value. As networks continue to grow in size and complexity, global information about the network is often difficult to obtain. Moreover, in some cases we care more about the local community in which a given node is located. Local community detection methods detect local community structure by using local information from a given node. However, many local community detection methods suffer from limited precision. To alleviate this problem, we propose the FG-based method in this paper. Based on the characteristics of complex networks, a folded subgraph method is designed that treats groups of similar nodes as single nodes, reducing the impact of noise in the network. Furthermore, based on the folded subgraph, the FG-based method uses a three-stage local expansion strategy, in which nodes with different characteristics are added to the local community in each stage. We conduct experiments on datasets and find that the FG-based method can improve the recall and precision of local community structures.

Volume 37, Issue 7, pp. 3869-3880

Citations: 0

Self-Labeling and Self-Knowledge Distillation Unsupervised Feature Selection

IF 8.9, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-21 DOI: 10.1109/TKDE.2025.3561046

Yunzhi Ling;Feiping Nie;Weizhong Yu;Xuelong Li

Abstract: This paper proposes a deep pseudo-label method for unsupervised feature selection, which learns non-linear representations to generate pseudo-labels and trains a Neural Network (NN) to select informative features via self-Knowledge Distillation (KD). Specifically, the proposed method divides a standard NN into two sub-components, an encoder and a predictor, and introduces a dependency subnet. It works by pre-training the encoder in a self-supervised manner to produce informative representations and then alternating between two steps: (1) learning pseudo-labels by combining the clustering results of the encoder's outputs with the NN's prediction outputs, and (2) updating the NN's parameters by globally selecting a subset of features to predict the pseudo-labels while updating the subnet's parameters through self-KD. Self-KD is achieved by encouraging the subnet to locally capture a subset of the NN features to produce class probabilities that match those produced by the NN. This allows the model to self-absorb the learned inter-class knowledge and evaluate feature diversity, removing redundant features without sacrificing performance. Meanwhile, the potential discriminative capability of an NN can also be self-excavated without the assistance of other NNs. The two alternating steps reinforce each other: in step (2), by predicting the learned pseudo-labels and conducting self-KD, the discrimination of the outputs of both the NN and the encoder is gradually enhanced, while the self-labeling method in step (1) leverages these two improvements to further refine the pseudo-labels for step (2), resulting in superior performance. Extensive experiments show the proposed method significantly outperforms state-of-the-art methods across various datasets.

Volume 37, Issue 7, pp. 4270-4284

Citations: 0

Pseudo-Label Guided Bidirectional Discriminative Deep Multi-View Subspace Clustering

IF 8.9, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-21 DOI: 10.1109/TKDE.2025.3562723

Yongbo Yu;Zhoumin Lu;Feiping Nie;Weizhong Yu;Zongcheng Miao;Xuelong Li

Abstract: In practical applications, multi-view subspace clustering is hindered by data noise that disrupts the ideal block-diagonal structure of self-representation matrices, thereby degrading performance. Moreover, many existing methods rely solely on sample features, overlooking the valuable structural information in affinity matrices (e.g., pairwise relationships). Meanwhile, conventional contrastive learning strategies often introduce false negative pairs due to noise and unreliable sample selection. To address these challenges, we propose a pseudo-label guided bidirectional discriminative deep multi-view subspace clustering method (PBDMSC). Our approach first employs pseudo-label guided contrastive learning, using previous cluster assignments to select reliable positive and negative samples, which mitigates incorrect pairings and enhances low-dimensional representations. Then, a discriminative self-representation learning method is introduced that leverages pseudo-labels to enforce homogeneous expression constraints and incorporates a bidirectional attention mechanism to preserve the structured information from affinity matrices, thereby enhancing robustness. Experimental results on six real-world datasets demonstrate that our proposed method achieves state-of-the-art clustering performance.

Volume 37, Issue 7, pp. 4213-4224

Citations: 0

NeuralLoss: A Learnable Pretrained Surrogate Loss for Learning to Rank

IF 8.9, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-18 DOI: 10.1109/TKDE.2025.3562450

Chen Liu;Cailan Jiang;Lixin Zhou

Abstract: Learning to Rank (LTR) aims to develop a ranking model from supervised data to rank a set of items using machine learning techniques. However, since the losses and ranking metrics involved in LTR are both based on ranking, they are neither continuous nor differentiable, making it challenging to optimize them using gradient descent algorithms. Various surrogate losses have been proposed to address this issue, yet their connection with ranking metrics is often loose, leading to inconsistencies between optimization objectives and evaluation metrics. In this study, we introduce NeuralLoss, a learnable and pretrained surrogate loss. By undergoing training on data structured around ranking metrics, NeuralLoss approximates these ranking metrics, aligning its optimization objectives with evaluation metrics. We employ a Transformer to construct the surrogate model and ensure permutation invariance. The pretrained surrogate loss facilitates end-to-end training of ranking models using gradient descent algorithms and can approximate various ranking metrics by adjusting the training data. In this paper, we employ NeuralLoss to approximate NDCG and Recall, demonstrating its performance in both document retrieval and cross-modal retrieval tasks. Experimental results indicate that our approach achieves excellent performance and is highly competitive across these tasks.

Volume 37, Issue 7, pp. 4179-4192

Citations: 0
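The NeuralLoss abstract rests on the observation that rank-based metrics such as NDCG are not differentiable in the model scores. A short sketch of NDCG@k makes the reason visible: the scores enter only through a sort, so the metric is piecewise constant in them. The code uses the common exponential-gain form of DCG, $\sum_i (2^{rel_i} - 1) / \log_2(i + 1)$ with 1-based positions; the abstract does not say which variant the paper uses, so that choice and the function names are assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance labels."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)  # pos is 0-based, hence +2
        for pos, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(scores, labels, k):
    """NDCG@k for one query, given model scores and graded relevance labels."""
    # Scores matter only through the ordering they induce.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_labels = [labels[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_labels, k) / ideal_dcg
```

Nudging a score without changing the ordering leaves `ndcg_at_k` unchanged, so its gradient with respect to the scores is zero almost everywhere and undefined where two items swap places. That is the gap a smooth surrogate loss is meant to fill.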