{"title":"Securing Multi-Source Domain Adaptation With Global and Domain-Wise Privacy Demands","authors":"Shuwen Chai;Yutang Xiao;Feng Liu;Jian Zhu;Yuan Zhou","doi":"10.1109/TKDE.2024.3459890","DOIUrl":"10.1109/TKDE.2024.3459890","url":null,"abstract":"Making large amounts of training data available for deep learning models and preserving data privacy are two ever-growing concerns in the machine learning community. Multi-source domain adaptation (MDA) leverages data from different source domains and aggregates it to improve performance on the target task, while the privacy leakage risk of publishing such models, under malicious attackers performing membership or attribute inference, is even more complicated than that faced by single-source domain adaptation. In this paper, we tackle the problem of effectively protecting data privacy while training and aggregating multi-source information, where each source domain enjoys an independent privacy budget. Specifically, we develop a differentially private MDA (DPMDA) algorithm that provides domain-wise privacy protection with an adaptive weighting scheme based on task similarity and task-specific privacy budgets. 
We evaluate our algorithm on three benchmark tasks and show that DPMDA can effectively leverage the different privacy budgets of the source domains and consistently outperforms existing private baselines, with a reasonable gap to the non-private state of the art.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9235-9248"},"PeriodicalIF":8.9,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LogoRA: Local-Global Representation Alignment for Robust Time Series Classification","authors":"Huanyu Zhang;Yi-Fan Zhang;Zhang Zhang;Qingsong Wen;Liang Wang","doi":"10.1109/TKDE.2024.3459908","DOIUrl":"10.1109/TKDE.2024.3459908","url":null,"abstract":"Unsupervised domain adaptation (UDA) of time series aims to teach models to identify consistent patterns across various temporal scenarios while disregarding domain-specific differences, so that they maintain their predictive accuracy and adapt effectively to new domains. However, existing UDA methods struggle to adequately extract and align both global and local features in time series data. To address this issue, we propose the Local-Global Representation Alignment framework (LogoRA), which employs a two-branch encoder comprising a multi-scale convolutional branch and a patching transformer branch. The encoder enables the extraction of both local and global representations from time series. A fusion module is then introduced to integrate these representations, enhancing domain-invariant feature alignment from multi-scale perspectives. To achieve effective alignment, LogoRA employs strategies such as invariant feature learning on the source domain, using a triplet loss for fine alignment and dynamic time warping-based feature alignment. Additionally, it reduces source-target domain gaps through adversarial training and per-class prototype alignment. 
Our evaluations on four time-series datasets demonstrate that LogoRA outperforms strong baselines by up to 12.52%, showcasing its superiority in time series UDA tasks.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8718-8729"},"PeriodicalIF":8.9,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diff-RNTraj: A Structure-Aware Diffusion Model for Road Network-Constrained Trajectory Generation","authors":"Tonglong Wei;Youfang Lin;Shengnan Guo;Yan Lin;Yiheng Huang;Chenyang Xiang;Yuqing Bai;Huaiyu Wan","doi":"10.1109/TKDE.2024.3460051","DOIUrl":"10.1109/TKDE.2024.3460051","url":null,"abstract":"Trajectory data is essential for various applications. However, publicly available trajectory datasets remain limited in scale due to privacy concerns, which hinders the development of trajectory mining and applications. Although some trajectory generation methods have been proposed to expand dataset scale, they generate trajectories in the geographical coordinate system, which poses two limitations for practical applications: 1) they fail to ensure that the generated trajectories are road-constrained, and 2) they lack road-related information. In this paper, we propose a new problem, road network-constrained trajectory (RNTraj) generation, which directly generates trajectories on the road network with road-related information. Specifically, RNTraj is a hybrid type of data, in which each point is represented by a discrete road segment and a continuous moving rate. To generate RNTraj, we design a diffusion model called Diff-RNTraj, which effectively handles the hybrid RNTraj within a continuous diffusion framework by incorporating a pre-training strategy that embeds hybrid RNTraj into continuous representations. During the sampling stage, an RNTraj decoder is designed to map the continuous representation generated by the diffusion model back to the hybrid RNTraj format. Furthermore, Diff-RNTraj introduces a novel loss function to enhance the spatial validity of the generated trajectories. 
Extensive experiments conducted on two datasets demonstrate the effectiveness of Diff-RNTraj.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7940-7953"},"PeriodicalIF":8.9,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining User Consistent and Robust Preference for Unified Cross Domain Recommendation","authors":"Xiaolin Zheng;Weiming Liu;Chaochao Chen;Jiajie Su;Xinting Liao;Mengling Hu;Yanchao Tan","doi":"10.1109/TKDE.2024.3446581","DOIUrl":"10.1109/TKDE.2024.3446581","url":null,"abstract":"Cross-Domain Recommendation has been widely studied to resolve the data sparsity problem by leveraging knowledge transfer across different domains. In this paper, we focus on the Unified Cross-Domain Recommendation (Unified CDR) problem, that is, how to enhance recommendation performance both within and across domains when users partially overlap. It poses two main challenges: 1) how to obtain a robust matching solution among all users, and 2) how to obtain consistent and accurate results across domains. To address these two challenges, we propose MUCRP, a cross-domain recommendation framework for the Unified CDR problem. MUCRP contains three modules: a variational rating reconstruction module, a robust variational embedding alignment module, and a cycle-consistent preference extraction module. To solve the first challenge, we propose fused Gromov-Wasserstein distribution co-clustering optimal transport to obtain a more robust matching solution by considering both semantic and structural information. To tackle the second challenge, we propose embedding-consistent and prediction-consistent losses via a dual autoencoder framework to achieve consistent results. 
Our empirical study on Douban and Amazon datasets demonstrates that MUCRP significantly outperforms the state-of-the-art models.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8758-8772"},"PeriodicalIF":8.9,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Triangle-Dense Subgraphs of a Fixed Size: Hardness, Lovász Extension and Applications","authors":"Aritra Konar;Nicholas D. Sidiropoulos","doi":"10.1109/tkde.2024.3444608","DOIUrl":"https://doi.org/10.1109/tkde.2024.3444608","url":null,"abstract":"","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":""},"PeriodicalIF":8.9,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RTOD: Efficient Outlier Detection With Ray Tracing Cores","authors":"Ziming Wang;Kai Zhang;Yangming Lv;Yinglong Wang;Zhigang Zhao;Zhenying He;Yinan Jing;X. Sean Wang","doi":"10.1109/TKDE.2024.3453901","DOIUrl":"10.1109/TKDE.2024.3453901","url":null,"abstract":"Outlier detection in data streams is a critical component in numerous applications, such as network intrusion detection, financial fraud detection, and public health. To detect abnormal behaviors in real time, these applications generally impose stringent performance requirements on outlier detection. This paper proposes RTOD, a high-performance outlier detection approach that uses the ray tracing (RT) cores in modern GPUs for acceleration. RTOD transforms distance-based outlier detection in data streams into an efficient ray tracing job. By creating spheres centered at points within a window and casting rays from each point, RTOD identifies outliers according to the number of intersections between rays and spheres. In addition, we propose two optimization techniques, namely Grid Filtering and Ray-BVH Inversion, to further improve the detection efficiency of RT cores. Experimental results show that RTOD achieves up to 9.9× speedups over existing state-of-the-art outlier detection algorithms.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9192-9204"},"PeriodicalIF":8.9,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Regional Fraud Detection via Continual Learning With Knowledge Transfer","authors":"Yujie Li;Xin Yang;Qiang Gao;Hao Wang;Junbo Zhang;Tianrui Li","doi":"10.1109/TKDE.2024.3451161","DOIUrl":"10.1109/TKDE.2024.3451161","url":null,"abstract":"Fraud detection is a fundamental yet challenging problem for mitigating the various risks associated with fraudulent activities. However, existing methods are limited by their reliance on static data within single geographical regions, which restricts the trained model’s adaptability across different regions. In practice, when enterprises expand their business into new cities or countries, training a new model from scratch can incur high computational costs and lead to catastrophic forgetting (CF). To address these limitations, we formulate cross-regional fraud detection as an incremental learning problem, enabling the development of a unified model capable of adapting across diverse regions without suffering from CF. Subsequently, we introduce Cross-Regional Continual Learning (CCL), a novel paradigm that facilitates knowledge transfer and maintains performance when incrementally training models from previously learned regions to new ones. Specifically, CCL utilizes prototype-based knowledge replay for effective knowledge transfer while implementing a parameter smoothing mechanism to alleviate forgetting. Furthermore, we construct heterogeneous trade graphs (HTGs) and leverage graph-based backbones to enhance knowledge representation and facilitate knowledge transfer by uncovering the intricate semantics inherent in cross-regional datasets. 
Extensive experiments demonstrate the superiority of our proposed method over baseline approaches and its substantial improvement in cross-regional fraud detection performance.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7865-7877"},"PeriodicalIF":8.9,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PLBR: A Semi-Supervised Document Key Information Extraction via Pseudo-Labeling Bias Rectification","authors":"Pengcheng Guo;Yonghong Song;Boyu Wang;Jiaohao Liu;Qi Zhang","doi":"10.1109/TKDE.2024.3443928","DOIUrl":"10.1109/TKDE.2024.3443928","url":null,"abstract":"Document key information extraction (DKIE) methods often require a large number of labeled samples, imposing substantial annotation costs in practical scenarios. Fortunately, pseudo-labeling-based semi-supervised learning (PSSL) algorithms provide an effective paradigm to alleviate the reliance on labeled data by leveraging unlabeled data. However, PSSL faces two main challenges in DKIE tasks: 1) the context dependency of DKIE results in incorrect pseudo-labels, and 2) DKIE exhibits high intra-class variance and low inter-class variation. To this end, this paper proposes Pseudo-Label Bias Rectification (PLBR), a similarity-matrix-based semi-supervised method for DKIE tasks, which improves the quality of pseudo-labels on DKIE benchmarks with scarce labels. More specifically, the Similarity Matrix Bias Rectification (SMBR) module is proposed to improve the quality of pseudo-labels; it exploits the contextual information of DKIE data by analyzing the similarity between labeled and unlabeled data. Moreover, a dual branch adaptive alignment (DBAA) mechanism, composed of two adaptive alignment branches, is designed to adaptively align intra-class variance and alleviate inter-class variation on DKIE benchmarks. One is the intra-class alignment branch, which adaptively aligns intra-class variance. The other is the inter-class alignment branch, which adaptively alleviates inter-class variance changes at the representation level. 
Extensive experimental results on two benchmarks demonstrate that PLBR achieves state-of-the-art performance, surpassing the previous SOTA by 2.11% to 2.53% and 2.09% to 2.49% in F1-score on FUNSD and CORD, respectively, with scarce labeled samples. Code will be made publicly available.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9025-9036"},"PeriodicalIF":8.9,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TripleSurv: Triplet Time-Adaptive Coordinate Learning Approach for Survival Analysis","authors":"Liwen Zhang;Lianzhen Zhong;Fan Yang;Linglong Tang;Di Dong;Hui Hui;Jie Tian","doi":"10.1109/TKDE.2024.3450910","DOIUrl":"10.1109/TKDE.2024.3450910","url":null,"abstract":"A core challenge in survival analysis is to model the distribution of time-to-event data, where the event of interest may be a death, failure, or occurrence of a specific event. Previous studies have shown that ranking and maximum likelihood estimation loss functions are widely used learning approaches for survival analysis. However, the ranking loss focuses only on the ranking of survival times and does not consider the potential effect of samples’ exact survival time values. Furthermore, the maximum likelihood estimation is unbounded and easily affected by outliers (e.g., censored data), which may cause poor modeling performance. To handle the complexities of the learning process and exploit valuable survival time values, we propose a time-adaptive coordinate loss function, TripleSurv, which achieves adaptive adjustments by introducing the differences in survival time between sample pairs into the ranking; this encourages the model to quantitatively rank the relative risk of pairs, ultimately enhancing prediction accuracy. Most importantly, TripleSurv is proficient in quantifying the relative risk between samples by ranking pairs, and it considers the time interval as a trade-off to calibrate the robustness of the model over the sample distribution. TripleSurv is evaluated on three real-world survival datasets and a public synthetic dataset. 
The results show that our method outperforms the state-of-the-art methods and exhibits good performance and robustness in modeling various sophisticated data distributions with different censoring rates.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9464-9475"},"PeriodicalIF":8.9,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Triple Factorization-Based SNLF Representation With Improved Momentum-Incorporated AGD: A Knowledge Transfer Approach","authors":"Ming Li;Yan Song;Derui Ding;Ran Sun","doi":"10.1109/TKDE.2024.3450469","DOIUrl":"10.1109/TKDE.2024.3450469","url":null,"abstract":"Symmetric, high-dimensional and sparse (SHiDS) networks usually contain rich knowledge regarding various patterns. To adequately extract useful information from SHiDS networks, a novel biased triple factorization-based (TF) symmetric and non-negative latent factor (SNLF) model is put forward by utilizing the transfer learning (TL) method, namely the biased TL-incorporated TF-SNLF (BT$^{2}$-SNLF) model. The proposed BT$^{2}$-SNLF model mainly includes the following four ideas: 1) the implicit knowledge of the auxiliary matrix in the ternary rating domain is transferred to the target matrix in the numerical rating domain, facilitating feature extraction; 2) two linear bias vectors are incorporated into the objective function to discover the knowledge describing the individual entity-oriented effect; 3) an improved momentum-incorporated additive gradient descent algorithm is developed to speed up model convergence and guarantee the non-negativity of target SHiDS networks; and 4) a rigorous proof is provided to show that, under the assumption that the objective function is $L$-smooth and $\\mu$-convex, when $t\\geq t_{0}$, the algorithm begins to descend and can find an $\\epsilon$-solution within $O(\\ln((1+\\frac{\\mu L}{L(1+\\mu)+8\\mu})/\\epsilon))$. 
Experimental results on six datasets from real applications demonstrate the effectiveness of our proposed T$^{2}$-SNLF and BT$^{2}$-SNLF models.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9448-9463"},"PeriodicalIF":8.9,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}