{"title":"UVTM: Universal Vehicle Trajectory Modeling With ST Feature Domain Generation","authors":"Yan Lin;Jilin Hu;Shengnan Guo;Bin Yang;Christian S. Jensen;Youfang Lin;Huaiyu Wan","doi":"10.1109/TKDE.2025.3570428","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3570428","url":null,"abstract":"Vehicle movement is frequently captured in the form of GPS trajectories, i.e., sequences of timestamped GPS locations. Such data is widely used for various tasks such as travel-time estimation, trajectory recovery, and trajectory prediction. A universal vehicle trajectory model could be applied to different tasks, removing the need to maintain multiple specialized models, thereby reducing computational and storage costs. However, creating such a model is challenging when the integrity of trajectory features is compromised, i.e., in scenarios where only partial features are available or the trajectories are sparse. To address these challenges, we propose the Universal Vehicle Trajectory Model (UVTM), which can effectively adapt to different tasks without excessive retraining. UVTM incorporates two specialized designs. First, it divides trajectory features into three distinct domains. Each domain can be masked and generated independently to accommodate tasks with only partially available features. Second, UVTM is pre-trained by reconstructing dense, feature-complete trajectories from sparse, feature-incomplete counterparts, enabling strong performance even when the integrity of trajectory features is compromised. 
Experiments involving four representative trajectory-related tasks on three real-world vehicle trajectory datasets provide insight into the performance of UVTM and offer evidence that it is capable of meeting its objectives.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 8","pages":"4894-4907"},"PeriodicalIF":8.9,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144573001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ST-LLM+: Graph Enhanced Spatio-Temporal Large Language Models for Traffic Prediction","authors":"Chenxi Liu;Kethmi Hirushini Hettige;Qianxiong Xu;Cheng Long;Shili Xiang;Gao Cong;Ziyue Li;Rui Zhao","doi":"10.1109/TKDE.2025.3570705","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3570705","url":null,"abstract":"Traffic prediction is a crucial component of data management systems, leveraging historical data to learn spatio-temporal dynamics for forecasting future traffic and enabling efficient decision-making and resource allocation. Despite efforts to develop increasingly complex architectures, existing traffic prediction models often struggle to generalize across diverse datasets and contexts, limiting their adaptability in real-world applications. In contrast to existing traffic prediction models, large language models (LLMs) progress mainly through parameter expansion and extensive pre-training while maintaining their fundamental structures. In this paper, we propose ST-LLM+, the graph enhanced spatio-temporal large language models for traffic prediction. Through incorporating a proximity-based adjacency matrix derived from the traffic network into the calibrated LLMs, ST-LLM+ captures complex spatio-temporal dependencies within the traffic network. The Partially Frozen Graph Attention (PFGA) module is designed to retain global dependencies learned during LLMs pre-training while modeling localized dependencies specific to the traffic domain. To reduce computational overhead, ST-LLM+ adopts the LoRA-augmented training strategy, allowing attention layers to be fine-tuned with fewer learnable parameters. Comprehensive experiments on real-world traffic datasets demonstrate that ST-LLM+ outperforms state-of-the-art models. In particular, ST-LLM+ also exhibits robust performance in both few-shot and zero-shot prediction scenarios. 
Additionally, our case study demonstrates that ST-LLM+ captures global and localized dependencies between stations, verifying its effectiveness for traffic prediction tasks.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 8","pages":"4846-4859"},"PeriodicalIF":8.9,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144573006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incomplete Multi-View Clustering via Multi-Level Contrastive Learning","authors":"Jun Yin;Pei Wang;Shiliang Sun;Zhonglong Zheng","doi":"10.1109/TKDE.2025.3568795","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3568795","url":null,"abstract":"Although significant progress has been made in multi-view learning over the past few decades, it remains challenging, especially in the context of incomplete multi-view clustering, where modeling complex correlations among different views and handling missing data are key difficulties. In this paper, we propose a novel incomplete multi-view clustering network to address the aforementioned issue, named Incomplete Multi-view Clustering via Multi-level Contrastive Learning (IMC-MCL). Specifically, the proposed model aims to minimize the conditional entropy between views to recover missing data by dual prediction strategy. Moreover, the approach learns multi-level features, including latent, high-level and semantic features, with the goal of satisfying both reconstruction and consistency objectives in distinct feature spaces. Specifically, latent features are utilized to accomplish the reconstruction objective, while high-level features and semantic labels are employed to achieve the two consistency goals through contrastive learning. This framework enables the exploration of shared semantics within high-level features and achieves clustering assignment using semantic features. 
Extensive experiments have shown that the proposed approach outperforms other state-of-the-art incomplete multi-view clustering methods on seven challenging datasets.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 8","pages":"4716-4727"},"PeriodicalIF":8.9,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144572988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Final: Combining First-Order Logic With Natural Logic for Question Answering","authors":"Jihao Shi;Xiao Ding;Siu Cheung Hui;Yuxiong Yan;Hengwei Zhao;Ting Liu;Bing Qin","doi":"10.1109/TKDE.2025.3551231","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3551231","url":null,"abstract":"Many question-answering problems can be approached as textual entailment tasks, where the hypotheses are formed by the question and candidate answers, and the premises are derived from an external knowledge base. However, current neural methods often lack transparency in their decision-making processes. Moreover, first-order logic methods, while systematic, struggle to integrate unstructured external knowledge. To address these limitations, we propose a neuro-symbolic reasoning framework called Final, which combines FIrst-order logic with NAtural Logic for question answering. Our framework utilizes first-order logic to systematically decompose hypotheses and natural logic to construct reasoning paths from premises to hypotheses, employing bidirectional reasoning to establish links along the reasoning path. This approach not only enhances interpretability but also effectively integrates unstructured knowledge. Our experiments on three benchmark datasets, namely QASC, WorldTree, and WikiHop, demonstrate that Final outperforms existing methods in commonsense reasoning and reading comprehension tasks, achieving state-of-the-art results. 
Additionally, our framework also provides transparent reasoning paths that elucidate the rationale behind the correct decisions.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3103-3117"},"PeriodicalIF":8.9,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey on Point-of-Interest Recommendation: Models, Architectures, and Security","authors":"Qianru Zhang;Peng Yang;Junliang Yu;Haixin Wang;Xingwei He;Siu-Ming Yiu;Hongzhi Yin","doi":"10.1109/TKDE.2025.3551292","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3551292","url":null,"abstract":"The widespread adoption of smartphones and Location-Based Social Networks has led to a massive influx of spatio-temporal data, creating unparalleled opportunities for enhancing Point-of-Interest (POI) recommendation systems. These advanced POI systems are crucial for enriching user experiences, enabling personalized interactions, and optimizing decision-making processes in the digital landscape. However, existing surveys tend to focus on traditional approaches and few of them delve into cutting-edge developments, emerging architectures, as well as security considerations in POI recommendations. To address this gap, our survey stands out by offering a comprehensive, up-to-date review of POI recommendation systems, covering advancements in models, architectures, and security aspects. We systematically examine the transition from traditional models to advanced techniques such as large language models. Additionally, we explore the architectural evolution from centralized to decentralized and federated learning systems, highlighting the improvements in scalability and privacy. Furthermore, we address the increasing importance of security, examining potential vulnerabilities and privacy-preserving approaches. 
Our taxonomy provides a structured overview of the current state of POI recommendation, while we also identify promising directions for future research in this rapidly advancing field.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3153-3172"},"PeriodicalIF":8.9,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual Test-Time Training for Out-of-Distribution Recommender System","authors":"Xihong Yang;Yiqi Wang;Jin Chen;Wenqi Fan;Xiangyu Zhao;En Zhu;Xinwang Liu;Defu Lian","doi":"10.1109/TKDE.2025.3548160","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3548160","url":null,"abstract":"Deep learning has been widely applied in recommender systems and has recently achieved revolutionary progress. However, most existing learning-based methods assume that the user and item distributions remain unchanged between the training phase and the test phase. In real-world scenarios, the distribution of user and item features can naturally shift, potentially resulting in a substantial decrease in recommendation performance. This phenomenon can be formulated as an Out-Of-Distribution (OOD) recommendation problem. To address this challenge, we propose a novel Dual Test-Time-Training framework for OOD Recommendation, termed DT3OR. In DT3OR, we incorporate a model adaptation mechanism during the test-time phase to carefully update the recommendation model, allowing the model to adapt specifically to the shifting user and item features. To be specific, we propose a self-distillation task and a contrastive task to help the model learn both the user’s invariant interest preferences and the variant user/item characteristics during the test-time phase, thus facilitating a smooth adaptation to the shifting features. Furthermore, we provide theoretical analysis to support the rationale behind our dual test-time training framework. To the best of our knowledge, this paper is the first work to address OOD recommendation via a test-time-training strategy. We conduct experiments on five datasets with various backbones. 
Comprehensive experimental results have demonstrated the effectiveness of DT3OR compared to other state-of-the-art baselines.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3312-3326"},"PeriodicalIF":8.9,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Anchor Graph Clustering via Maximizing Within-Cluster Similarity","authors":"Fangyuan Xie;Jingjing Xue;Feiping Nie;Weizhong Yu;Xuelong Li","doi":"10.1109/TKDE.2025.3569777","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3569777","url":null,"abstract":"Anchor-based clustering methods have attracted increasing attention due to their ability to provide efficient and scalable solutions in clustering tasks, such as subspace, multi-view and ensemble clustering. Nevertheless, the majority of anchor-based methods view anchors merely as tools, concentrating on diminishing computational complexity within the original data space. In fact, clustering can be performed directly on the anchors, and the anchor clustering results can then be propagated to the original data. Due to the much smaller volume of anchors, this could significantly reduce the computational complexity of clustering algorithms. Building upon this idea, in this paper, we propose a fast anchor graph clustering method (FAGC) via maximizing within-cluster similarity. Inspired by the relaxation and discretization model in spectral clustering, we also propose two corresponding models, namely FAGC-R and FAGC-D. FAGC-R first obtains a spectral embedding of the anchors and then discretizes the embedding to obtain the anchor indicator matrix, whereas FAGC-D directly solves for the discrete anchor membership matrix. Once anchor clustering results are obtained, original data labels can be obtained through anchor label transmission. 
Extensive experiments conducted on synthetic and real datasets illustrate the effectiveness and efficiency of the proposed methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 8","pages":"4591-4603"},"PeriodicalIF":8.9,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144573005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pricing for Data Assets Based on Data Quality, Quantity and Utility on the Perspective of Consumer Heterogeneity","authors":"Juanjuan Lin;Zhigang Huang;Yong Tang","doi":"10.1109/TKDE.2025.3551401","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3551401","url":null,"abstract":"It is an inevitable trend for the development of the global digital economy to transform data into data assets and realize their transaction circulation. Aiming at the release of data value and the development of its transaction process, the concept of integrated score of data is proposed by combining integrated quality index containing four dimensions with data quantity. On this basis, data assets are priced according to the principle of profit maximization by constructing a nonlinear programming model. Among them, three types of pricing models are divided according to the heterogeneity of consumers’ utility sensitivity, and the consumers’ willingness to pay is adjusted based on business parameters using the FAHP system. The proposed model is verified with the data of China's carbon emissions as the original data, combined with the KNN machine learning algorithm and a series of simulation analyses. In addition, multiple sets of heterogeneous data are tested. The results show that the quality, quantity and utility of data have an important impact on the pricing of data assets, and it is necessary to divide the utility sensitivity of consumers as well as take business parameters into consideration. 
The model proposed can also provide decision-making reference for data platforms.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3641-3652"},"PeriodicalIF":8.9,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hyperbolic Hypergraph Transformer With Knowledge State Disentanglement for Knowledge Tracing","authors":"Jiawei Li;Shun Mao;Yixiu Qin;Feng Wang;Yuncheng Jiang","doi":"10.1109/TKDE.2025.3570098","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3570098","url":null,"abstract":"Knowledge Tracing (KT) refers to inferring the students’ knowledge mastery and predicting their future performance. KT serves as the foundation for personalized learning and enhances the effectiveness of educational interventions, becoming a crucial technology in intelligent tutoring systems. Recent approaches have demonstrated notable success by harnessing the potent representational capacities of deep learning. However, complex neural networks lead to entangled knowledge state embeddings, where the embedding dimensions are coupled, limiting their expressiveness and interpretability. In addition, the limitations of existing methods in Euclidean space result in distortions when capturing complex relationships among knowledge states. This distortion is reflected in the alteration of distances and geometric structures among knowledge states during the embedding process. To address these challenges, in this paper, we propose a hyperbolic hypergraph transformer with knowledge state Disentanglement for Knowledge Tracing, named DisenKT. We construct the students’ response sequences into the hypergraph, projected into the hyperbolic space to alleviate the representation distortion problem of questions and knowledge states. The embeddings of hierarchical knowledge states are refined through message passing between questions and students based on the proposed hyperbolic hypergraph transformer. Moreover, we are the first to disentangle knowledge states via a contrastive clustering auxiliary task, which enhances the expressiveness and interpretability of knowledge state embeddings. 
Extensive experimental results on three public datasets demonstrate that DisenKT outperforms state-of-the-art methods on student performance prediction and interpretability.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 8","pages":"4677-4690"},"PeriodicalIF":8.9,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144573003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneous Graph-Based Multimodal Brain Network Learning","authors":"Gen Shi;Yifan Zhu;Wenjin Liu;Quanming Yao;Xuesong Li","doi":"10.1109/TKDE.2025.3569648","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3569648","url":null,"abstract":"Graph neural networks (GNNs) provide powerful insights into brain neuroimaging technology from the view of graphical networks. However, most existing GNN-based models treat the brain connectome, derived from neuroimaging, as a homogeneous graph characterized by uniform node and edge types. In fact, emerging studies have reported and emphasized the significance of heterogeneity among human brain activities, especially between the two cerebral hemispheres. Thus, homogeneous-structured brain network-based graph methods are insufficient for modeling complicated cerebral activity states. To overcome this problem, we introduce a novel heterogeneous graph neural network (HeBrainGNN) for multimodal brain neuroimaging fusion learning. HeBrainGNN first conceptualizes the brain network as a heterogeneous graph with multiple types of nodes (representing the left and right hemispheres) and edges (categorizing intra- and interhemispheric interactions). We further develop a self-supervised pretraining strategy for this heterogeneous network to address the potential overfitting problem caused by the conflict between a large parameter size and a small medical data sample size. Empirical results show the superiority of the proposed model over other existing methods in brain-related disease prediction tasks. Ablation experiments show that our heterogeneous graph-based model attaches more importance to hemispheric connections that may be neglected due to their low strength by previous homogeneous graph models. 
Additional experiments reveal that our pretraining strategy not only addresses the challenge of limited labeled data but also significantly enhances accuracy, affirming the potential of our approach in advancing neuroimaging analysis.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 8","pages":"4664-4676"},"PeriodicalIF":8.9,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144572993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}