{"title":"RAGIC: Risk-Aware Generative Framework for Stock Interval Construction","authors":"Jingyi Gu;Wenlu Du;Guiling Wang","doi":"10.1109/TKDE.2025.3533492","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3533492","url":null,"abstract":"Efforts to predict stock market outcomes have yielded limited success due to the inherently stochastic nature of the market, influenced by numerous unpredictable factors. Many existing prediction approaches focus on single-point predictions, lacking the depth needed for effective decision-making and often overlooking market risk. To bridge this gap, we propose <italic>RAGIC</italic>, a novel risk-aware framework for stock <italic>interval</italic> prediction to quantify uncertainty. Our approach leverages a Generative Adversarial Network (GAN) to produce future price sequences infused with randomness inherent in financial markets. <italic>RAGIC</italic>’s generator detects the risk perception of informed investors and captures historical price trends globally and locally. Then the <italic>risk-sensitive intervals</italic> are built upon the simulated future prices from sequence generation through statistical inference, incorporating <italic>horizon-wise</italic> insights. The interval’s width is adaptively adjusted to reflect market volatility. Importantly, our approach relies solely on publicly available data and incurs only low computational overhead. <italic>RAGIC</italic>’s evaluation across globally recognized broad-based indices demonstrates its balanced performance, offering both accuracy and informativeness. Achieving a consistent 95% coverage, <italic>RAGIC</italic> maintains a narrow interval width. 
This promising outcome suggests that our approach effectively addresses the challenges of stock market prediction while incorporating vital risk considerations.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"2085-2096"},"PeriodicalIF":8.9,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting GNN-Based Link Prediction via PU-AUC Optimization","authors":"Yuren Mao;Yu Hao;Xin Cao;Yunjun Gao;Chang Yao;Xuemin Lin","doi":"10.1109/TKDE.2025.3525490","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3525490","url":null,"abstract":"Link prediction, which aims to predict the existence of a link between two nodes in a network, has various applications ranging from friend recommendation to protein interaction prediction. Recently, Graph Neural Network (GNN)-based link prediction has demonstrated its advantages and achieved state-of-the-art performance. Typically, GNN-based link prediction can be formulated as a binary classification problem. However, in link prediction, we only have positive data (observed links) and unlabeled data (unobserved links), but no negative data. Therefore, Positive Unlabeled (PU) learning naturally fits the link prediction scenario. Unfortunately, the unknown class prior and data imbalance of networks impede the use of PU learning in link prediction. To deal with these issues, this paper proposes a novel model-agnostic PU learning algorithm for GNN-based link prediction by means of <italic>Positive-Unlabeled Area Under the Receiver Operating Characteristic Curve</italic> (PU-AUC) optimization. The proposed method is free of class prior estimation and able to handle the data imbalance. Moreover, we propose an accelerated method to reduce the operational complexity of PU-AUC optimization from quadratic to approximately linear. 
Extensive experiments back up our theoretical analysis and validate that the proposed method is capable of boosting the performance of the state-of-the-art GNN-based link prediction models.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1635-1649"},"PeriodicalIF":8.9,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CLEAR: Spatial-Temporal Traffic Data Representation Learning for Traffic Prediction","authors":"James Jianqiao Yu;Xinwei Fang;Shiyao Zhang;Yuxin Ma","doi":"10.1109/TKDE.2025.3536009","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536009","url":null,"abstract":"In the evolving field of urban development, precise traffic prediction is essential for optimizing traffic and mitigating congestion. While traditional graph learning-based models effectively exploit complex spatial-temporal correlations, their reliance on trivially generated graph structures or deeply intertwined adjacency learning without supervised loss significantly impedes their efficiency. This paper presents the Contrastive Learning of spatial-tEmporal trAffic data Representations (CLEAR) framework, a comprehensive approach to spatial-temporal traffic data representation learning aimed at enhancing the accuracy of traffic predictions. Employing self-supervised contrastive learning, CLEAR strategically extracts discriminative embeddings from both traffic time-series and graph-structured data. The framework applies weak and strong data augmentations to facilitate subsequent exploitation of intrinsic spatial-temporal correlations that are critical for accurate prediction. Additionally, CLEAR incorporates advanced representation learning models that transmute these dynamics into compact, semantic-rich embeddings, thereby elevating downstream models’ prediction accuracy. By integrating with existing traffic predictors, CLEAR boosts prediction performance and accelerates the training process by effectively decoupling adjacency learning from correlation learning. Comprehensive experiments validate that CLEAR can robustly enhance the capabilities of existing graph learning-based traffic predictors and provide superior traffic predictions with a straightforward representation decoder. 
This investigation highlights the potential of contrastive representation learning in developing robust traffic data representations for traffic prediction.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1672-1687"},"PeriodicalIF":8.9,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond","authors":"Fangzhi Xu;Qika Lin;Jiawei Han;Tianzhe Zhao;Jun Liu;Erik Cambria","doi":"10.1109/TKDE.2025.3536008","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536008","url":null,"abstract":"Logical reasoning consistently plays a fundamental and significant role in the domains of knowledge engineering and artificial intelligence. Recently, Large Language Models (LLMs) have emerged as a noteworthy innovation in natural language processing (NLP). However, the question of whether LLMs can effectively address the task of logical reasoning, which requires gradual cognitive inference similar to human intelligence, remains unanswered. To this end, we aim to bridge this gap and provide comprehensive evaluations in this paper. First, to offer systematic evaluations, we select fifteen typical logical reasoning datasets and organize them into deductive, inductive, abductive and mixed-form reasoning settings. Considering the comprehensiveness of evaluations, we include 3 early-era representative LLMs and 4 trending LLMs. Second, different from previous evaluations relying only on simple metrics (e.g., <italic>accuracy</italic>), we propose fine-level evaluations in objective and subjective manners, covering both answers and explanations, including <italic>answer correctness</italic>, <italic>explain correctness</italic>, <italic>explain completeness</italic> and <italic>explain redundancy</italic>. Additionally, to uncover the logical flaws of LLMs, problematic cases will be attributed to five error types from two dimensions, i.e., <italic>evidence selection process</italic> and <italic>reasoning process</italic>. Third, to avoid the influences of knowledge bias and concentrate purely on benchmarking the logical reasoning capability of LLMs, we propose a new dataset with neutral content. 
Based on the in-depth evaluations, this paper finally forms a general evaluation scheme of logical reasoning capability from six dimensions (i.e., <italic>Correct</italic>, <italic>Rigorous</italic>, <italic>Self-aware</italic>, <italic>Active</italic>, <italic>Oriented</italic> and <italic>No hallucination</italic>). It reflects the pros and cons of LLMs and gives guiding directions for future work.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1620-1634"},"PeriodicalIF":8.9,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CoreSense: Social Commonsense Knowledge-Aware Context Refinement for Conversational Recommender System","authors":"Hyeongjun Yang;Donghyun Kim;Gayeon Park;KyuHwan Yeom;Kyong-Ho Lee","doi":"10.1109/TKDE.2025.3536464","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536464","url":null,"abstract":"Unlike the traditional recommender systems that rely on historical data such as clicks or purchases, a conversational recommender system (CRS) aims to provide a personalized recommendation through a natural conversation. The conversational interaction facilitates capturing not only explicit preference from mentioned items but also implicit states, such as a user’s current situation and emotional states from a dialogue context. Nevertheless, existing CRSs fall short of fully exploiting a dialogue context since they primarily derive explicit user preferences from the items and item-attributes mentioned in a conversation. To address this limitation and attain a comprehensive understanding of a dialogue context, we propose <underline>CoreSense</underline>, a <underline>co</underline>nversational <underline>re</underline>commender system enhanced with social common<underline>sense</underline> knowledge. In other words, CoreSense exploits the social commonsense knowledge graph ATOMIC to capture the user’s implicit states, such as a user’s current situation and emotional states, from a dialogue context. Thus, the social commonsense knowledge-augmented CRS can provide a more appropriate recommendation from a given dialogue context. Furthermore, we enhance the collaborative filtering effect by utilizing the user’s states inferred from commonsense knowledge as an improved criterion for retrieving other dialogues of similar interests. 
Extensive experiments on CRS benchmark datasets show that CoreSense provides human-like recommendations and responses based on inferred user states, achieving significant performance improvements.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1702-1713"},"PeriodicalIF":8.9,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive Entire-Space Multi-Scenario Multi-Task Transfer Learning Model for Recommendations","authors":"Qingqing Yi;Jingjing Tang;Xiangyu Zhao;Yujian Zeng;Zengchun Song;Jia Wu","doi":"10.1109/TKDE.2025.3536334","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536334","url":null,"abstract":"Multi-scenario and multi-task recommendation systems efficiently facilitate knowledge transfer across different scenarios and tasks. However, many existing approaches inadequately incorporate personalized information across users and scenarios. Moreover, the conversion rate (CVR) task in multi-task learning often encounters challenges like sample selection bias, resulting from systematic differences between the training and inference sample spaces, and data sparsity due to infrequent clicks. To address these issues, we propose the Adaptive Entire-space Multi-scenario Multi-task Transfer Learning model (AEM<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>TL) with four key modules: 1) Scenario-CGC (Scenario-Customized Gate Control), 2) Task-CGC (Task-Customized Gate Control), 3) Personalized Gating Network, and 4) Entire-space Supervised Multi-Task Module. AEM<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>TL employs a multi-gate mechanism to effectively integrate shared and specific information across scenarios and tasks, enhancing prediction adaptability. To further improve task-specific personalization, it incorporates personalized prior features and applies a gating mechanism that dynamically scales the top-layer neural units. A novel post-impression behavior decomposition technique is designed to leverage all impression samples across the entire space, mitigating sample selection bias and data sparsity. Furthermore, an adaptive weighting mechanism dynamically allocates attention to tasks based on their relative importance, ensuring optimal task prioritization. 
Extensive experiments on one industrial and two real-world public datasets indicate the superiority of AEM<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>TL over state-of-the-art methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1585-1598"},"PeriodicalIF":8.9,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Label Feature Selection With Missing Features via Implicit Label Replenishment and Positive Correlation Feature Recovery","authors":"Jianhua Dai;Wenxiang Chen;Yuhua Qian","doi":"10.1109/TKDE.2025.3536080","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536080","url":null,"abstract":"Multi-label feature selection can effectively solve the curse of dimensionality problem in multi-label learning. Existing multi-label feature selection methods mostly handle multi-label data without missing features. However, in practical applications, multi-label data with missing features exist widely, and most existing multi-label feature selection methods are not directly applicable. Therefore, we propose a feature selection method for multi-label data with missing features. First, we propose a method to extract implicit label information from the feature space to replenish the binary label information. Second, we learn the positive correlation between features to construct a feature correlation recovery matrix to recover missing features. Finally, we design a sparse model-based multi-label feature selection method for processing multi-label data with missing features and prove the convergence of this method. Comparative experiments with existing feature selection methods demonstrate the effectiveness of our method.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"2042-2055"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SMPCache: Towards More Efficient SQL Queries in Multi-Party Collaborative Data Analysis","authors":"Junjian Shi;Ye Han;Xiaojie Guo;Zekun Fei;Zheli Liu;Siyi Lv;Tong Li;Xiaotao Liu","doi":"10.1109/TKDE.2025.3535944","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3535944","url":null,"abstract":"Privacy-preserving collaborative data analysis has been a popular research direction in recent years. Among all such analysis tasks, privacy-preserving SQL queries on multi-party databases are of particular industrial interest. Although the privacy concern can be addressed by many cryptographic tools, such as secure multi-party computation (MPC), the efficiency of executing such SQL queries is far from satisfactory, especially for high-volume databases. In particular, existing MPC-based solutions treat each SQL query as an isolated task and launch it from scratch, despite the fact that many SQL queries are run regularly and somewhat overlap in their functionalities. In this work, we are motivated to exploit this property to improve the efficiency of MPC-based, privacy-preserving SQL queries. We introduce a cache-like optimization mechanism. To ensure a higher cache hit rate and reduce redundant MPC operators, we present a cache structure different from that of plain databases and design a set of cache strategies. Our optimization mechanism, SMPCache, can be built upon secret-sharing-based MPC frameworks, which attract much attention from the industry. To demonstrate the utility of SMPCache, we implement it on Rosetta, an open-source MPC library, and use real-world datasets to launch extensive experiments on some basic SQL operators (e.g., Filter, Order-by, Aggregation, and Inner-Join) and some representative composite SQL queries. To give a data point, we note that SMPCache can achieve up to 3536× efficiency improvement on the TPC-DS dataset and 562× on the TPC-H dataset at a moderate storage cost. 
We also apply SMPCache to the basic SQL operators (Filter, Order-by, Group-by, Aggregation, and Inner-join) of the Secrecy framework, achieving up to 127.3× efficiency improvement.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"2111-2125"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Latent and Changing Dynamics in Real Non-Stationary Environments","authors":"Zihe Liu;Jie Lu;Junyu Xuan;Guangquan Zhang","doi":"10.1109/TKDE.2025.3535961","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3535961","url":null,"abstract":"Model-based reinforcement learning (RL) aims to learn the underlying dynamics of a given environment. The success of most existing works is built on the critical assumption that the dynamics are fixed, which is unrealistic in many open-world scenarios, such as drone delivery and online chatting, where agents may need to deal with environments with unpredictable changing dynamics (hereafter, <italic>real non-stationary environment</italic>). Therefore, learning changing dynamics in a real non-stationary environment offers both significant benefits and challenges. This paper proposes a new model-based reinforcement learning algorithm that proactively and dynamically detects possible changes and Learns these Latent and Changing Dynamics (LLCD) in a latent Markovian space for real non-stationary environments. To ensure the Markovian property of the RL model and improve computational efficiency, we employ a latent space model to learn the environment’s transition dynamics. Furthermore, we perform online change detection in the latent space to promptly identify change points in non-stationary environments. Then, we utilize the detected information to help the agent adapt to new conditions. Experiments indicate that the rewards of the proposed algorithm accumulate for the most rapid adaptations to environmental change, among other benefits. 
This work has a strong potential to enhance environmentally suitable model-based reinforcement learning capabilities.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1930-1942"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Condensation: A Survey","authors":"Xinyi Gao;Junliang Yu;Tong Chen;Guanhua Ye;Wentao Zhang;Hongzhi Yin","doi":"10.1109/TKDE.2025.3535877","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3535877","url":null,"abstract":"The rapid growth of graph data poses significant challenges in storage, transmission, and particularly the training of graph neural networks (GNNs). To address these challenges, graph condensation (GC) has emerged as an innovative solution. GC focuses on synthesizing a compact yet highly representative graph, enabling GNNs trained on it to achieve performance comparable to those trained on the original large graph. The notable efficacy of GC and its broad prospects have garnered significant attention and spurred extensive research. This survey paper provides an up-to-date and systematic overview of GC, organizing existing research into five categories aligned with critical GC evaluation criteria: effectiveness, generalization, efficiency, fairness, and robustness. To facilitate an in-depth and comprehensive understanding of GC, this paper examines various methods under each category and thoroughly discusses two essential components within GC: optimization strategies and condensed graph generation. We also empirically compare and analyze representative GC methods with diverse optimization strategies based on the five proposed GC evaluation criteria. 
Finally, we explore the applications of GC in various fields, outline the related open-source libraries, and highlight the present challenges and novel insights, with the aim of promoting advancements in future research.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"1819-1837"},"PeriodicalIF":8.9,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}