{"title":"OpDiag: Unveiling Database Performance Anomalies Through Query Operator Attribution","authors":"Shiyue Huang;Ziwei Wang;Yinjun Wu;Yaofeng Tu;Jiankai Wang;Bin Cui","doi":"10.1109/TKDE.2025.3557049","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3557049","url":null,"abstract":"How to effectively diagnose and mitigate database performance anomalies remains a significant concern for modern database systems. Manually identifying the root causes of the anomalies is a labor-intensive process and relies significantly on professional experience. Meanwhile, existing work on automatic database diagnosis mainly focuses on detecting anomalous performance metrics or system logs. These solutions lack the power to pinpoint detailed issues such as bad queries or problematic operators, which are indispensable for most database troubleshooting processes. In this paper, we propose OpDiag, a diagnosis framework that attributes database performance anomalies to query operators. In this framework, we first construct models offline to represent the relationship between query operators, performance metrics, and anomalies. These models can capture query plan features and support ad-hoc queries and schemas. Then, through feature attribution on these models during online diagnosis, OpDiag can effectively identify critical anomalous metrics and further trace back to suspicious queries and operators. This can provide concrete guidance for subsequent steps in anomaly mitigation. We applied OpDiag to both a synthetic benchmark and real industry cases from ZTE Corporation. 
Empirical studies prove that OpDiag can accurately localize anomalous queries and operators, thus reducing human efforts in diagnosing and mitigating database performance anomalies.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3613-3626"},"PeriodicalIF":8.9,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertainty Calibration for Counterfactual Propensity Estimation in Recommendation","authors":"Wenbo Hu;Xin Sun;Qiang Liu;Le Wu;Liang Wang","doi":"10.1109/TKDE.2025.3552658","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3552658","url":null,"abstract":"Post-click conversion rate (CVR) is a reliable indicator of online customers’ preferences, making it crucial for developing recommender systems. A major challenge in predicting CVR is severe selection bias, arising from users’ inherent self-selection behavior and the system’s item selection process. To mitigate this issue, the inverse propensity score (IPS) is employed to weight the prediction error of each observed instance. However, current propensity score estimations are unreliable due to the lack of a quality measure. To address this, we evaluate the quality of propensity scores from the perspective of uncertainty calibration, proposing the use of Expected Calibration Error (ECE) as a measure of propensity-score quality, which quantifies the extent to which predicted probabilities are overconfident by assessing the difference between predicted probabilities and actual observed frequencies. Miscalibrated propensity scores can lead to distorted IPS weights, thereby compromising the debiasing process in CVR prediction. In this paper, we introduce a model-agnostic calibration framework for propensity-based debiasing of CVR predictions. Theoretical analysis on bias and generalization bounds demonstrates the superiority of calibrated propensity estimates over uncalibrated ones. 
Experiments conducted on the Coat, Yahoo and KuaiRand datasets show improved uncertainty calibration, as evidenced by lower ECE values, leading to enhanced CVR prediction outcomes.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3781-3793"},"PeriodicalIF":8.9,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge-Centered Dual-Process Reasoning for Math Word Problems With Large Language Models","authors":"Jiayu Liu;Zhenya Huang;Qi Liu;Zhiyuan Ma;Chengxiang Zhai;Enhong Chen","doi":"10.1109/TKDE.2025.3556367","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3556367","url":null,"abstract":"Math word problem (MWP) serves as a critical milestone for assessing the text mining ability and knowledge mastery level of models. Recent advancements have witnessed large language models (LLMs) showcasing remarkable performance on MWP. However, current LLMs still frequently exhibit logical errors, which highlights their inability to fully grasp the knowledge required for genuine step-by-step mathematical reasoning. To this end, in this paper, we propose a novel Knowledge-guided Solver (KNOS) framework that empowers LLMs to simulate human mathematical reasoning, whose core idea is to Invoke-Verify-Inject necessary knowledge to solve MWP. We draw inspiration from the dual-process theory to construct two cooperative systems: a Knowledge System and an Inference System. Specifically, the Knowledge System employs LLMs as the knowledge base and develops a novel knowledge invoker that can elicit their relevant knowledge to support the strict step-level mathematical reasoning. In the Inference System, we propose a knowledge verifier and a knowledge injector to evaluate the knowledge rationality and further guide the step-wise symbolic deduction in an interpretable manner based on human cognitive mechanism, respectively. Moreover, to tackle the potential scarcity issue of mathematics-specific knowledge in LLMs, we consider an open-book exam scenario and propose an improved version of KNOS called EKNOS. In EKNOS, we meticulously design knowledge selectors to extract the most relevant commonsense and math formulas from external knowledge sources for each reasoning step. 
This knowledge is utilized to assist the knowledge invoker in better stimulating LLMs’ reasoning abilities. Both KNOS and EKNOS are flexible to empower different LLMs. Our experiments with GPT3, ChatGPT, and GPT4 not only demonstrate their reasoning accuracy improvement but also show how they bring the strict step-wise interpretability of mathematical thinking.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3457-3471"},"PeriodicalIF":8.9,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SE-GNN: Seed Expanded-Aware Graph Neural Network With Iterative Optimization for Semi-Supervised Entity Alignment","authors":"Tao Meng;Shuo Shan;Hongen Shao;Yuntao Shou;Wei Ai;Keqin Li","doi":"10.1109/TKDE.2025.3555586","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3555586","url":null,"abstract":"Entity alignment aims to use pre-aligned seed pairs to find other equivalent entities from different knowledge graphs and is widely used in graph fusion-related fields. However, as the scale of knowledge graphs increases, manually annotating pre-aligned seed pairs becomes difficult. Existing research utilizes entity embeddings obtained by aggregating a single type of structural information to identify potential seed pairs, thus reducing the reliance on pre-aligned seed pairs. However, due to the structural heterogeneity of knowledge graphs (KGs), the quality of potential seed pairs obtained using only a single type of structural information is not ideal. In addition, although existing research improves the quality of potential seed pairs through semi-supervised iteration, it underestimates the impact of embedding distortion produced by noisy seed pairs on the alignment effect. In order to solve the above problems, we propose a seed expanded-aware graph neural network with iterative optimization for semi-supervised entity alignment, named SE-GNN. First, we utilize the semantic attributes and structural features of entities, combined with a conditional filtering mechanism, to obtain high-quality initial potential seed pairs. Next, we design a local and global awareness mechanism. It introduces initial potential seed pairs and combines local and global information to obtain a more comprehensive entity embedding representation, which alleviates the impact of KG structural heterogeneity and lays the foundation for the optimization of initial potential seed pairs. Then, we design the threshold nearest neighbor embedding correction strategy. 
It combines the similarity threshold and the bidirectional nearest neighbor method as a filtering mechanism to select iterative potential seed pairs and also uses an embedding correction strategy to eliminate the embedding distortion. Finally, we feed the optimized potential seed pairs obtained after the iterative rounds into the local and global awareness mechanism, obtain the final entity embedding, and perform entity alignment. Experimental results on public datasets demonstrate the excellent performance of our SE-GNN, showcasing the effectiveness of the model. Our code is publicly available at https://github.com/ShuoShan1/SE-GNN.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3700-3713"},"PeriodicalIF":8.9,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User-Friendly and Expressive Forward-Secure Attribute-Based Signature With Server-Aided Signature and Outsourced Verification","authors":"Chao Guo;Yang Lu;Nian Xia;Jiguo Li","doi":"10.1109/TKDE.2025.3554973","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3554973","url":null,"abstract":"Attribute-based signature (ABS) is an attractive variation of digital signature that enables signers to sign messages with fine-grained signature predicates. In ABS, a signer is able to perform signing operations without revealing personal attributes, and verifiers can only confirm that the signature was created by someone with attributes satisfying a specific signature predicate. However, traditional ABS suffers from key exposure, and the compromise of a signer’s signature key results in invalidating all signatures from him/her. To address this problem, forward-secure ABS (FS-ABS) was introduced. Nevertheless, existing FS-ABS schemes have the shortcomings of low policy expressiveness and high computation costs, and thus are not suitable to be employed on mobile devices with limited resources. In this paper, we propose a user-friendly and expressive FS-ABS (UEFS-ABS) scheme that is proven secure in the standard model. The proposed scheme not only supports expressive signature predicates based on the linear secret sharing scheme, but also provides server-aided signature and outsourced verification functions, significantly reducing the workload of user terminals at both signature generation and verification stages. The experiments indicate that compared with the up-to-date FS-ABS scheme, our scheme reduces the computation costs for signature generation (on signers’ devices) and verification (on verifiers’ devices) by about 85% and 68%, respectively. 
This makes our scheme more suitable for user terminals in mobile computing scenarios.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3794-3809"},"PeriodicalIF":8.9,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonconvex Low-Rank Tensor Representation for Multi-View Subspace Clustering With Insufficient Observed Samples","authors":"Meng Ding;Jing-Hua Yang;Xi-Le Zhao;Jie Zhang;Michael K. Ng","doi":"10.1109/TKDE.2025.3555043","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3555043","url":null,"abstract":"Multi-view subspace clustering (MVSC) separates the data with multiple views into multiple clusters, and each cluster corresponds to one certain subspace. Existing tensor-based MVSC methods construct self-representation subspace coefficient matrices of all views as a tensor, and introduce the tensor nuclear norm (TNN) to capture the complementary information hidden in different views. The key assumption is that the data samples of each subspace must be sufficient for subspace representation. This work proposes a nonconvex latent transformed low-rank tensor representation framework for MVSC. To deal with the insufficient sample problem, we study the latent low-rank representation in the multi-view case to supplement underlying observed samples. Moreover, we propose to use data-driven transformed TNN (TTNN), resulting from the intrinsic structure of multi-view samples, to preserve the consensus and complementary information in the transformed domain. Meanwhile, the proposed unified nonconvex low-rank tensor representation framework can better learn the high correlation among different views. To resolve the proposed nonconvex optimization model, we propose an effective algorithm under the framework of the alternating direction method of multipliers and theoretically prove that the iteration sequences converge to the critical point. 
Experiments on various datasets showcase outstanding performance.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3583-3597"},"PeriodicalIF":8.9,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distill & Contrast: A New Graph Self-Supervised Method With Approximating Nature Data Relationships","authors":"Dongxiao He;Jitao Zhao;Rui Guo;Zhiyong Feng;Cuiying Huo;Di Jin;Witold Pedrycz;Weixiong Zhang","doi":"10.1109/TKDE.2025.3554524","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3554524","url":null,"abstract":"Contrastive Learning (CL) has emerged as a popular self-supervised representation learning paradigm that has been shown in many applications to perform similarly to traditional supervised learning methods. A key component of CL is mining the latent discriminative relationships between positive and negative samples and using them as self-supervised labels. We argue that this discriminative contrastive task is, in essence, similar to a classification task, and the “either positive or negative” hard label sampling strategies are arbitrary. To solve this problem, we explore ideas from data distillation, which considers probabilistic logit vectors as soft labels to transfer model knowledge. We attempt to abandon the classical hard sampling labels in CL and instead explore self-supervised soft labels. We adopt soft sampling labels that are extracted, without supervision, from the inherent relationships in data pairs to retain more information. We propose a new self-supervised graph learning method, Distill and Contrast (D&C), for learning representations that closely approximate natural data relationships. D&C extracts node similarities from the features and structures to derive soft sampling labels, which also eliminate noise in the data to increase robustness. 
Extensive experimental results on real-world datasets demonstrate the effectiveness of the proposed method.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3284-3297"},"PeriodicalIF":8.9,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Efficient Single-Source Personalized PageRank Computation in Online Social Networks","authors":"Victor Junqiu Wei;Di Jiang;Jason Chen Zhang","doi":"10.1109/TKDE.2025.3551751","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3551751","url":null,"abstract":"The Single-Source Personalized PageRank (SSPPR) problem is widely used in information retrieval and recommendation systems. Traditional algorithms assume full knowledge of the network, making them inapplicable to online social networks (OSNs), where the topology is unknown, and users can only explore the network step by step via APIs. The only feasible approach for SSPPR in OSNs is Monte Carlo (MC) simulation, but traditional MC methods rely on static sampling, which lacks flexibility, delays feedback, and overestimates the number of required random walks. To address these limitations, we propose PANDA (Single-Source Personalized PageRank on OSNs with Rademacher Average), a progressive sampling algorithm. PANDA iteratively samples random walks in batches, estimating accuracy dynamically using Rademacher Average from statistical learning theory. This data-dependent approach allows for early termination once the desired accuracy is met. Additionally, PANDA features a dynamic sampling schedule to optimize efficiency. 
Empirical studies show that PANDA significantly outperforms existing methods, achieving the same accuracy with far greater efficiency.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3598-3612"},"PeriodicalIF":8.9,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"#REval: A Semantic Evaluation Framework for Hashtag Recommendation","authors":"Areej Alsini;Du Q. Huynh;Amitava Datta","doi":"10.1109/TKDE.2025.3553683","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3553683","url":null,"abstract":"Automatic evaluation of hashtag recommendation models is a fundamental task on Twitter. In the traditional evaluation methods, the recommended hashtags from an algorithm are first compared with the ground truth hashtags for exact correspondences. The number of exact matches is then used to calculate the hit rate, hit ratio, precision, recall, or F1-score. This way of evaluating hashtag similarities is inadequate as it ignores the semantic correlation between the recommended and ground truth hashtags. To tackle this problem, we propose a novel semantic evaluation framework for hashtag recommendation, called #REval. This framework includes an internal module referred to as BERTag, which automatically learns the hashtag embeddings. We investigate how the #REval framework performs under different word embedding methods and different numbers of synonyms and hashtags in the recommendation using our proposed #REval-hit-ratio measure. Our experiments with the proposed framework on three large datasets show that #REval gives more meaningful hashtag synonyms for hashtag recommendation evaluation. 
Our analysis also highlights the sensitivity of the framework to the word embedding technique, with #REval based on BERTag being superior to #REval based on Word2Vec, FastText, and GloVe.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3075-3087"},"PeriodicalIF":8.9,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Temporal Event Knowledge for Continual Social Event Classification","authors":"Shengsheng Qian;Shengjie Zhang;Dizhan Xue;Huaiwen Zhang;Changsheng Xu","doi":"10.1109/TKDE.2025.3553162","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3553162","url":null,"abstract":"With the rapid development of the Internet and the burgeoning scale of social media, Social Event Classification (SEC) has garnered increasing attention. Existing studies of SEC focus on recognizing a fixed set of social events. However, in real-world scenarios, new social events continually emerge on social media, which suggests the necessity for a practical SEC model that can swiftly adapt to the evolving environment with incremental social events. Therefore, in this paper, we study a new yet crucial problem defined as Continual Social Event Classification (C-SEC), where new events continually emerge in the sequentially collected social data. Accordingly, we propose a novel Temporal Event Knowledge Network (TEKNet) to continually learn temporal event knowledge for C-SEC with temporally incremental events. First, we conduct present event knowledge learning to learn the classification of newly emerging events in the presently incoming data. Second, we design past event knowledge replay with self-knowledge distillation to consolidate the learned knowledge of past events and prevent catastrophic forgetting. Finally, we propose future event knowledge pretraining with a modality mixture mechanism to pretrain the classifiers for events that occur in the future. 
Comprehensive experiments on real-world social event datasets demonstrate the superiority of our proposed TEKNet for C-SEC.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3485-3498"},"PeriodicalIF":8.9,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}