IEEE Transactions on Knowledge and Data Engineering: Latest Articles

A Scalable Algorithm for Fair Influence Maximization With Unbiased Estimator
IF 8.9 | Zone 2 | Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-24 DOI: 10.1109/TKDE.2025.3564283
Xiaobin Rui;Zhixiao Wang;Hao Peng;Wei Chen;Philip S. Yu
Abstract: This paper studies the fair influence maximization problem with efficient algorithms. In particular, given a graph $G$, a community structure $\mathcal{C}$ consisting of disjoint communities, and a budget $k$, the problem asks to select a seed set $S$ ($|S|=k$) that maximizes the influence spread while narrowing the influence gap between different communities. This problem derives from some significant social scenarios, such as health interventions (e.g. suicide/HIV prevention) where individuals from underrepresented groups or LGBTQ communities may be disproportionately excluded from the benefits of the intervention. To depict the concept of fairness in the context of influence maximization, researchers have proposed various notions of fairness, where the welfare fairness notion that better balances fairness level and influence spread has shown promising effectiveness. However, the lack of efficient algorithms for optimizing the objective function under welfare fairness restricts its application to networks of only a few hundred nodes. In this paper, we modify the objective function of welfare fairness to maximize the exponentially weighted sum and the logarithmically weighted sum over all communities’ influenced fractions (utility). To achieve efficient algorithms with theoretical guarantees, we first introduce two unbiased estimators: one for the fractional power of the arithmetic mean and the other for the logarithm of the arithmetic mean. Then, by adapting the Reverse Influence Sampling (RIS) approach, we convert the optimization problem to a weighted maximum coverage problem. We also analyze the number of reverse reachable sets needed to approximate the fair influence at a high probability. Finally, we present an efficient algorithm that guarantees $1-1/e-\varepsilon$ (positive objective function) or $1+1/e+\varepsilon$ (negative objective function) approximation for any small $\varepsilon > 0$. Experiments demonstrate that our proposed algorithm could efficiently handle large-scale networks with good performance.
Vol. 37, No. 7, pp. 3881-3895
Citations: 0
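The abstract above reduces seed selection to a weighted maximum coverage problem over reverse reachable (RR) sets. The following is a minimal sketch of the generic greedy heuristic for that problem, not the paper's algorithm: the function names, the toy RR sets, and the per-set weights are all hypothetical, and the weights stand in for whatever the paper's unbiased estimators would assign. With non-negative weights, this kind of greedy selection is the standard route to a $1-1/e$ approximation for coverage objectives.

```python
from typing import Dict, List, Sequence, Set


def greedy_weighted_max_coverage(
    rr_sets: Sequence[Set[int]], weights: Sequence[float], k: int
) -> List[int]:
    """Pick up to k nodes that greedily maximize the total weight of covered sets."""
    # Inverted index: node -> ids of the sets that contain it.
    node_to_sets: Dict[int, Set[int]] = {}
    for set_id, nodes in enumerate(rr_sets):
        for node in nodes:
            node_to_sets.setdefault(node, set()).add(set_id)

    covered: Set[int] = set()
    seeds: List[int] = []
    for _ in range(k):
        best_node, best_gain = None, 0.0
        for node, set_ids in node_to_sets.items():
            if node in seeds:
                continue
            # Marginal gain: weight of sets this node would newly cover.
            gain = sum(weights[i] for i in set_ids if i not in covered)
            if gain > best_gain:
                best_node, best_gain = node, gain
        if best_node is None:  # no remaining node adds any weight
            break
        seeds.append(best_node)
        covered |= node_to_sets[best_node]
    return seeds


def covered_weight(
    rr_sets: Sequence[Set[int]], weights: Sequence[float], seeds: Sequence[int]
) -> float:
    """Total weight of the sets hit by at least one seed."""
    chosen = set(seeds)
    return sum(w for s, w in zip(rr_sets, weights) if s & chosen)


# Toy example (hypothetical data): four RR sets, the last one heavily weighted.
toy_sets = [{1, 2}, {2, 3}, {3}, {4}]
toy_weights = [1.0, 1.0, 1.0, 5.0]
toy_seeds = greedy_weighted_max_coverage(toy_sets, toy_weights, k=2)
```

This sketch rescans every node in each round, which costs time proportional to the total size of the RR sets per seed; a scalable implementation would typically maintain gains incrementally or use lazy evaluation.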
Data Synthesis Reinvented: Preserving Missing Patterns for Enhanced Analysis
IF 8.9 | Zone 2 | Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-22 DOI: 10.1109/TKDE.2025.3563319
Xinyue Wang;Hafiz Asif;Shashank Gupta;Jaideep Vaidya
Abstract: Synthetic data is being widely used as a replacement or enhancement for real data in fields as diverse as healthcare, telecommunications, and finance. Unlike real data, which represents actual people and objects, synthetic data is generated from an estimated distribution that retains key statistical properties of the real data. This makes synthetic data attractive for sharing while addressing privacy, confidentiality, and autonomy concerns. Real data often contains missing values that hold important information about individual, system, or organizational behavior. Standard synthetic data generation methods eliminate missing values as part of their pre-processing steps and thus completely ignore this valuable source of information. Instead, we propose methods to generate synthetic data that preserve both the observable and missing data distributions; consequently, retaining the valuable information encoded in the missing patterns of the real data. Our approach handles various missing data scenarios and can easily integrate with existing data generation methods. Extensive empirical evaluations on diverse datasets demonstrate the effectiveness of our approach as well as the value of preserving missing data distribution in synthetic data.
Vol. 37, No. 7, pp. 3962-3975
Citations: 0
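To make the idea of carrying missing patterns into synthetic data concrete, here is a deliberately simple sketch. It is not the paper's method: it assumes missingness is independent of the underlying values, records only the empirical frequency of each row-level missingness pattern, and re-applies patterns sampled at those frequencies to complete synthetic rows. All function names and the toy rows are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def missing_patterns(rows: Sequence[Row]) -> Counter:
    """Count each row-level missingness pattern (True marks a missing cell)."""
    return Counter(tuple(v is None for v in row) for row in rows)


def apply_missing_patterns(
    synthetic_rows: Sequence[Row], patterns: Counter, seed: int = 0
) -> List[Tuple[Optional[float], ...]]:
    """Mask complete synthetic rows with patterns drawn at their observed frequencies."""
    rng = random.Random(seed)
    choices = list(patterns.keys())
    freqs = list(patterns.values())
    masked = []
    for row in synthetic_rows:
        pattern = rng.choices(choices, weights=freqs, k=1)[0]
        masked.append(tuple(None if miss else v for v, miss in zip(row, pattern)))
    return masked


# Toy example (hypothetical data): the second column is missing in 2 of 3 real rows.
real_rows = [(1.0, None), (2.0, None), (3.0, 4.0)]
patterns = missing_patterns(real_rows)
synthetic_complete = [(9.0, 8.0)] * 50
synthetic_masked = apply_missing_patterns(synthetic_complete, patterns)
```

Because the mask is sampled independently of the values, this sketch cannot reproduce missingness that depends on the data itself, which is one of the scenarios the abstract says the proposed approach handles.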
A Local Community Detection Method Based on Folded Subgraph
IF 8.9 | Zone 2 | Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-21 DOI: 10.1109/TKDE.2025.3563100
Mengting Zhang;Weihong Bi
Abstract: Community structure refers to the “small groups” in the network. Detecting community structure in networks has significant application value. With the continuous expansion and complexity of the network, the global information of the network is often difficult to obtain. On the other hand, in some cases, we pay more attention to the local community where the given node is located. Local community detection methods detect local community structure by using local information from a given node. However, many local community detection methods encounter the problem of precision limitation. Therefore, in order to alleviate such problems, we propose the FG-based method in this paper. Based on the characteristics of complex networks, a folded subgraph method is designed to consider some similar nodes as single nodes, reducing the impact of noise in the network. Furthermore, based on the folded subgraph, FG-based method designs a three-stage local expansion strategy, in which nodes with different characteristics are added to the local community in each stage. We conduct experiments on datasets and find that the FG-based method can improve the recall and precision of local community structures.
Vol. 37, No. 7, pp. 3869-3880
Citations: 0
Self-Labeling and Self-Knowledge Distillation Unsupervised Feature Selection
IF 8.9 | Zone 2 | Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-21 DOI: 10.1109/TKDE.2025.3561046
Yunzhi Ling;Feiping Nie;Weizhong Yu;Xuelong Li
Abstract: This paper proposes a deep pseudo-label method for unsupervised feature selection, which learns non-linear representations to generate pseudo-labels and trains a Neural Network (NN) to select informative features via self-Knowledge Distillation (KD). Specifically, the proposed method divides a standard NN into two sub-components: an encoder and a predictor, and introduces a dependency subnet. It works by self-supervised pre-training the encoder to produce informative representations and then alternating between two steps: (1) learning pseudo-labels by combining the clustering results of the encoder's outputs with the NN's prediction outputs, and (2) updating the NN's parameters by globally selecting a subset of features to predict the pseudo-labels while updating the subnet's parameters through self-KD. Self-KD is achieved by encouraging the subnet to locally capture a subset of the NN features to produce class probabilities that match those produced by the NN. This allows the model to self-absorb the learned inter-class knowledge and evaluate feature diversity, removing redundant features without sacrificing performance. Meanwhile, the potential discriminative capability of a NN can also be self-excavated without the assistance of other NNs. The two alternate steps reinforce each other: in step (2), by predicting the learned pseudo-labels and conducting self-KD, the discrimination of the outputs of both the NN and the encoder is gradually enhanced, while the self-labeling method in step (1) leverages these two improvements to further refine the pseudo-labels for step (2), resulting in the superior performance. Extensive experiments show the proposed method significantly outperforms state-of-the-art methods across various datasets.
Vol. 37, No. 7, pp. 4270-4284
Citations: 0
Pseudo-Label Guided Bidirectional Discriminative Deep Multi-View Subspace Clustering
IF 8.9 | Zone 2 | Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-21 DOI: 10.1109/TKDE.2025.3562723
Yongbo Yu;Zhoumin Lu;Feiping Nie;Weizhong Yu;Zongcheng Miao;Xuelong Li
Abstract: In practical applications, multi-view subspace clustering is hindered by data noise that disrupts the ideal block-diagonal structure of self-representation matrices, thereby degrading performance. Moreover, many existing methods rely solely on sample features, overlooking the valuable structural information in affinity matrices (e.g., pairwise relationships). While conventional contrastive learning strategies often introduce false negative pairs due to noise and unreliable sample selection. To address these challenges, we propose a pseudo-label guided bidirectional discriminative deep multi-view subspace clustering method (PBDMSC). Our approach first employs pseudo-label guided contrastive learning, using previous cluster assignments to select reliable positive and negative samples, which mitigates incorrect pairings and enhances low-dimensional representations. Then, a discriminative self-representation learning method is introduced that leverages pseudo-labels to enforce homogeneous expression constraints and incorporates a bidirectional attention mechanism to preserve the structured information from affinity matrices, thereby enhancing robustness. Experimental results on six real-world datasets demonstrate that our proposed method achieves state-of-the-art clustering performance.
Vol. 37, No. 7, pp. 4213-4224
Citations: 0
NeuralLoss: A Learnable Pretrained Surrogate Loss for Learning to Rank
IF 8.9 | Zone 2 | Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-18 DOI: 10.1109/TKDE.2025.3562450
Chen Liu;Cailan Jiang;Lixin Zhou
Abstract: Learning to Rank (LTR) aims to develop a ranking model from supervised data to rank a set of items using machine learning techniques. However, since the losses and ranking metrics involved in LTR are both based on ranking, they are neither continuous nor differentiable, making it challenging to optimize them using gradient descent algorithms. Various surrogate losses have been proposed to address this issue, yet their connection with ranking metrics is often loose, leading to inconsistencies between optimization objectives and evaluation metrics. In this study, we introduce NeuralLoss, a learnable and pretrained surrogate loss. By undergoing training on data structured around ranking metrics, NeuralLoss approximates these ranking metrics, aligning its optimization objectives with evaluation metrics. We employ Transformer to construct the surrogate model and ensure permutation invariance. The pretrained surrogate loss facilitates end-to-end training of ranking models using gradient descent algorithms and can approximate various ranking metrics by adjusting the training data. In this paper, we employ NeuralLoss to approximate NDCG and Recall, demonstrating its performance in both document retrieval and cross-modal retrieval tasks. Experimental results indicate that our approach achieves excellent performance and exhibits strong competitiveness across these tasks.
Vol. 37, No. 7, pp. 4179-4192
Citations: 0
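The two metrics that the abstract above says NeuralLoss approximates, NDCG and Recall, can be written down directly, which also shows why they resist gradient descent: both depend on the model scores only through a sort. The sketch below uses one common convention (gain $2^{rel}-1$, discount $\log_2(\text{rank}+1)$ with ranks starting at 1, and any relevance above zero counted as relevant for Recall); the paper may use a different variant, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _ranking(scores: Sequence[float]) -> List[int]:
    """Item indices ordered from highest to lowest score (the non-differentiable step)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of relevances listed in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(scores: Sequence[float], relevances: Sequence[float], k: int) -> float:
    """DCG of the score-induced ranking divided by the DCG of the ideal ranking."""
    ranked = [relevances[i] for i in _ranking(scores)]
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(scores: Sequence[float], relevances: Sequence[float], k: int) -> float:
    """Fraction of relevant items (relevance > 0) that appear in the top k."""
    total_relevant = sum(1 for rel in relevances if rel > 0)
    if total_relevant == 0:
        return 0.0
    hits = sum(1 for i in _ranking(scores)[:k] if relevances[i] > 0)
    return hits / total_relevant
```

Small changes to the scores leave both values unchanged until two items swap positions, at which point the metric jumps, so the gradient with respect to the scores is zero almost everywhere. That is the gap a surrogate loss is meant to fill.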
FeBT: A Feature Balancing Transformer for Corporate ESG Forecasting
IF 8.9 | Zone 2 | Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-17 DOI: 10.1109/TKDE.2025.3560137
Yawen Li;Mengyu Zhuang;Guanhua Ye;Yan Li;Junheng Wang;Jinyi Zhou;Pengfei Zhang
Abstract: Environmental, social, and governance (ESG) serves as a crucial indicator for evaluating firms in terms of sustainable development. However, the existing ESG evaluation systems suffer from limitations, such as narrow coverage, subjective bias, and lack of timeliness. Therefore, there is a pressing need to leverage machine learning methods to predict the ESG performance of firms using their publicly available data. Traditional machine learning models encounter the feature imbalance problem due to the heterogeneity in ESG-related features. Common approaches typically involve unfolding all features, thereby granting high-dimensional folding features greater exposure and accessibility to downstream models, which results in the neglect of low-dimensional features. To fill the research gap regarding fully using the heterogeneous features of enterprises to enhance AI-based ESG prediction performance, we propose the Feature Balancing Transformer (FeBT), a model based on autoencoders and Transformer blocks. FeBT incorporates a novel feature balancing technique that compresses and enhances high-dimensional features from imbalanced data into low-dimensional representations, thereby ensuring a more balanced impact of high-dimensional and low-dimensional features on the model’s performance in the downstream ESG forecasting module. Extensive experiments verified the superior performance of FeBT compared with state-of-the-art methods in real-world ESG-related datasets and evidenced that our feature balancing module provides significant insights from high-dimensional folding features.
Vol. 37, No. 7, pp. 4063-4074
Citations: 0
EPM: Evolutionary Perception Method for Anomaly Detection in Noisy Dynamic Graphs
IF 8.9 | Zone 2 | Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-15 DOI: 10.1109/TKDE.2025.3561191
Huan Wang;Junyang Chen;Yirui Wu;Victor C. M. Leung;Di Wang
Abstract: With the rapid expansion of interactions across various domains such as social networks, transaction networks, and IP-IP networks, anomaly detection in dynamic graphs has become increasingly critical for mitigating potential risks. However, existing anomaly detection methods often assume noise-free dynamic graphs, overlooking the prevalence of noisy dynamic graphs in real-world applications. Specifically, noisy dynamic graphs affected by structural noises, such as spurious and missing nodes and edges, struggle to consistently provide reliable structural evidence for anomaly detection. To tackle this challenge, we propose an Evolutionary Perception Method (EPM) for identifying anomalous nodes in noisy dynamic graphs by resisting the interference of structural noises. EPM primarily consists of two components: a dynamic fitter and a filtering reviser. The dynamic fitter characterizes the interaction dynamics of nodes that removes and generates links at each period as a multiple superposition state, utilizing various link prediction algorithms to fit evolutionary mechanisms. Additionally, the filtering reviser designs evolutional entropies to quantify the evolutional uncertainty in multiple superposition states, further reconstructing the Kalman filter to optimize these entropies. Extensive experiments have proved that our proposed EPM outperforms state-of-the-art methods in discovering anomalous nodes in noisy dynamic graphs.
Vol. 37, No. 7, pp. 4035-4048
Citations: 0
Ontology Embedding: A Survey of Methods, Applications and Resources
IF 8.9 | Zone 2 | Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-11 DOI: 10.1109/TKDE.2025.3559023
Jiaoyan Chen;Olga Mashkova;Fernando Zhapa-Camacho;Robert Hoehndorf;Yuan He;Ian Horrocks
Abstract: Ontologies are widely used for representing domain knowledge and meta data, playing an increasingly important role in Information Systems, the Semantic Web, Bioinformatics and many other domains. However, logical reasoning that ontologies can directly support are quite limited in learning, approximation and prediction. One straightforward solution is to integrate statistical analysis and machine learning. To this end, automatically learning vector representation for knowledge of an ontology i.e., ontology embedding has been widely investigated. Numerous papers have been published on ontology embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field. To bridge this gap, we write this survey paper, which first introduces different kinds of semantics of ontologies and formally defines ontology embedding as well as its property of faithfulness. Based on this, it systematically categorizes and analyses a relatively complete set of over 80 papers, according to the ontologies they aim at and their technical solutions including geometric modeling, sequence modeling and graph propagation. This survey also introduces the applications of ontology embedding in ontology engineering, machine learning augmentation and life sciences, presents a new library mOWL and discusses the challenges and future directions.
Vol. 37, No. 7, pp. 4193-4212
Citations: 0
OpDiag: Unveiling Database Performance Anomalies Through Query Operator Attribution
IF 8.9 | Zone 2 | Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-04-10 DOI: 10.1109/TKDE.2025.3557049
Shiyue Huang;Ziwei Wang;Yinjun Wu;Yaofeng Tu;Jiankai Wang;Bin Cui
Abstract: How to effectively diagnose and mitigate database performance anomalies remains a significant concern for modern database systems. Manually identifying the root causes of the anomalies is a labor-intensive process and significantly relies on professional experience. Meanwhile, existing work on automatic database diagnosis mainly focuses on detecting anomalous performance metrics or system log. These solutions lack the power to pinpoint detailed issues such as bad queries or problematic operators, which are indispensable for most database troubleshooting processes. In this paper, we propose OpDiag, a diagnosis framework that attributes database performance anomalies to query operators. In this framework, we first construct models offline to represent the relationship between query operators, performance metrics, and anomalies. These models can capture query plan features and support ad-hoc queries and schemas. Then, through feature attribution on these models during online diagnosis, OpDiag can effectively identify critical anomalous metrics and further trace back to suspicious queries and operators. This can provide concrete guidance for subsequent steps in anomaly mitigation. We applied OpDiag to both synthetic benchmark and real industry cases from ZTE Corporation. Empirical studies prove that OpDiag can accurately localize anomalous queries and operators, thus reducing human efforts in diagnosing and mitigating database performance anomalies.
Vol. 37, No. 6, pp. 3613-3626
Citations: 0