IEEE Transactions on Knowledge and Data Engineering: Latest Articles, Page 2

B2BGAN: A Backbone-to-Branches GAN-Based Oversampling Approach for Class-Imbalanced Tabular Data

IF 10.4, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-07-29 DOI: 10.1109/TKDE.2025.3593637

Xiaoguang Wang;Chenxu Wang;Mengqin Wang;Jun Liu;Xiaohong Guan

Abstract: Tabular data is prevalent in many fields. In practice, tabular data classification may encounter severe challenges due to class imbalance, i.e., some majority classes overwhelm minority ones. Such imbalance can bias the predictions of trained classifiers toward the majority classes. Oversampling minority classes is an essential solution because it is general and independent of downstream tasks. In recent years, generative adversarial networks (GANs) have shown clear advantages in synthetic data generation and are favored for their ability to generate quasi-realistic samples. However, challenges arise when the minority classes are too small to provide sufficient information for learning real data distributions. Furthermore, the generated minority-class samples could exacerbate the class overlap problem, i.e., some generated samples unexpectedly overlap with some majority-class samples. To address these challenges, this paper presents B2BGAN, a novel GAN-based approach for oversampling imbalanced tabular data. To capture the real data distribution in a fine-grained manner, we propose a novel backbone-to-branches neural network that lets the generator fit the majority and minority classes simultaneously. The backbone network fits the distribution of the entire dataset, while each branch network captures the distinctive characteristics of an individual class. To alleviate the class overlap problem of generated samples, we develop a prototype-guided loss function that ensures generated samples lie closer to their corresponding class prototypes. We evaluate the effectiveness of B2BGAN on six real-world datasets using six metrics. Experimental results demonstrate that our method outperforms state-of-the-art models by 5.38% in AUC and 10.19% in AP.

Vol. 37, No. 10, pp. 5808-5822

Citations: 0
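The abstract does not give the exact form of the prototype-guided loss. Below is a minimal sketch that assumes a class prototype is the mean vector of that class's real samples, and that the loss is the mean squared Euclidean distance from each generated sample to the prototype of its intended class. The names `class_prototypes` and `prototype_loss` are hypothetical, and this is not the B2BGAN authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def class_prototypes(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, List[float]]:
    """Return the mean feature vector (prototype) of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def prototype_loss(generated: Sequence[Sequence[float]],
                   gen_labels: Sequence[int],
                   prototypes: Dict[int, List[float]]) -> float:
    """Mean squared Euclidean distance from each generated sample to its class prototype.

    Minimizing this term pulls synthetic minority samples toward their own class
    centre, which is one simple way to discourage overlap with other classes.
    """
    total = 0.0
    for x, label in zip(generated, gen_labels):
        proto = prototypes[label]
        total += sum((a - b) ** 2 for a, b in zip(x, proto))
    return total / len(generated)
```

In a GAN, a term like this would typically be added with some weight to the generator's adversarial loss. That weighting is also an assumption, not something stated in the abstract.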

Unsupervised Concept Drift Detection From Deep Learning Representations in Real-Time

IF 10.4, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-07-29 DOI: 10.1109/TKDE.2025.3593123

Salvatore Greco;Bartolomeo Vacchetti;Daniele Apiletti;Tania Cerquitelli

Abstract: Concept drift is the phenomenon in which the underlying data distributions and statistical properties of a target domain change over time, leading to a degradation in model performance. Consequently, production models require continuous drift detection monitoring. Most drift detection methods to date are supervised and rely on ground-truth labels. However, they are inapplicable in many real-world scenarios, as true labels are often unavailable. Although recent efforts have proposed unsupervised drift detectors, many lack the accuracy required for reliable detection or are too computationally intensive for real-time use in high-dimensional, large-scale production environments. Moreover, they often fail to characterize or explain drift effectively. To address these limitations, we propose DriftLens, an unsupervised framework for real-time concept drift detection and characterization. Designed for deep learning classifiers that handle unstructured data, DriftLens leverages distribution distances in deep learning representations to enable efficient and accurate detection. Additionally, it characterizes drift by analyzing and explaining its impact on each label. Our evaluation across classifiers and data types demonstrates that DriftLens (i) outperforms previous methods in detecting drift in 15/17 use cases; (ii) runs at least 5 times faster; (iii) produces drift curves that align closely with actual drift (correlation ≥ 0.85); and (iv) effectively identifies representative drift samples as explanations.

Vol. 37, No. 10, pp. 6232-6245. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11103500

Citations: 0
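The abstract states only that DriftLens uses distribution distances over deep learning representations. The sketch below makes several assumptions to stay dependency-free. It fits an independent (diagonal-covariance) Gaussian to each embedding dimension. It then computes the squared Fréchet (2-Wasserstein) distance between the fit for a reference window and the fit for a new window. Finally, it flags drift when that distance exceeds a threshold. The diagonal simplification, the function names, and the threshold are all hypothetical; DriftLens itself may use a different distance and covariance model.

```python
import math
from typing import Sequence, Tuple


def _mean_std(column: Sequence[float]) -> Tuple[float, float]:
    """Population mean and standard deviation of one embedding dimension."""
    n = len(column)
    mu = sum(column) / n
    var = sum((v - mu) ** 2 for v in column) / n
    return mu, math.sqrt(var)


def frechet_distance_diag(ref: Sequence[Sequence[float]],
                          window: Sequence[Sequence[float]]) -> float:
    """Squared Frechet distance between diagonal-Gaussian fits of two embedding sets.

    For 1-D Gaussians N(m1, s1^2) and N(m2, s2^2) this is (m1 - m2)^2 + (s1 - s2)^2;
    with a diagonal covariance the per-dimension terms simply add up.
    """
    d = 0.0
    for j in range(len(ref[0])):
        mu_r, sd_r = _mean_std([row[j] for row in ref])
        mu_w, sd_w = _mean_std([row[j] for row in window])
        d += (mu_r - mu_w) ** 2 + (sd_r - sd_w) ** 2
    return d


def detect_drift(ref, window, threshold: float) -> Tuple[float, bool]:
    """Return the distance and whether it exceeds a (hypothetical) drift threshold."""
    dist = frechet_distance_diag(ref, window)
    return dist, dist > threshold
```

Running the same computation separately on the embeddings predicted as each label would give a per-label view, loosely matching the abstract's "impact on each label".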

Handling Out-of-Distribution Data: A Survey

IF 10.4, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-07-28 DOI: 10.1109/TKDE.2025.3592614

Lakpa Tamang;Mohamed Reda Bouadjenek;Richard Dazeley;Sunil Aryal

Abstract: In machine learning (ML) and data-driven applications, one of the most significant challenges is a change in data distribution between the training and deployment stages, commonly known as distribution shift. This paper outlines different mechanisms for handling two main types of distribution shift: (i) covariate shift, where the values of features or covariates change between training and test data, and (ii) concept/semantic shift, where the concept the model learned during training shifts because novel classes emerge in the test phase. Our contributions are threefold. First, we formalize distribution shifts, explain how conventional methods fail to handle them adequately, and argue for a model that performs well under all types of distribution shift simultaneously. Second, we discuss why handling distribution shifts is important and provide an extensive review of the methods and techniques that have been developed to detect, measure, and mitigate the effects of these shifts. Third, we discuss the current state of distribution shift handling mechanisms and propose future research directions in this area. Overall, we provide a retrospective synopsis of the distribution shift literature, focusing on OOD data, which has been overlooked in existing surveys.

Vol. 37, No. 10, pp. 5948-5966

Citations: 0
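Covariate shift, as defined above, means that the feature values change between training and test data. One standard per-feature check is the two-sample Kolmogorov-Smirnov (KS) statistic, which measures the largest gap between the two empirical CDFs. The sketch below is a generic illustration and is not taken from the survey. The `0.5` threshold and the function names are hypothetical. This check only detects shifts in individual feature marginals, not shifts in joint structure or concept/semantic shift.

```python
from bisect import bisect_right
from typing import List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # The supremum of the CDF gap is attained at one of the pooled sample points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n
        cdf_b = bisect_right(b_sorted, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    return d


def shifted_features(train_cols: Sequence[Sequence[float]],
                     test_cols: Sequence[Sequence[float]],
                     threshold: float = 0.5) -> List[int]:
    """Indices of features whose train/test marginals differ by more than `threshold`."""
    return [j for j, (tr, te) in enumerate(zip(train_cols, test_cols))
            if ks_statistic(tr, te) > threshold]
```

In practice, the threshold would usually come from the KS test's p-value at a chosen significance level rather than from a fixed constant.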

GeoRecover: Recovery From Poisoning Attacks for LDP-Enabled Spatial Density Aggregation

IF 10.4, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-07-28 DOI: 10.1109/TKDE.2025.3593289

Xinyue Sun;Qingqing Ye;Haibo Hu;Jiawei Duan;Hui He;Weizhe Zhang

Abstract: The spatial density distribution collected and aggregated from users' trajectory data is vital for location-based services such as regional popularity analysis and congestion measurement. However, spatial density aggregation raises privacy concerns, since trajectory data usually originate from users. Local differential privacy (LDP) addresses these concerns by allowing users to perturb their data before reporting it. Yet LDP is vulnerable to poisoning attacks, in which attackers manipulate the data reported by malicious users. Recent studies attempt to defend against such attacks in LDP-enabled frequency estimation but suffer from inaccurate data recovery, due to empirically preset proportions of malicious users and inaccurate estimation of malicious data. These issues worsen in spatial density aggregation, as high-dimensional trajectory data help conceal malicious information. In this work, we propose GeoRecover, a method that defends against poisoning attacks in LDP-enabled spatial density aggregation by addressing these limitations. GeoRecover designs an adaptive model that unifies these attacks. Under this model, GeoRecover estimates the proportion of malicious users from statistical differences between genuine and malicious data and learns malicious data statistics through LDP properties. This allows GeoRecover to recover an accurate spatial density distribution by subtracting the malicious users' contributions. Evaluations on two real-world datasets show that GeoRecover outperforms state-of-the-art methods in recovery accuracy, defense capability, and practical performance.

Vol. 37, No. 10, pp. 5919-5933

Citations: 0
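The final recovery step, "subtracting malicious users' contributions", can be illustrated with the standard generalized randomized response (GRR) frequency estimator. That estimator is linear in the report counts. If a fraction `beta` of users are malicious, the aggregate estimate is therefore a mixture `(1 - beta) * genuine + beta * malicious`, and it can be inverted once `beta` and the malicious estimate are known. This is a minimal sketch that assumes both quantities are already given. Estimating them is the hard part GeoRecover addresses, and that part is not shown. The clip-and-renormalize post-processing is also an assumption made here.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def grr_estimate(reports: Iterable[Hashable], domain: List[Hashable],
                 epsilon: float) -> Dict[Hashable, float]:
    """Unbiased frequency estimates from generalized randomized response (GRR) reports."""
    reports = list(reports)
    d = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)  # prob. of reporting the true value
    q = 1.0 / (math.exp(epsilon) + d - 1)                # prob. of reporting one specific other value
    n = len(reports)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


def recover(estimated: Dict[Hashable, float], malicious: Dict[Hashable, float],
            beta: float) -> Dict[Hashable, float]:
    """Invert estimated = (1 - beta) * genuine + beta * malicious, then project to a distribution."""
    genuine = {v: (f - beta * malicious.get(v, 0.0)) / (1.0 - beta)
               for v, f in estimated.items()}
    clipped = {v: max(f, 0.0) for v, f in genuine.items()}  # frequencies cannot be negative
    total = sum(clipped.values())
    return {v: f / total for v, f in clipped.items()} if total > 0 else clipped
```

In the spatial setting, each domain value `v` would correspond to a grid cell. The unbiased GRR estimates always sum to 1, because `p + (d - 1) * q = 1`.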

Instance-Dependent Incomplete Multi-Label Feature Selection by Fuzzy Tolerance Relation and Fuzzy Mutual Implication Granularity

IF 10.4, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-07-28 DOI: 10.1109/TKDE.2025.3591461

Jianhua Dai;Wenxiang Chen;Yuhua Qian;Witold Pedrycz

Abstract: Multi-label feature selection is an effective approach to mitigating the high-dimensional feature problem in multi-label learning. Most existing multi-label feature selection methods assume either that the data are complete or that only the features or only the labels are incomplete. So far, there are few studies on multi-label data with both missing features and missing labels. In many cases, missing features in instances of multi-label data lead to missing labels, a dependence that existing studies ignore. We define this type of data as instance-dependent incomplete multi-label data. In this paper, we propose a feature selection method for instance-dependent incomplete multi-label data. First, we use the positive correlations between features to reconstruct the feature space, thereby recovering missing values and enhancing non-missing values. Second, we use a fuzzy tolerance relation to guide label recovery and use fuzzy mutual implication granularity to impose structural constraints on the projection matrix. Third, we achieve feature selection by eliminating the impact of incomplete instances and imposing sparse regularization on the projection matrix. Finally, we provide a convergent solution for the proposed feature selection framework. Comparative experiments with existing multi-label feature selection methods show that our method performs effective feature selection on instance-dependent incomplete multi-label data.

Vol. 37, No. 10, pp. 5994-6008

Citations: 0
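The abstract does not define the fuzzy tolerance relation it uses. The sketch below assumes a common form for incomplete data with features scaled to [0, 1]. On each feature, the similarity of two instances is `1 - |a - b|`, and a missing value is treated as tolerant (similarity 1). The overall relation is the minimum across features. A missing label is then filled with the tolerance-weighted average of the known labels of other instances. This is a generic, hypothetical label-recovery step, not the authors' formulation, which also involves fuzzy mutual implication granularity and a projection matrix.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def fuzzy_tolerance(x: Row, y: Row) -> float:
    """Fuzzy similarity of two instances whose features are scaled to [0, 1].

    A missing value (None) on either side is treated as tolerant, i.e. it does
    not lower the similarity; otherwise the weakest feature match dominates.
    """
    sim = 1.0
    for a, b in zip(x, y):
        if a is None or b is None:
            continue
        sim = min(sim, 1.0 - abs(a - b))
    return sim


def recover_labels(X: Sequence[Row],
                   Y: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Fill missing labels (None) with the tolerance-weighted mean of other instances' known labels."""
    out = []
    for i, (xi, yi) in enumerate(zip(X, Y)):
        row = []
        for k, label in enumerate(yi):
            if label is not None:
                row.append(float(label))
                continue
            num = den = 0.0
            for j, (xj, yj) in enumerate(zip(X, Y)):
                if j != i and yj[k] is not None:
                    w = fuzzy_tolerance(xi, xj)
                    num += w * yj[k]
                    den += w
            row.append(num / den if den > 0 else 0.0)
        out.append(row)
    return out
```

The recovered values are soft scores in [0, 1]. A downstream method could threshold them or use them directly as label confidences.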

Smoothness-Induced Efficient Incomplete Multi-View Clustering

IF 10.4, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-07-28 DOI: 10.1109/TKDE.2025.3591500

Tianchuan Yang;Haiqiang Chen;Haoyan Yang;Man-Sheng Chen;Xiangcheng Li;Youming Sun;Chang-Dong Wang

Abstract: Efficient incomplete multi-view clustering has received increasing attention due to its ability to handle large-scale data with missing values. Although existing methods perform promisingly, 1) they typically generate anchors directly from incomplete and noisy raw data, resulting in non-comprehensive anchor coverage and unreliable results; 2) they typically use only sparse regularization to remove noise and overlook outliers; and 3) they ignore the inherent consistency of features within a view. To address these issues, we propose a smoothness-induced efficient incomplete multi-view clustering (SEIC) method. SEIC regards the available data as natural anchors selected from the complete data and performs matrix decomposition only on them to obtain reliable small-size representation matrices. The view-specific representation matrices are stacked into a tensor to capture consensus and guide matrix decomposition. More significantly, we enforce both smoothness and low-rank coupling on the tensor. Smoothness induces continuous variation of the tensor, which further eliminates noise and enhances the relations among features. Benefiting from the noise robustness of SEIC, we design an adaptive noise-balance parameter that makes SEIC parameter-free. Furthermore, by constructing a sparse anchor graph on the learned tensor, we propose a spectral clustering version, SEIC-SC. Experiments on multiple datasets demonstrate the superior performance and efficiency of SEIC and SEIC-SC.

Vol. 37, No. 10, pp. 6173-6188

Citations: 0
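The abstract says only that SEIC enforces "smoothness" on the tensor of view-specific representation matrices. One common way to express smoothness along a tensor mode is a first-order difference penalty: the sum, over consecutive slices, of the squared Frobenius norm of their difference. The sketch below assumes this penalty is applied along the view mode and shows one gradient step that reduces it. The choice of mode and penalty, and the function names, are assumptions. SEIC's actual regularizer and its low-rank coupling are not reproduced.

```python
from typing import List

Matrix = List[List[float]]


def smoothness_penalty(slices: List[Matrix]) -> float:
    """Sum over consecutive slices T_v, T_{v+1} of ||T_{v+1} - T_v||_F^2."""
    return sum(
        (a - b) ** 2
        for prev, curr in zip(slices, slices[1:])
        for row_p, row_c in zip(prev, curr)
        for a, b in zip(row_p, row_c)
    )


def smoothing_step(slices: List[Matrix], lr: float) -> List[Matrix]:
    """One gradient-descent step on smoothness_penalty.

    d/dT_v = 2 (T_v - T_{v-1}) + 2 (T_v - T_{v+1}), keeping only the neighbours that exist.
    """
    n = len(slices)
    out = []
    for v in range(n):
        new = []
        for i, row in enumerate(slices[v]):
            new_row = []
            for j, x in enumerate(row):
                grad = 0.0
                if v > 0:
                    grad += 2.0 * (x - slices[v - 1][i][j])
                if v < n - 1:
                    grad += 2.0 * (x - slices[v + 1][i][j])
                new_row.append(x - lr * grad)
            new.append(new_row)
        out.append(new)
    return out
```

Used alone, this penalty would pull every slice toward the same matrix. This is why a method such as SEIC pairs it with data-fitting and low-rank terms.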

GI-Graph: A Generative Invariant Graph Learning Scheme Towards Out-of-Distribution Generalization

IF 10.4, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-07-24 DOI: 10.1109/TKDE.2025.3592640

Sanfeng Zhang;Xinyi Liu;Zihao Qi;Xingchen Yan;Wang Yang

Abstract: When distribution shifts occur between training and testing graph data, out-of-distribution (OOD) samples undermine the performance of graph neural networks (GNNs). To improve the adaptive OOD generalization of GNNs, this paper introduces a novel generative invariant graph learning framework named GI-Graph. It consists of four modules: a subgraph extractor, generative environment subgraph augmentation, generative invariant subgraph learning, and a query feedback module. The subgraph extractor decomposes a graph sample into an environment subgraph and an invariant subgraph and improves extraction accuracy through query feedback. GI-Graph uses a diffusion model to generate diverse environment subgraphs, augmenting the OOD data. By combining diffusion models, contrastive learning, and attribute prediction networks, GI-Graph also generates augmented invariant subgraphs with significant identically distributed features and consistent labels. Experimental results demonstrate that controllable environment subgraph and invariant subgraph augmentation effectively improve the OOD generalization capability of GI-Graph, especially in capturing invariant features and maintaining category consistency across environments. Additionally, the contrastive learning-based fine-tuning method enables GI-Graph to adapt quickly to evolving environments. This paper verifies the effectiveness of the generative invariant graph learning scheme for graph OOD generalization.

Vol. 37, No. 10, pp. 5934-5947

Citations: 0
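The decomposition into an invariant subgraph and an environment subgraph is often implemented by scoring edges and keeping the top-scoring fraction as the invariant part. The sketch below assumes the edge scores already exist; in GI-Graph they would come from its learned extractor, which is not shown. It also replaces the paper's diffusion-based environment generation with a much simpler augmentation: the invariant subgraph is combined with the environment subgraph of another graph. The function names and the `ratio` parameter are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def split_subgraphs(edge_scores: Dict[Edge, float],
                    ratio: float) -> Tuple[List[Edge], List[Edge]]:
    """Split edges into an invariant subgraph (top `ratio` by score) and an environment subgraph (the rest)."""
    ranked = sorted(edge_scores, key=lambda e: edge_scores[e], reverse=True)
    k = max(1, round(ratio * len(ranked)))  # keep at least one invariant edge
    return ranked[:k], ranked[k:]


def recombine(invariant: Sequence[Edge], other_environment: Sequence[Edge]) -> List[Edge]:
    """Simple augmentation: pair an invariant subgraph with a different environment subgraph."""
    return sorted(set(invariant) | set(other_environment))
```

Under this kind of augmentation, the graph label is assumed to depend only on the invariant edges. Swapping environments should therefore leave the label unchanged, which is the intuition behind invariant graph learning.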

A Survey of Text-to-SQL in the Era of LLMs: Where Are We, and Where Are We Going?

IF 10.4, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-07-24 DOI: 10.1109/TKDE.2025.3592032

Xinyu Liu;Shuyu Shen;Boyan Li;Peixian Ma;Runzhi Jiang;Yuxin Zhang;Ju Fan;Guoliang Li;Nan Tang;Yuyu Luo

Abstract: Translating users' natural language (NL) queries into SQL queries (i.e., Text-to-SQL, a.k.a. NL2SQL) can significantly reduce the barriers to accessing relational databases and support various commercial applications. The performance of Text-to-SQL has been greatly enhanced by the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of LLM-powered Text-to-SQL techniques, covering the entire lifecycle from four aspects: (1) Model: Text-to-SQL translation techniques that not only tackle NL ambiguity and under-specification but also properly map NL to database schemas and instances; (2) Data: from the collection of training data and data synthesis to address training data scarcity, to Text-to-SQL benchmarks; (3) Evaluation: evaluating Text-to-SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: analyzing Text-to-SQL errors to find their root causes and guide the evolution of Text-to-SQL models. Moreover, we offer a rule of thumb for developing Text-to-SQL solutions. Finally, we discuss the research challenges and open problems of Text-to-SQL in the LLM era.

Vol. 37, No. 10, pp. 5735-5754

Citations: 0
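One widely used Text-to-SQL evaluation metric is execution accuracy: a predicted query counts as correct if it returns the same result as the gold query on the database. The sketch below uses Python's built-in `sqlite3` module. It assumes rows are compared as a multiset unless `ordered=True`, and that a query that fails to execute counts as a miss. Published benchmarks define these details in their own ways. The table, data, and function names are illustrative only.

```python
import sqlite3
from collections import Counter
from typing import Iterable, Tuple


def execution_match(conn: sqlite3.Connection, predicted_sql: str,
                    gold_sql: str, ordered: bool = False) -> bool:
    """True if the predicted query returns the same rows as the gold query."""
    try:
        pred = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # invalid or non-executable SQL counts as a miss
    gold = conn.execute(gold_sql).fetchall()
    if ordered:
        return pred == gold
    return Counter(pred) == Counter(gold)  # compare as multisets of rows


def execution_accuracy(conn: sqlite3.Connection,
                       pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose results match."""
    pairs = list(pairs)
    hits = sum(execution_match(conn, p, g) for p, g in pairs)
    return hits / len(pairs)


def make_demo_db() -> sqlite3.Connection:
    """A tiny in-memory database for trying the metric out."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                     [("Ann", "Sales", 5000), ("Bo", "IT", 6000), ("Cy", "IT", 7000)])
    return conn
```

Execution-based comparison rewards semantically equivalent queries that differ in surface form. It can also accept a wrong query that happens to return the right rows on a small database.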

EGNN: Exploring Structure-Level Neighborhoods in Graphs With Varying Homophily Ratios

IF 10.4, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-07-23 DOI: 10.1109/TKDE.2025.3591771

Songwei Zhao;Bo Yu;Sinuo Zhang;Zhejian Yang;Jifeng Hu;Philip S. Yu;Hechang Chen

Abstract: Graph neural networks (GNNs) have garnered significant attention for their competitive performance on graph-structured data. However, many existing methods are constrained by the homophily assumption, which makes them overly reliant on uniform neighbor propagation and limits their ability to generalize to heterophilous graphs. Although some approaches extend aggregation to multi-hop neighbors, adapting neighborhood sizes on a per-node basis remains a significant challenge. In view of this, we propose an Evolutionary Graph Neural Network (EGNN) with adaptive structure-level aggregation and label smoothing, offering a novel solution to this drawback. The core innovation of EGNN lies in assigning each node a personalized neighborhood structure through behavior-level crossover and mutation. Specifically, we first adaptively search the solution space for the optimal structure-level neighborhood of each node, leveraging the exploratory capabilities of evolutionary computation. This approach enhances the exchange of information between the target node and its surrounding nodes, yielding a smooth vector representation. Subsequently, we use the optimal structure obtained through evolutionary search to perform label smoothing, further boosting the robustness of the framework. We conduct experiments on nine real-world networks with different homophily ratios, and the results show that EGNN matches or surpasses state-of-the-art (SOTA) baselines.

Vol. 37, No. 10, pp. 5852-5865

Citations: 0
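The evolutionary search over per-node neighborhoods can be illustrated with a plain genetic algorithm. In this sketch, a genome assigns each node a neighborhood radius in hops. Uniform crossover and random-reset mutation produce new genomes, and truncation selection keeps the fitter half of the population. The genome encoding, the operators, and the toy fitness function are all assumptions. EGNN's "behavior-level" operators and its GNN-based fitness are not described in the abstract and are not reproduced here.

```python
import random
from typing import Callable, List

Genome = List[int]  # genome[i] = neighbourhood radius (in hops) chosen for node i


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: each node inherits its radius from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(g: Genome, max_hops: int, rate: float, rng: random.Random) -> Genome:
    """Random-reset mutation: each gene is redrawn from 1..max_hops with probability `rate`."""
    return [rng.randint(1, max_hops) if rng.random() < rate else h for h in g]


def evolve(num_nodes: int, max_hops: int, fitness: Callable[[Genome], float],
           generations: int = 30, pop_size: int = 20, rate: float = 0.1,
           seed: int = 0) -> Genome:
    """Search for a per-node radius assignment that maximizes `fitness`."""
    rng = random.Random(seed)
    pop = [[rng.randint(1, max_hops) for _ in range(num_nodes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # elitist truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), max_hops, rate, rng))
        pop = parents + children
    return max(pop, key=fitness)
```

In a real setting, `fitness` would be something like the validation accuracy of a GNN that aggregates each node's neighborhood at the chosen radius. Because the best parents are always kept, the best fitness never decreases across generations.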

Precise Bayes Regression: Approaching Optimality, Using Multi-Dimensional Space Partitioning Trees

IF 10.4, Zone 2 (Computer Science)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-07-23 DOI: 10.1109/TKDE.2025.3592074

Amin Vahedian

Abstract: The Conditional Expectation Function (CEF) is an optimal estimator in real space. Artificial neural networks (ANN), the current state-of-the-art method, lack interpretability. Estimating the CEF offers a path to achieving both accuracy and interpretability. Previous attempts to estimate the CEF either rely on limiting assumptions, such as independence and a specific distributional form, or perform an expensive nearest-neighbor search. We propose Dynamically Ordered Precise Bayes Regression (DO-PBR), a novel method for estimating the CEF in discrete space. We prove that DO-PBR approaches optimality as the number of samples increases. DO-PBR dynamically learns region-specific importance rankings for the predictors, allowing the importance of a predictor to vary across the space. DO-PBR is fully interpretable, makes no assumptions about independence or the distributional form, and requires minimal parameter setting. In addition, DO-PBR avoids the costly nearest-neighbor search by using a hierarchy of binary trees. Our experiments confirm our theoretical claims on approaching optimality and show that, given the same amount of time, DO-PBR achieves substantially higher accuracy than ANN. On average, ANN takes 32 times longer than DO-PBR to reach the same level of accuracy.

Vol. 37, No. 10, pp. 6107-6119

Citations: 0
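The CEF, E[y | x], can be approximated by partitioning the predictor space with binary trees and predicting the mean of `y` in the region that contains `x`. A lookup then costs one root-to-leaf walk instead of a nearest-neighbor search. The sketch below is a generic k-d-style median-split tree. It simply cycles through predictors instead of learning region-specific importance rankings as DO-PBR does. The `min_leaf` and `max_depth` parameters and all names are hypothetical, and this is not the DO-PBR algorithm.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import List, Optional, Sequence


@dataclass
class Node:
    value: float                   # mean of y in this region (local CEF estimate)
    dim: Optional[int] = None      # predictor used to split; None marks a leaf
    split: Optional[float] = None  # threshold on that predictor
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(X: Sequence[Sequence[float]], y: Sequence[float], depth: int = 0,
          min_leaf: int = 2, max_depth: int = 12) -> Node:
    """Recursively split at the median of one predictor, cycling through predictors by depth."""
    node_value = mean(y)
    if len(y) <= min_leaf or depth >= max_depth:
        return Node(node_value)
    dim = depth % len(X[0])
    split = median([row[dim] for row in X])
    li = [i for i, row in enumerate(X) if row[dim] <= split]
    ri = [i for i, row in enumerate(X) if row[dim] > split]
    if not li or not ri:  # all values tied on this predictor: stop splitting
        return Node(node_value)
    left = build([X[i] for i in li], [y[i] for i in li], depth + 1, min_leaf, max_depth)
    right = build([X[i] for i in ri], [y[i] for i in ri], depth + 1, min_leaf, max_depth)
    return Node(node_value, dim, split, left, right)


def predict(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to the leaf containing x and return that region's mean."""
    while node.dim is not None:
        node = node.left if x[node.dim] <= node.split else node.right
    return node.value
```

Each leaf's prediction is just an average of observed targets, so a prediction can be explained by listing the split conditions along its path. This is the kind of interpretability the abstract contrasts with ANNs.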