IEEE Transactions on Knowledge and Data Engineering: Latest Articles

Training-Free Graph-Based Imputation of Missing Modalities in Multimodal Recommendation
IF 10.4, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2026-03-01 Epub Date: 2026-03-03 DOI: 10.1109/TKDE.2026.3667005
Daniele Malitesta;Emanuele Rossi;Claudio Pomo;Tommaso Di Noia;Fragkiskos D. Malliaros
Abstract: Multimodal recommender systems (RSs) represent items in the catalog through multimodal data (e.g., product images and descriptions) that, in some cases, might be noisy or (even worse) missing. In those scenarios, the common practice is to drop items with missing modalities and train the multimodal RSs on a subsample of the original dataset. To date, the problem of missing modalities in multimodal recommendation has received limited attention in the literature, and it lacks a precise formalisation of the kind available for missing information in traditional machine learning. In this work, we first provide a problem formalisation for missing modalities in multimodal recommendation. Second, by leveraging the user-item graph structure, we re-cast the problem of missing multimodal information as a problem of graph feature interpolation on the item-item co-purchase graph. On this basis, we propose four training-free approaches that propagate the available multimodal features throughout the item-item graph to impute the missing features. Extensive experiments on popular multimodal recommendation datasets demonstrate that our solutions can be seamlessly plugged into any existing multimodal RS and benchmarking framework while still preserving (or even widening) the performance gap between multimodal and traditional RSs. Moreover, we show that our graph-based techniques can perform better than traditional imputations in machine learning under different missing modalities settings. Finally, we analyse (for the first time in multimodal RSs) how feature homophily calculated on the item-item graph can influence our graph-based imputations.
Vol. 38, No. 5, pp. 3250-3263
Citations: 0
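The abstract above describes propagating available features over the item-item graph to impute missing ones. A minimal sketch of that general idea follows; it is not one of the paper's four approaches. The function name `propagate_features`, the neighbour-averaging rule, and the toy graph are all assumptions made for illustration.

```python
def propagate_features(neighbors, features, num_iters=10):
    """Impute missing item features by repeated neighbour averaging.

    neighbors: dict mapping every item to a list of adjacent items in the
               item-item graph (every item must appear as a key).
    features:  dict mapping items with known features to a list of floats.
    Returns a dict with a feature vector for every item in `neighbors`.
    """
    if not features:
        raise ValueError("at least one item needs known features")
    dim = len(next(iter(features.values())))
    # Items without features start at the zero vector.
    current = {item: list(features.get(item, [0.0] * dim)) for item in neighbors}
    for _ in range(num_iters):
        updated = {}
        for item, nbrs in neighbors.items():
            if item in features:
                # Known features are reset to their observed values each round.
                updated[item] = list(features[item])
            elif nbrs:
                # Missing features become the mean of the neighbours' current values.
                updated[item] = [
                    sum(current[n][d] for n in nbrs) / len(nbrs) for d in range(dim)
                ]
            else:
                # Isolated items keep whatever they had (the zero vector).
                updated[item] = current[item]
        current = updated
    return current


# Toy co-purchase path a - b - c where only b lacks features.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_features = {"a": [1.0], "c": [3.0]}
imputed = propagate_features(toy_graph, toy_features)
```

Resetting the known items every round keeps observed features fixed while the missing ones settle toward values consistent with their neighbourhood.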
TrashToTreasure: An Informative and Interactive Multi-View Classification Framework
IF 10.4, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2026-03-01 Epub Date: 2026-03-20 DOI: 10.1109/TKDE.2026.3676286
Guoqing Chao;Mingjie Zhang;Xiru Wang;Jie Wen;Weiping Ding;Dianhui Chu
Abstract: As a basic machine learning task, Multi-View Classification (MVC) has garnered considerable attention and achieved great success. However, existing MVC methods, especially late-fusion-style ones, still suffer from some problems: 1) hidden valuable information is not well exploited; 2) there is a lack of interaction before decision making. To address these problems, we propose a novel framework named “TrashtoTreasure” that leverages mutual information to effectively exploit hidden valuable information. Specifically, the framework explicitly disentangles multi-view information into “useful” components and “trash” (noisy) components, and further extracts potentially valuable “treasure” information from the “trash” components of all views. Additionally, we design a tailored objective function that facilitates the effective separation of “useful” and “trash” components, as well as the synergistic extraction of “treasure” information. This function guides model optimization through triple mutual information constraints. Experimental results on synthetic data and several real-world data sets verify the effectiveness and superiority of the proposed method. The fresh perspective offered by this article may inspire more interesting exploration in this direction.
Vol. 38, No. 5, pp. 3264-3276
Citations: 0
Uncertainty-Aware Online Time Series Multi-Step Forecasting Framework in Cloud Systems
IF 10.4, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2026-03-01 Epub Date: 2026-03-16 DOI: 10.1109/TKDE.2026.3674583
Jiadong Chen;Yang Luo;Xiuqi Huang;Fuxin Jiang;Yangguang Shi;Tieying Zhang;Xiaofeng Gao
Abstract: Accurate resource planning in large-scale systems relies on reliable predictions of future workloads, a task inherently challenged by their variability and dynamism. Previous prediction methods are either ineffective at dealing with the changing dynamics of the series, or are highly black-boxed and do not admit effective theoretical analysis. To address these issues, we design an effective ensemble framework, Interval Prediction with Online Chasing (IPOC), tailored for multi-step interval forecasting in real-time systems. Theoretically, by formulating the task as a Dynamic Deterministic Markov Decision Process (Dd-MDP), an advanced theoretical framework is introduced to analyze problem solvability and derive conditions for the existence of feasible solutions. Incorporating the proposed Adaptive Copula Conformal Inference (ACCI) module and a well-designed Chasing Oracle, IPOC captures the changing dynamics and temporal dependencies to enable multi-step forecasting. We organically integrate advanced online learning theories with time series forecasting tasks to construct a forecasting framework that is both theoretically rigorous and practically effective. Theoretical analysis underpins IPOC’s effectiveness, demonstrating sublinear regret and adherence to confidence interval specifications. The chasing regret of the Chasing Oracle is $O(L_{c})$, and the overall regret of IPOC is $O(\sqrt{L_{c} T \log |\mathcal{F}|})$. Empirically, IPOC is validated through extensive experiments on five real-world datasets, including public datasets and different types of workload collected from Bytedance Cloud, with comparisons to 25 baselines and 4 forecasting horizons (1/5/10/30). Specifically, IPOC achieves an average reduction of over 20% in RMSE/MAE/SMAPE/$\rho$-risk compared to baselines across five datasets. Besides, we apply our model to a case study on predictive auto-scaling tasks in actual large-scale cloud systems to validate its utility.
Vol. 38, No. 5, pp. 3277-3290
Citations: 0
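The abstract above relies on conformal inference that adapts online to keep prediction intervals at a target coverage. The sketch below shows plain adaptive conformal inference for a one-step forecast, which is a simpler, well-known relative and not the paper's ACCI module or Chasing Oracle. The function names, the step size `gamma`, and the toy numbers are assumptions.

```python
import math


def conformal_interval(prediction, past_abs_errors, alpha):
    """Symmetric interval around `prediction` at miscoverage level `alpha`.

    Uses the ceil((1 - alpha) * (n + 1))-th smallest past absolute error as the
    half-width; falls back to an infinite interval when that rank is unavailable.
    """
    n = len(past_abs_errors)
    if n == 0 or alpha <= 0:
        return (-math.inf, math.inf)
    if alpha >= 1:
        return (prediction, prediction)
    rank = math.ceil((1 - alpha) * (n + 1))
    if rank > n:
        return (-math.inf, math.inf)
    half_width = sorted(past_abs_errors)[rank - 1]
    return (prediction - half_width, prediction + half_width)


def aci_update(alpha_t, target_alpha, covered, gamma=0.05):
    """Adaptive conformal inference step: alpha_{t+1} = alpha_t + gamma * (target - err_t).

    err_t is 1 when the last interval missed the observation, 0 otherwise, so a
    miss lowers alpha (wider intervals next time) and a hit raises it slightly.
    """
    err = 0.0 if covered else 1.0
    return alpha_t + gamma * (target_alpha - err)


# One online step with toy numbers.
low, high = conformal_interval(10.0, [1.0, 2.0, 3.0, 4.0], alpha=0.5)
observed = 14.0
next_alpha = aci_update(0.5, 0.5, covered=(low <= observed <= high))
```

In a streaming setting these two calls would alternate at every time step, with the newest absolute error appended to the history after each observation.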
Which Data Harms My Regression Model: Enhancing Model Performance on Low-Quality Data Through Fast Data Attribution
IF 10.4, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2026-03-01 Epub Date: 2026-03-19 DOI: 10.1109/TKDE.2026.3675903
Qingkai Sui;Yalin Wang;Chenliang Liu;Diju Liu;Xiaofang Chen;Yongfang Xie
Abstract: With the rapid advancement of model architectures, the accuracy of industrial predictive modeling now largely hinges on data quality. However, real-world industrial datasets frequently contain low-quality samples that compromise model performance. While existing data preprocessing methods can effectively remove salient outliers, they persistently struggle to detect latent anomalies. To address this challenge, this paper proposes a fast data attribution-based dataset selection method for regression models, termed ${\mathrm{F{\scriptscriptstyle AST}}DAR}$, which enables the model to identify training samples that are detrimental to its performance and subsequently perform dataset selection. ${\mathrm{F{\scriptscriptstyle AST}}DAR}$ integrates deep network data attribution into the Leave-One-Out (LOO) influence calculation paradigm of linear regression models through model linearization and parameter dimensionality reduction. Considering the synergy among samples, the truncated Monte Carlo method is adopted to estimate marginal influences of each sample, and sample utility is defined for dataset selection. Validation on real-world industrial datasets demonstrates the effectiveness and practicality of our method. Experimental results show that models trained on ${\mathrm{F{\scriptscriptstyle AST}}DAR}$-selected data achieve significant performance improvements on both validation and test sets, outperforming multiple baseline methods.
Vol. 38, No. 5, pp. 3321-3334
Citations: 0
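The abstract above builds on the leave-one-out (LOO) influence calculation for linear regression. As background only, and not as the paper's method, the sketch below shows the standard closed form for one-feature least squares: the LOO prediction error of sample i equals its ordinary residual divided by (1 - h_ii), where h_ii is the leverage. The function names and the toy data are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def loo_residuals(xs, ys):
    """Leave-one-out prediction errors without refitting the model n times.

    For each sample, residual / (1 - leverage) equals the error the model would
    make on that sample if it were fitted on all the other samples.
    """
    n = len(xs)
    slope, intercept = fit_line(xs, ys)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    errors = []
    for x, y in zip(xs, ys):
        residual = y - (slope * x + intercept)
        leverage = 1.0 / n + (x - x_mean) ** 2 / sxx
        errors.append(residual / (1.0 - leverage))
    return errors


# Toy data whose last sample is far off the trend of the others.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 9.0]
loo = loo_residuals(xs, ys)
most_suspicious = max(range(len(xs)), key=lambda i: abs(loo[i]))
```

A large LOO error flags a sample the rest of the data cannot explain, which is the kind of signal a data-selection step could act on.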
Toward Learning Shift-Invariant Representations for Healthcare Series Classification
IF 10.4, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2026-03-01 Epub Date: 2026-02-25 DOI: 10.1109/TKDE.2026.3667978
Yuanbo Liu;Xiucheng Li;Xinyang Chen;Hongwei Liu;Zhijun Li
Abstract: Accurate classification of healthcare time series is critical for clinical decision-making. However, existing models often struggle under real-world data shifts and lack interpretability, two key requirements for reliable medical deployment. To address these challenges, we propose SHINE, a novel end-to-end framework that learns disentangled and shift-invariant representations by modeling the generative process of multivariate healthcare signals. Specifically, SHINE first introduces a genuine data representation learning that disentangles healthcare signals into trend, seasonality, and noise components, reflecting distinct temporal dynamics of healthcare series. Then, we inject several inductive biases into each component to encourage latent representations to be invariant to data shifts and aligned with their corresponding semantic units. Extensive experiments on six healthcare benchmarks spanning ECG, EEG, and continuous glucose monitoring (CGM) domains, under a variety of simulated real-world shift scenarios, demonstrate that SHINE consistently outperforms state-of-the-art baselines, providing robust performance and clinically meaningful interpretations grounded in the estimated components.
Vol. 38, No. 5, pp. 3222-3233
Citations: 0
VMPQ: An Efficient Protocol for Privacy-Preserving and Verifiable Multi-Predicate Queries Over Time-Series Databases
IF 10.4, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2026-03-01 Epub Date: 2026-02-17 DOI: 10.1109/TKDE.2026.3665631
Xuan Jing;Fei Xiao;Jianfeng Wang
Abstract: With the widespread adoption of cloud storage, time-series databases have become indispensable for managing and analyzing sequential data generated on the user side over time (i.e., time-series data), thereby alleviating the computational and storage burden on resource-constrained users. However, critical security and privacy challenges, such as query privacy leakage, data exposure, and threats to storage integrity, remain inadequately addressed by existing solutions. To this end, we propose VMPQ, an efficient protocol for privacy-preserving and verifiable multi-predicate queries over time-series databases. Specifically, we introduce a new cryptographic primitive, verifiable offline/online private information retrieval (V-OO-PIR), which supports sublinear retrieval complexity while simultaneously ensuring both query privacy and result verifiability against untrusted servers. Building on V-OO-PIR, we design a dual-layer security framework that integrates replicated secret sharing (RSS) and secure multiparty computation (MPC): 1) RSS splits time-series data into two shares stored across two non-colluding servers, ensuring data confidentiality and mitigating exposure risks, and 2) MPC performs secure multiplication directly on these shares, enabling efficient evaluation of multi-predicate queries without reconstructing the original data. As a result, VMPQ ensures query privacy by preventing servers from inferring user interests across multiple predicates, while simultaneously guaranteeing data confidentiality and the verifiability of query results. Theoretical analysis confirms the security of VMPQ against malicious adversaries. Experimental results demonstrate that VMPQ reduces query latency by up to 5× compared to the state-of-the-art solution Waldo, while also enhancing throughput and preserving high storage efficiency through optimized database encoding.
Vol. 38, No. 5, pp. 3306-3320
Citations: 0
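The abstract above splits data into two shares held by non-colluding servers and multiplies directly on the shares. The sketch below illustrates that general pattern with plain additive two-party sharing and Beaver-triple multiplication. It is a swapped-in textbook technique, not the paper's replicated secret sharing or V-OO-PIR construction. The modulus `P`, the function names, and the trusted dealer that hands out triples are all assumptions, and both servers are simulated in one process.

```python
import secrets

P = 2**61 - 1  # prime modulus for the share arithmetic (an assumed parameter)


def share(value):
    """Split `value` into two additive shares that sum to it modulo P."""
    first = secrets.randbelow(P)
    return first, (value - first) % P


def reconstruct(share0, share1):
    """Recombine two additive shares."""
    return (share0 + share1) % P


def beaver_triple():
    """Dealer-generated shares of random a, b and of c = a * b (hypothetical dealer)."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)


def multiply(x_shares, y_shares):
    """Return shares of x * y without either server ever seeing x or y."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    # Each server masks its shares; only the masked values d = x - a and
    # e = y - b are opened, and they look uniformly random to both servers.
    d = reconstruct((x_shares[0] - a0) % P, (x_shares[1] - a1) % P)
    e = reconstruct((y_shares[0] - b0) % P, (y_shares[1] - b1) % P)
    # x * y = c + d * b + e * a + d * e; server 0 adds the public d * e term.
    z0 = (c0 + d * b0 + e * a0 + d * e) % P
    z1 = (c1 + d * b1 + e * a1) % P
    return z0, z1


# Conjunction of two predicate bits as a product of secret-shared 0/1 values.
both_match = reconstruct(*multiply(share(1), share(1)))
one_fails = reconstruct(*multiply(share(1), share(0)))
```

Encoding each predicate result as a shared bit turns a multi-predicate AND into a chain of such multiplications, which is why multiplication on shares is the core operation.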
Training-Free and Unbiased Graph Collaborative Filtering for Personalized Recommendations
IF 10.4, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2026-03-01 Epub Date: 2026-03-03 DOI: 10.1109/TKDE.2026.3669816
Ziyang Liu;Chaokun Wang;Cheng Wu;Leqi Zheng;Hao Feng;Hang Zhang
Abstract: With the widespread adoption of collaborative filtering techniques for personalized recommendations, exposure bias has become a significant challenge. Exposure bias refers to the tendency of recommendation models to disproportionately favor items with high exposure over those with low exposure. In graph collaborative filtering that uses graph neural networks (GNNs) for recommendations, exposure bias can be exacerbated due to 1) the reliance on positive feedback during graph construction and 2) the effects of the neighbor aggregation step in GNNs. To tackle this challenge, we propose a novel and efficient framework called FUGCF (training-Free and Unbiased Graph Collaborative Filtering) to improve both the accuracy and bias mitigation of graph-based personalized recommendations. FUGCF employs a two-stage calculation strategy: it estimates exposure probabilities in the first stage and then leverages these exposure probabilities to help derive debiased node embeddings in the second stage. Furthermore, we design a training-free estimation method for FUGCF based on closed-form solutions to enhance its computation efficiency. Extensive experiments on a synthetic dataset and three real-world datasets demonstrate the effectiveness of FUGCF in reducing exposure bias, improving recommendation accuracy, and optimizing computation efficiency.
Vol. 38, No. 5, pp. 3234-3249
Citations: 0
Unveiling Densest Multilayer Subgraphs via Greedy Peeling
IF 10.4, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2026-03-01 Epub Date: 2026-03-02 DOI: 10.1109/TKDE.2026.3668969
Dandan Liu;Zhaonian Zou;Run-An Wang
Abstract: The densest subgraphs in multilayer (ML) graphs unveil intricate relationships that are missed by simple graph representations, offering profound insights and applications across diverse domains. In this paper, we present a layer-oriented view of existing density measures for ML graphs and highlight their problems in identifying the densest subgraphs under the layer-oriented densities, including inefficiency, poor approximation ratios, and the lack of a unified algorithmic framework. In light of this, we introduce a new family of vertex-oriented density measures called generalized density. The two parameters $q$ and $p$ allow the generalized density to flexibly adjust its focus in the density evaluation. We investigate the problem of finding the ML subgraph that maximizes the generalized density and show that the problem can be solved using a unified greedy vertex peeling framework with strong approximation guarantees for half of the $(q, p)$ parameter space. Specifically, for four regimes of $(q, p)$, we design tailored vertex-peeling strategies that lead to approximation algorithms with provable approximation ratios and precise time complexity bounds. We also develop a highly efficient implementation that reduces the execution time of greedy peeling to near-linear time for two of the four explored regimes of $(q, p)$. Extensive experiments on ten real-world ML graphs reveal that our generalized density and greedy peeling algorithms can effectively uncover different types of dense ML subgraphs in large-scale ML graphs.
Vol. 38, No. 5, pp. 3291-3305
Citations: 0
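The abstract above generalises greedy vertex peeling to multilayer graphs. For orientation, the sketch below shows the classic single-layer version: repeatedly remove a minimum-degree vertex and keep the intermediate vertex set with the highest edges-per-vertex density. It does not implement the paper's generalized density or its $(q, p)$ regimes. The function name and toy graph are assumptions, and this simple version runs in quadratic time rather than near-linear time.

```python
def greedy_peeling(edges):
    """Single-layer greedy peeling for the densest subgraph.

    edges: iterable of (u, v) pairs of an undirected simple graph.
    Returns (vertex_set, density), where density = |E(S)| / |S| is the best
    value observed over all vertex sets visited while peeling.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    remaining = set(adjacency)
    if not remaining:
        return set(), 0.0
    num_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    best_set = set(remaining)
    best_density = num_edges / len(remaining)

    while len(remaining) > 1:
        # Peel a vertex of minimum degree in the current subgraph.
        victim = min(remaining, key=lambda x: len(adjacency[x]))
        for nbr in adjacency[victim]:
            adjacency[nbr].discard(victim)
        num_edges -= len(adjacency[victim])
        adjacency[victim] = set()
        remaining.remove(victim)

        density = num_edges / len(remaining)
        if density > best_density:
            best_density = density
            best_set = set(remaining)
    return best_set, best_density


# A 4-clique on {1, 2, 3, 4} with a sparse tail 4 - 5 - 6 attached.
toy_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
dense_vertices, dense_value = greedy_peeling(toy_edges)
```

On the toy graph the tail vertices are peeled first, so the clique is the densest set seen along the way.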
2025 Reviewers List
IF 10.4, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2026-02-12 DOI: 10.1109/TKDE.2026.3652658
Vol. 38, No. 3, pp. 2108-2121
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11395241
Citations: 0
XiYan-SQL: A Novel Multi-Generator Framework for Text-to-SQL
IF 10.4, Zone 2, Computer Science
IEEE Transactions on Knowledge and Data Engineering Pub Date : 2026-01-26 DOI: 10.1109/TKDE.2026.3657851
Yifu Liu;Yin Zhu;Yingqi Gao;Zhiling Luo;Xiaoxia Li;Xiaorong Shi;Yuntao Hong;Jinyang Gao;Yu Li;Bolin Ding;Jingren Zhou
Abstract: To leverage the advantages of LLMs in addressing challenges in the Text-to-SQL task, we present XiYan-SQL, an innovative framework that effectively generates and utilizes multiple SQL candidates. It consists of three components: 1) a Schema Filter module that filters and obtains multiple relevant schemas; 2) a multi-generator ensemble approach that generates multiple high-quality and diverse SQL queries; 3) a selection model with a candidate reorganization strategy that obtains the optimal SQL query. Specifically, for the multi-generator ensemble, we employ a multi-task fine-tuning strategy to enhance the capabilities of SQL generation models for the intrinsic alignment between SQL and text, and construct multiple generation models with distinct generation styles by fine-tuning across different SQL formats. The experimental results and comprehensive analysis demonstrate the effectiveness and robustness of our framework. Overall, XiYan-SQL achieves a new SOTA performance of 75.63% on the notable BIRD benchmark, surpassing all previous methods. It also attains SOTA performance on the Spider test set with an accuracy of 89.65%.
Vol. 38, No. 4, pp. 2474-2487
Citations: 0