{"title":"MSC-DOLES: Multi-View Subspace Clustering in Diverse Orthogonal Latent Embedding Spaces","authors":"Yuan Fang;Geping Yang;Ruichu Cai;Yiyang Yang;Zhiguo Gong;Can Chen;Zhifeng Hao","doi":"10.1109/TKDE.2025.3610659","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3610659","url":null,"abstract":"In the domain of Multi-view Subspace Clustering (MSC) in Latent Embedding Space (LES), existing methods aim to capture and leverage critical multi-view information by mapping it into a low-dimensional LES. However, several aspects can be further improved: (i) Fusion Strategy: Existing methods adopt either early fusion or late fusion to integrate multi-view information, limiting the effectiveness of the fusion. (ii) Diversity: Current methods often overlook the inherent diversity in the multi-view data by focusing on a single LES. (iii) Efficiency: LES-based methods exhibit high computational complexity, with cubic time and quadratic space requirements in the number of samples. To address these issues, we propose a novel framework called MSC-DOLES (Multi-view Subspace Clustering in Diverse Orthogonal Latent Embedding Spaces). MSC-DOLES incorporates a two-stage fusion approach that generates and learns from multiple LES to maximize cross-view diversity. Orthogonality constraints on individual LES ensure view-internal diversity, resulting in a set of Diverse Orthogonal Latent Embedding Spaces (DOLES). The DOLES are then fused into a consensus anchor graph using learnable anchors. The final clustering is induced by partitioning the obtained graph without pre-processing. We develop an eight-step optimization algorithm for MSC-DOLES, which exhibits nearly linear time and space complexities relative to the number of samples. 
Extensive experiments demonstrate the superiority of MSC-DOLES over state-of-the-art methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7315-7327"},"PeriodicalIF":10.4,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145456055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Numerical Data Collection Under Input-Discriminative Local Differential Privacy","authors":"Youwen Zhu;Shibo Dai;Pengfei Zhang;Xiqi Kuang","doi":"10.1109/TKDE.2025.3610932","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3610932","url":null,"abstract":"Input-discriminative local differential privacy (ID-LDP) protects user data with a different range of values, which improves the utility of the estimated data compared to traditional LDP. However, the existing ID-LDP methods are used for categorical data and cannot be directly applied to numerical data. In this paper, we propose a numerical data collection (NDC) framework with ID-LDP to provide discriminative protection for the data with different inputs. This framework uses a piecewise mechanism to divide the numerical data into several segments and designs two perturbation methods to minimize the mean value of numerical data based on values submitted by users. We first create an NDC-UE method that encodes the raw data into a binary vector. This method sets the uploaded data bit as 1 and the rest as zero and perturbs each bit with a given probability. We further propose an NDC-GRR algorithm to perturb the numerical data with an optimal privacy budget. To reduce the complexity of NDC-GRR, we apply a greedy algorithm-based spanner to shorten the computation time and improve the accuracy. Theoretical analysis proves that our schemes satisfy the definition of ID-LDP. 
Experimental results based on two real-world datasets and a synthetic dataset show that the proposed schemes have less mean square error compared with the benchmarks.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7346-7361"},"PeriodicalIF":10.4,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empowering Explainable Artificial Intelligence Through Case-Based Reasoning: A Comprehensive Exploration","authors":"Preeja Pradeep;Marta Caro-Martínez;Anjana Wijekoon","doi":"10.1109/TKDE.2025.3609825","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3609825","url":null,"abstract":"Artificial intelligence (AI) advancements have significantly broadened its application across various sectors, simultaneously elevating concerns regarding the transparency and understandability of AI-driven decisions. Addressing these concerns, this paper embarks on an exploratory journey into Case-Based Reasoning (CBR) and Explainable Artificial Intelligence (XAI), critically examining their convergence and the potential this synergy holds for demystifying the decision-making processes of AI systems. We employ the concept of Explainable CBR (XCBR) system that leverages CBR to acquire case-based explanations or generate explanations using CBR methodologies to enhance AI decision explainability. Though the literature has few surveys on XCBR, recognizing its potential necessitates a detailed exploration of the principles for developing effective XCBR systems. We present a cycle-aligned perspective that examines how explainability functions can be embedded throughout the classical CBR phases: Retrieve, Reuse, Revise, and Retain. Drawing from a comprehensive literature review, we propose a set of six functional goals that reflect key explainability needs. These goals are mapped to six thematic categories, forming the basis of a structured XCBR taxonomy. The discussion extends to the broader challenges and prospects facing the CBR-XAI arena, setting the stage for future research directions. 
This paper offers design guidance and conceptual grounding for future XCBR research and system development.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7120-7139"},"PeriodicalIF":10.4,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11165042","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Similarity and Dissimilarity Guided Co-Association Matrix Construction for Ensemble Clustering","authors":"Xu Zhang;Yuheng Jia;Mofei Song;Ran Wang","doi":"10.1109/TKDE.2025.3608721","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3608721","url":null,"abstract":"Ensemble clustering aggregates multiple weak clusterings to achieve a more accurate and robust consensus result. The Co-Association matrix (CA matrix) based method is the mainstream ensemble clustering approach; it constructs the similarity relationships between sample pairs according to the weak clustering partitions to generate the final clustering result. However, existing methods neglect that the quality of a cluster is related to its size, i.e., a smaller cluster tends to have higher accuracy. Moreover, they do not consider the valuable dissimilarity information in the base clusterings, which can reflect the varying importance of sample pairs that are completely disconnected. To this end, we propose the Similarity and Dissimilarity Guided Co-association matrix (SDGCA) to achieve ensemble clustering. First, we introduce normalized ensemble entropy to estimate the quality of each cluster, and construct a similarity matrix based on this estimation. Then, we employ the random walk to explore high-order proximity of base clusterings to construct a dissimilarity matrix. Finally, the adversarial relationship between the similarity matrix and the dissimilarity matrix is utilized to construct a promoted CA matrix for ensemble clustering. 
We compared our method with 13 state-of-the-art methods across 12 datasets, and the results demonstrated the superior clustering ability and robustness of the proposed approach.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 11","pages":"6694-6707"},"PeriodicalIF":10.4,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145242590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Temporal Dependencies Within the Target for Long-Term Time Series Forecasting","authors":"Qi Xiong;Kai Tang;Minbo Ma;Ji Zhang;Jie Xu;Tianrui Li","doi":"10.1109/TKDE.2025.3609415","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3609415","url":null,"abstract":"Long-term time series forecasting (LTSF) is a critical task across diverse domains. Despite significant advancements in LTSF research, we identify a performance bottleneck in existing LTSF methods caused by the inadequate modeling of Temporal Dependencies within the Target (TDT). To address this issue, we propose a novel and generic temporal modeling framework, Temporal Dependency Alignment (TDAlign), that equips existing LTSF methods with TDT learning capabilities. TDAlign introduces two key innovations: 1) a loss function that aligns the change values between adjacent time steps in the predictions with those in the target, ensuring consistency with variation patterns, and 2) an adaptive loss balancing strategy that seamlessly integrates the new loss function with existing LTSF methods without introducing additional learnable parameters. As a plug-and-play framework, TDAlign enhances existing methods with minimal computational overhead, featuring only linear time complexity and constant space complexity relative to the prediction length. Extensive experiments on six strong LTSF baselines across seven real-world datasets demonstrate the effectiveness and flexibility of TDAlign. 
On average, TDAlign reduces baseline prediction errors by 1.47% to 9.19% and change value errors by 4.57% to 15.78%, highlighting its substantial performance improvements.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7300-7314"},"PeriodicalIF":10.4,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Next-Generation Database Interfaces: A Survey of LLM-Based Text-to-SQL","authors":"Zijin Hong;Zheng Yuan;Qinggang Zhang;Hao Chen;Junnan Dong;Feiran Huang;Xiao Huang","doi":"10.1109/TKDE.2025.3609486","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3609486","url":null,"abstract":"Generating accurate SQL from users’ natural language questions (text-to-SQL) remains a long-standing challenge due to the complexities involved in user question understanding, database schema comprehension, and SQL generation. Traditional text-to-SQL systems, which combine human engineering and deep neural networks, have made significant progress. Subsequently, pre-trained language models (PLMs) have been developed for text-to-SQL tasks, achieving promising results. However, as modern databases and user questions grow more complex, PLMs with a limited parameter size often produce incorrect SQL. This necessitates more sophisticated and tailored optimization methods, which restrict the application of PLM-based systems. Recently, large language models (LLMs) have shown significant capabilities in natural language understanding as model scale increases. Thus, integrating LLM-based solutions can bring unique opportunities, improvements, and solutions to text-to-SQL research. In this survey, we provide a comprehensive review of existing LLM-based text-to-SQL studies. Specifically, we offer a brief overview of the technical challenges and evolutionary process of text-to-SQL. Next, we introduce the datasets and metrics designed to evaluate text-to-SQL systems. Subsequently, we present a systematic analysis of recent advances in LLM-based text-to-SQL. 
Finally, we summarize and discuss the remaining challenges in this field and outline expectations for future research directions.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7328-7345"},"PeriodicalIF":10.4,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Effective and Transferable Detection for Multi-Modal Fake News in the Social Media Stream","authors":"Jingyi Xie;Jiawei Liu;Zheng-jun Zha","doi":"10.1109/TKDE.2025.3609045","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3609045","url":null,"abstract":"The rapid proliferation of multimedia fake news on social media has raised significant concerns in recent years. Existing studies on fake news detection predominantly adopt an instance-based paradigm, where the detector evaluates a single post to determine its veracity. Despite notable advancements achieved in this domain, we argue that the instance-based approach is misaligned with real-world deployment scenarios. In practice, detectors typically operate on servers that process incoming posts in temporal order, striving to assess their authenticity promptly. Instance-based detectors lack awareness of temporal information and contextual relationships between surrounding posts, therefore fail to capture long-range dependencies from the timeline. To bridge this gap, we introduce a more practical stream-based multi-modal fake news detection paradigm, which assumes that social media posts arrive continuously over time and allows the utilization of previously seen posts to aid in the classification of incoming ones. To enable effective and transferable fake news detection under this novel paradigm, we propose maintaining historical knowledge as a collection of incremental high-level forgery patterns. Based on this principle, we design a novel framework called Incremental Forgery Pattern Learning and Clues Refinement (IPLCR). IPLCR incrementally learns high-level forgery patterns as the stream evolves, leveraging this knowledge to improve the detection of newly arrived posts. At the core of IPLCR is the Incremental Forgery Pattern Bank (IPB), which dynamically summarizes historical posts into a set of latent forgery patterns. 
IPB is designed to continuously incorporate timely knowledge and actively discard obsolete information, even during inference. When a new post arrives, IPLCR retrieves the most relevant forgery pattern knowledge from IPB and refines the clues for fake news detection. The refined clues are subsequently incorporated into IPB to enrich its knowledge base. Extensive experiments validate IPLCR’s effectiveness as a robust stream-based detector. Moreover, IPLCR addresses several critical issues relevant to industrial applications, including seamless context transfer and efficient model upgrading, making it a practical solution for real-world deployment.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 11","pages":"6723-6737"},"PeriodicalIF":10.4,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145242610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible Keyword-Aware Top-$k$ Route Search","authors":"Ziqiang Yu;Xiaohui Yu;Yueting Chen;Wei Liu;Anbang Song;Bolong Zheng","doi":"10.1109/TKDE.2025.3609302","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3609302","url":null,"abstract":"With the rise of Large Language Models (LLMs), tourists increasingly use them for route planning by entering keywords for attractions, instead of relying on traditional manual map services. LLMs provide generally reasonable suggestions, but often fail to generate optimal plans that account for detailed user requirements, given the vast number of potential POIs and possible routes based on POI combinations within a real-world road network. In this case, a route-planning API could serve as an external tool, accepting a sequence of keywords and returning the top-$k$ best routes tailored to user requests. To address this need, this paper introduces the Keyword-Aware Top-$k$ Routes (KATR) query, which provides more flexible and comprehensive semantics for route planning and caters to various user preferences, including flexible POI visiting order, flexible travel distance budget, and personalized POI ratings. Subsequently, we propose an explore-and-bound paradigm to efficiently process KATR queries by eliminating redundant candidates based on estimated score bounds from global to local levels. 
Extensive experiments demonstrate our approach’s superior performance over existing methods across different scenarios.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7184-7198"},"PeriodicalIF":10.4,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertain Priors for Graphical Causal Models: A Multi-Objective Optimization Perspective","authors":"Zidong Wang;Xiaoguang Gao;Qingfu Zhang","doi":"10.1109/TKDE.2025.3608723","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3608723","url":null,"abstract":"Learning graphical causal models from observational data can effectively elucidate the underlying causal mechanism behind the variables. In the context of limited datasets, modelers often incorporate prior knowledge, which is assumed to be correct, as a penalty in single-objective optimization. However, this approach struggles to adapt to complex and uncertain priors effectively. This paper introduces UpCM, which tackles the issue from a multi-objective optimization perspective. Instead of focusing exclusively on the DAG as the optimization goal, UpCM methodically evaluates the effect of uncertain priors on specific structures, merging data-driven and knowledge-driven objectives. Utilizing the MOEA/D framework, it achieves a balanced trade-off between these objectives. Furthermore, since uncertain priors may introduce erroneous constraints, resulting in PDAGs lacking consistent extensions, the minimal non-consistent extension is explored. This extension, which separately incorporates positive and negative constraints, aims to approximate the true causality of the PDAGs. Experimental results demonstrate that UpCM achieves significant structural accuracy improvements compared to baseline methods. It reduces the SHD by 7.94%, 13.23%, and 12.8% relative to PC_stable, GES, and MAHC, respectively, when incorporating uncertain priors. 
In downstream inference tasks, UpCM outperforms domain-expert knowledge graphs, owing to its ability to learn explainable causal relationships that balance data-driven evidence with prior knowledge.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7426-7439"},"PeriodicalIF":10.4,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SandwichSketch: A More Accurate Sketch for Frequent Object Mining in Data Streams","authors":"Zhuochen Fan;Ruixin Wang;Zihan Jiang;Ruwen Zhang;Tong Yang;Sha Wang;Yuhan Wu;Ruijie Miao;Kaicheng Yang;Bui Cui","doi":"10.1109/TKDE.2025.3607691","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3607691","url":null,"abstract":"Frequent object mining has gained considerable interest in the research community and can be split into frequent item mining and frequent set mining depending on the type of object. While existing sketch-based algorithms have made significant progress in addressing these two tasks concurrently, they also possess notable limitations. They either support only software platforms with low throughput or compromise accuracy for faster processing speed and better hardware compatibility. In this paper, we make a substantial stride towards supporting frequent object mining by designing SandwichSketch, which draws inspiration from sandwich making and proposes two techniques, double fidelity enhancement and hierarchical hot locking, to guarantee high fidelity on both tasks. We implement SandwichSketch on three platforms (CPU, Redis, and FPGA) and show that it enhances accuracy by 38.4× and 5× for the two tasks on three real-world datasets, respectively. 
Additionally, it supports a distributed measurement scenario with less than a 0.01% decrease in Average Relative Error (ARE) when the number of nodes increases from 1 to 16.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 11","pages":"6636-6650"},"PeriodicalIF":10.4,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145242587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}