{"title":"Intent Propagation Contrastive Collaborative Filtering","authors":"Haojie Li;Junwei Du;Guanfeng Liu;Feng Jiang;Yan Wang;Xiaofang Zhou","doi":"10.1109/TKDE.2025.3543241","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3543241","url":null,"abstract":"Disentanglement techniques used in collaborative filtering uncover interaction intents between nodes, improving the interpretability of node representations and enhancing recommendation performance. However, existing disentanglement methods still face the following two problems. 1) They focus on local structural features derived from direct node interactions, overlooking the comprehensive graph structure, which limits disentanglement accuracy. 2) The disentanglement process depends on backpropagation signals derived from recommendation tasks, lacking direct supervision, which may lead to biases and overfitting. To address the issues, we propose the <bold>I</b>ntent <bold>P</b>ropagation <bold>C</b>ontrastive <bold>C</b>ollaborative <bold>F</b>iltering (IPCCF) algorithm. Specifically, we design a double helix message propagation framework to more effectively extract the deep semantic information of nodes, thereby improving the model's understanding of interactions between nodes. An intent message propagation method is also developed that incorporates graph structure information into the disentanglement process, thereby expanding the consideration scope of disentanglement. In addition, contrastive learning techniques are employed to align node representations derived from the structure and intents, providing direct supervision for the disentanglement process, mitigating biases, and enhancing the model's robustness to overfitting. 
The experiments on three real data graphs illustrate the superiority of the proposed approach.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 5","pages":"2665-2679"},"PeriodicalIF":8.9,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143769557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practical Equi-Join Over Encrypted Database With Reduced Leakage","authors":"Qiaoer Xu;Jianfeng Wang;Shi-Feng Sun;Zhipeng Liu;Xiaofeng Chen","doi":"10.1109/TKDE.2025.3543168","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3543168","url":null,"abstract":"Secure join schemes, an important class of queries over encrypted databases, have attracted increasing attention. While efficient querying is paramount, data owners also emphasize the significance of privacy preservation. The state-of-the-art JXT (Jutla and Patranabis ASIACRYPT 2022) enables efficient join queries over encrypted tables with a symmetric-key solution. However, we observe that JXT inadvertently leaks undesirable query results as the number of queries increases. In this paper, we propose a novel equi-join scheme, One-Time Join Cross-Tags (OTJXT), which can avoid additional result leakage in multiple queries and extend to equi-join as opposed to natural join in JXT. Specifically, we design a new data encoding method using nonlinear transformations that reveals only the union of results for each query without extra leakage observed in JXT. Moreover, OTJXT addresses the linear search complexity issue (Shafieinejad et al. ICDE 2022) while preventing multiple query leakage. Finally, we implement OTJXT and compare its performance with JXT and Shafieinejad et al.'s scheme on the TPC-H dataset. The results show that OTJXT outperforms in search and storage efficiency, achieving a <inline-formula><tex-math>$mathbf {98.5times }$</tex-math></inline-formula> (resp., <inline-formula><tex-math>$mathbf {10^{6}times }$</tex-math></inline-formula>) speedup in search latency and reducing storage cost by 62.5% (resp., 78.5%), compared to JXT (resp., Shafieinejad et al.'s scheme). 
Using OTJXT, a TPC-H query on a 40 MB database only takes 21 ms.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 5","pages":"2846-2860"},"PeriodicalIF":8.9,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143769540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Clustering With Harmonic-Maxmin Cut Guidance","authors":"Jingwei Chen;Zihan Wu;Jingqing Cheng;Xiaohua Xu;Feiping Nie","doi":"10.1109/TKDE.2025.3542839","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3542839","url":null,"abstract":"Graph clustering has become a crucial technique for uncovering community structures in complex network data. However, existing approaches often introduce cumbersome regularization or constraints (hyperparameter tuning burden) to obtain balanced clustering results, thereby increasing hyperparameter tuning requirements and intermediate variables. These limitations can lead to suboptimal performance, particularly in scenarios involving imbalanced clusters or large-scale datasets. Besides, most graph cut clustering methods solve two separate discrete problems, resulting in information loss and relying on time-consuming eigen-decomposition. To address these challenges, this paper propose an effective graph cut framework, termed Harmonic MaxMin Cut (HMMC), inspired by worst-case objective optimization and the harmonic mean. Unlike traditional spectral clustering, HMMC produces all cluster assignments in a single step, eliminating the need for additional discretization and notably enhancing robustness to “worst-case cluster” boundaries. this paper further devise a fast coordinate descent (CD) solver that scales linearly complexity with the graph size, offering a computationally efficient alternative to eigen decomposition. 
Extensive experiments on real-world datasets demonstrate that HMMC is comparable to, or even surpasses, state-of-the-art methods, while also finding more favorable local solutions than non-negative matrix factorization techniques.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 5","pages":"2600-2613"},"PeriodicalIF":8.9,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143769366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCHENO: Measuring Schema vs. Noise in Graphs","authors":"Justus Isaiah Hibshman;Adnan Hoq;Tim Weninger","doi":"10.1109/TKDE.2025.3543032","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3543032","url":null,"abstract":"Real-world data is typically a noisy manifestation of a core pattern (<italic>schema</i>), and the purpose of data mining algorithms is to uncover that pattern, thereby splitting (<italic>i.e.</i> decomposing) the data into schema and noise. We introduce SCHENO, a principled evaluation metric for the goodness of a schema-noise decomposition of a graph. SCHENO captures how schematic the schema is, how noisy the noise is, and how well the combination of the two represent the original graph data. We visually demonstrate what this metric prioritizes in small graphs, then show that if SCHENO is used as the fitness function for a simple optimization strategy, we can uncover a wide variety of patterns. Finally, we evaluate several well-known graph mining algorithms with this metric; we find that although they produce patterns, those patterns are not always the best representation of the input data.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 5","pages":"2946-2957"},"PeriodicalIF":8.9,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143769359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio-Temporal Multivariate Probabilistic Modeling for Traffic Prediction","authors":"Yang An;Zhibin Li;Xiaoyu Li;Wei Liu;Xinghao Yang;Haoliang Sun;Meng Chen;Yu Zheng;Yongshun Gong","doi":"10.1109/TKDE.2025.3539680","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3539680","url":null,"abstract":"Traffic prediction is an essential task in intelligent transportation systems dealing with complex and dynamic spatio-temporal correlations. To date, most work is focused on point estimation models, which only output a single value w.r.t an attribute of traffic data at a time, falling short of depicting diverse situations and uncertainty in future. Besides, most methods are not flexible enough to handle real complex traffic scenarios, involving missing values and non-uniformly sampled data. The interactions among different attributes of traffic data are also rarely explored explicitly. In this paper, we focus on probabilistic estimation in traffic prediction tasks, proposing a spatio-temporal multivariate probabilistic predictive model to estimate the distributions of traffic data. Specifically, we devise a multivariate spatio-temporal fusion graph block to extract spatio-temporal correlations of multiple traffic attributes at different locations. A multi-graph fusion module is designed to capture time-varying spatial relationships. We estimate the joint distributions of missing traffic data using copulas. The proposed model can simultaneously perform traffic forecasting and interpolation tasks with non-uniformly sampled data. 
Our experiments on two real-world traffic datasets demonstrate the advantages of our model over the state-of-the-art.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 5","pages":"2986-3000"},"PeriodicalIF":8.9,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143769556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Location-Guided Time-Series Shapelets","authors":"Akihiro Yamaguchi;Ken Ueno;Hisashi Kashima","doi":"10.1109/TKDE.2025.3536462","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3536462","url":null,"abstract":"Shapelets are interclass discriminative subsequences that can be used to characterize target classes. Learning shapelets by continuous optimization has recently been studied to improve classification accuracy. However, there are two issues in previous studies. First, since the locations where shapelets appear in the time series are determined by only their shapes, shapelets may appear at incorrect and non-discriminative locations in the time series, degrading the accuracy and interpretability. Second, the theoretical interpretation of learned shapelets has been limited to binary classification. To tackle the first issue, we propose a continuous optimization that learns not only shapelets but also their probable locations in a time series, and we show theoretically that this enhances feature discriminability. To tackle the second issue, we provide a theoretical interpretation of shapelet closeness to the time series for target / off-target classes when learning with softmax loss, which allows for multi-class classification. 
We demonstrate the effectiveness of the proposed method in terms of accuracy, runtime, and interpretability on the UCR archive.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 5","pages":"2712-2726"},"PeriodicalIF":8.9,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143769456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RAGIC: Risk-Aware Generative Framework for Stock Interval Construction","authors":"Jingyi Gu;Wenlu Du;Guiling Wang","doi":"10.1109/TKDE.2025.3533492","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3533492","url":null,"abstract":"Efforts to predict stock market outcomes have yielded limited success due to the inherently stochastic nature of the market, influenced by numerous unpredictable factors. Many existing prediction approaches focus on single-point predictions, lacking the depth needed for effective decision-making and often overlooking market risk. To bridge this gap, we propose <italic>RAGIC</i>, a novel risk-aware framework for stock <italic>interval</i> prediction to quantify uncertainty. Our approach leverages a Generative Adversarial Network (GAN) to produce future price sequences infused with randomness inherent in financial markets. <italic>RAGIC</i>’s generator detects the risk perception of informed investors and captures historical price trends globally and locally. Then the <italic>risk-sensitive intervals</i> is built upon the simulated future prices from sequence generation through statistical inference, incorporating <italic>horizon-wise</i> insights. The interval’s width is adaptively adjusted to reflect market volatility. Importantly, our approach relies solely on publicly available data and incurs only low computational overhead. <italic>RAGIC</i>’s evaluation across globally recognized broad-based indices demonstrates its balanced performance, offering both accuracy and informativeness. Achieving a consistent 95% coverage, <italic>RAGIC</i> maintains a narrow interval width. 
This promising outcome suggests that our approach effectively addresses the challenges of stock market prediction while incorporating vital risk considerations.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 4","pages":"2085-2096"},"PeriodicalIF":8.9,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-View Riemannian Manifolds Fusion Enhancement for Knowledge Graph Completion","authors":"Linyu Li;Zhi Jin;Xuan Zhang;Haoran Duan;Jishu Wang;Zhengwei Tao;Haiyan Zhao;Xiaofeng Zhu","doi":"10.1109/TKDE.2025.3538110","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3538110","url":null,"abstract":"As the application of knowledge graphs becomes increasingly widespread, the issue of knowledge graph incompleteness has garnered significant attention. As a classical type of non-euclidean spatial data, knowledge graphs possess various complex structural types. However, most current knowledge graph completion models are developed within a single space, which makes it challenging to capture the inherent knowledge information embedded in the entire knowledge graph. This limitation hinders the representation learning capability of the models. To address this issue, this paper focuses on how to better extend the representation learning from a single space to Riemannian manifolds, which are capable of representing more complex structures. We propose a new knowledge graph completion model called MRME-KGC, based on multi-view Riemannian Manifolds fusion to achieve this. Specifically, MRME-KGC simultaneously considers the fusion of four views: two hyperbolic Riemannian spaces with negative curvature, a Euclidean Riemannian space with zero curvature, and a spherical Riemannian space with positive curvature to enhance knowledge graph modeling. Additionally, this paper proposes a contrastive learning method for Riemannian spaces to mitigate the noise and representation issues arising from Multi-view Riemannian Manifolds Fusion. This paper presents extensive experiments on MRME-KGC across multiple datasets. 
The results consistently demonstrate that MRME-KGC significantly outperforms current state-of-the-art models, achieving highly competitive performance even with low-dimensional embeddings.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 5","pages":"2756-2770"},"PeriodicalIF":8.9,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143769363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Rule-Interpretable Non-Negative Data Representation","authors":"Matej Mihelčić;Pauli Miettinen","doi":"10.1109/TKDE.2025.3538327","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3538327","url":null,"abstract":"Non-negative Matrix Factorization (NMF) is an intensively used technique for obtaining parts-based, lower dimensional and non-negative representation. Researchers in biology, medicine, pharmacy and other fields often prefer NMF over other dimensionality reduction approaches (such as PCA) because the non-negativity of the approach naturally fits the characteristics of the domain problem and its results are easier to analyze and understand. Despite these advantages, obtaining exact characterization and interpretation of the NMF’s latent factors can still be difficult due to their numerical nature. Rule-based approaches, such as rule mining, conceptual clustering, subgroup discovery and redescription mining, are often considered more interpretable but lack lower-dimensional representation of the data. We present a version of the NMF approach that merges rule-based descriptions with advantages of part-based representation offered by the NMF. Given the numerical input data with non-negative entries and a set of rules with high entity coverage, the approach creates the lower-dimensional non-negative representation of the input data in such a way that its factors are described by the appropriate subset of the input rules. 
In addition to revealing important attributes for latent factors, their interaction and value ranges, this approach allows performing focused embedding potentially using multiple overlapping target labels.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 5","pages":"2538-2549"},"PeriodicalIF":8.9,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10887020","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143777910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-State Personalized Knowledge Tracing With Emotional Incorporation","authors":"Shanshan Wang;Fangzheng Yuan;Keyang Wang;Xun Yang;Xingyi Zhang;Meng Wang","doi":"10.1109/TKDE.2025.3538121","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3538121","url":null,"abstract":"Knowledge tracing has been widely used in online learning systems to guide the students’ future learning. However, most existing KT models primarily focus on extracting abundant information from the question sets and explore the relationships between them, but ignore the personalized student behavioral information in the learning process. This will limit the model’s ability to accurately capture the personalized knowledge states of students and reasonably predict their performances. To alleviate this limitation, we explicitly models the personalized learning process by incorporating the emotions, a representative personalized behavior in the learning process, into KT framework. Specifically, we present a novel Dual-State Personalized Knowledge Tracing with Emotional Incorporation model to achieve this goal: First, we incorporate emotional information into the modeling process of knowledge state, resulting in the Knowledge State Boosting Module. Second, we design an Emotional State Tracing Module to monitor students’ personalized emotional states, and propose an emotion prediction method based on personalized emotional states. Finally, we apply the predicted emotions to enhance students’ response prediction. Furthermore, to extend the generalization capability of our model across different datasets, we design a transferred version of DEKT, named Transfer Learning-based Self-loop model (T-DEKT). 
Extensive experiments show that our method achieves state-of-the-art performance.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 5","pages":"2440-2455"},"PeriodicalIF":8.9,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143769365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}