{"title":"Towards Target Sequential Rules","authors":"Wensheng Gan;Gengsen Huang;Jian Weng;Tianlong Gu;Philip S. Yu","doi":"10.1109/TKDE.2025.3547394","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3547394","url":null,"abstract":"In many real-world applications, sequential rule mining (SRM) can offer prediction and recommendation functions for a variety of services. It is an important pattern mining technique for discovering all valuable rules that reveal temporal relationships between objects. Although several SRM algorithms have been proposed to solve various practical problems, the problem of targeted mining has not been studied. Targeted sequential rule mining aims to obtain those interesting sequential rules that users focus on, thus avoiding the generation of other invalid and unnecessary rules. It can further improve users' efficiency in analyzing rules and reduce the consumption of computing resources. In this paper, we first present the relevant definitions of target sequential rules and formulate the problem of targeted sequential rule mining. Then, we propose an efficient algorithm called TaSRM. Several pruning strategies and an optimization are introduced to improve the efficiency of TaSRM. Finally, a large number of experiments are conducted on different benchmarks, and we analyze the results in terms of running time, memory consumption, and scalability, as well as query cases with different query rules. 
It is shown that the novel algorithm TaSRM and its variants can achieve better experimental performance compared to the baseline algorithm.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3766-3780"},"PeriodicalIF":8.9,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
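The targeted-mining idea in the TaSRM abstract above can be illustrated with a toy sketch (hypothetical code, not the TaSRM algorithm: it has no pruning strategies, and antecedents/consequents are restricted to alphabetically ordered item tuples): rules X→Y are enumerated over a sequence database, but only rules whose consequent covers the user's query items are evaluated against support and confidence thresholds, so irrelevant rules are never generated.

```python
from itertools import combinations

def occurs_in_order(seq, items):
    """Return the index just past the last matched item, or None if not all found in order."""
    pos = 0
    for it in items:
        try:
            pos = seq.index(it, pos) + 1
        except ValueError:
            return None
    return pos

def rule_holds(seq, antecedent, consequent):
    """True if all antecedent items occur (in order) before all consequent items."""
    end = occurs_in_order(seq, antecedent)
    if end is None:
        return False
    return occurs_in_order(seq[end:], consequent) is not None

def targeted_rules(db, query_items, min_sup=0.5, min_conf=0.6, max_len=2):
    """Naive targeted SRM: emit only rules whose consequent contains the query items."""
    alphabet = sorted({x for s in db for x in s})
    rules = []
    for la in range(1, max_len + 1):
        for ante in combinations(alphabet, la):
            ante_sup = sum(occurs_in_order(s, ante) is not None for s in db)
            if ante_sup == 0:
                continue
            for lc in range(1, max_len + 1):
                for cons in combinations(alphabet, lc):
                    if not set(query_items) <= set(cons):
                        continue  # targeted: skip rules irrelevant to the query
                    sup = sum(rule_holds(s, ante, cons) for s in db)
                    if sup / len(db) >= min_sup and sup / ante_sup >= min_conf:
                        rules.append((ante, cons))
    return rules
```

For example, over db = [["a","b","c"], ["a","c"], ["a","b","d"]] with query item "c", only the rule ("a",)→("c",) survives the thresholds; rules not involving "c" are never checked.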
{"title":"CHASe: Client Heterogeneity-Aware Data Selection for Effective Federated Active Learning","authors":"Jun Zhang;Jue Wang;Huan Li;Zhongle Xie;Ke Chen;Lidan Shou","doi":"10.1109/TKDE.2025.3547423","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3547423","url":null,"abstract":"Active learning (AL) reduces human annotation costs for machine learning systems by strategically selecting the most informative unlabeled data for annotation, but performing it individually may still be insufficient due to restricted data diversity and annotation budgets. Federated Active Learning (FAL) addresses this by facilitating collaborative data selection and model training, while preserving the confidentiality of raw data samples. Yet, existing FAL methods fail to account for the heterogeneity of data distribution across clients and the associated fluctuations in global and local model parameters, adversely affecting model accuracy. To overcome these challenges, we propose <inline-formula><tex-math>${\sf CHASe}$</tex-math></inline-formula> (Client Heterogeneity-Aware Data Selection), specifically designed for FAL. <inline-formula><tex-math>${\sf CHASe}$</tex-math></inline-formula> focuses on identifying those unlabeled samples with high epistemic variations (EVs), which notably oscillate around the decision boundaries during training. To achieve both effectiveness and efficiency, <inline-formula><tex-math>${\sf CHASe}$</tex-math></inline-formula> encompasses techniques for 1) tracking EVs by analyzing inference inconsistencies across training epochs, 2) calibrating decision boundaries of inaccurate models with a new alignment loss, and 3) enhancing data selection efficiency via a data freeze and awaken mechanism with subset sampling. 
Experiments show that <inline-formula><tex-math>${\sf CHASe}$</tex-math></inline-formula> surpasses various established baselines in terms of effectiveness and efficiency, validated across diverse datasets, model complexities, and heterogeneous federation settings.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3088-3102"},"PeriodicalIF":8.9,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
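The EV-tracking step described in the CHASe abstract above can be illustrated with a minimal sketch (an assumed proxy, not the paper's exact formulation): score each unlabeled sample by how often its predicted label flips across consecutive training epochs, then pick the highest-scoring samples for annotation.

```python
import numpy as np

def epistemic_variation(preds):
    """preds: (epochs, n_samples) array of per-epoch predicted labels.
    EV proxy: fraction of consecutive-epoch transitions where a sample's
    prediction flips, i.e. how much it oscillates during training."""
    preds = np.asarray(preds)
    flips = (preds[1:] != preds[:-1]).sum(axis=0)
    return flips / (preds.shape[0] - 1)

def select_for_annotation(preds, budget):
    """Pick the `budget` unlabeled samples with the highest EV scores."""
    ev = epistemic_variation(preds)
    return np.argsort(-ev)[:budget].tolist()
```

Samples whose predictions never change get EV 0 and are never selected; samples oscillating around the decision boundary get EV near 1 and are prioritized.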
{"title":"Dual-Channel Multiplex Graph Neural Networks for Recommendation","authors":"Xiang Li;Chaofan Fu;Zhongying Zhao;Guangjie Zheng;Chao Huang;Yanwei Yu;Junyu Dong","doi":"10.1109/TKDE.2025.3544081","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3544081","url":null,"abstract":"Effective recommender systems play a crucial role in accurately capturing user and item attributes that mirror individual preferences. Some existing recommendation techniques have started to shift their focus towards modeling various types of interactive relations between users and items in real-world recommendation scenarios, such as clicks, marking favorites, and purchases on online shopping platforms. Nevertheless, these approaches still grapple with two significant challenges: (1) Insufficient modeling and exploitation of the impact of various behavior patterns formed by multiplex relations between users and items on representation learning, and (2) ignoring the effect of different relations within behavior patterns on the target relation in recommender system scenarios. In this work, we introduce a novel recommendation framework, <b><u>D</u></b>ual-<b><u>C</u></b>hannel <b><u>M</u></b>ultiplex <b><u>G</u></b>raph <b><u>N</u></b>eural <b><u>N</u></b>etwork (DCMGNN), which addresses the aforementioned challenges. It incorporates an explicit behavior pattern representation learner to capture the behavior patterns composed of multiplex user-item interactive relations, and includes a relation chain representation learner and a relation chain-aware encoder to capture the impact of various auxiliary relations on the target relation and the dependencies between different relations, and to mine the appropriate order of relations in a behavior pattern. Extensive experiments on three real-world datasets demonstrate that our DCMGNN surpasses various state-of-the-art recommendation methods. 
It outperforms the best baselines by 10.06% and 12.15% on average across all datasets in terms of Recall@10 and NDCG@10 respectively. The source code of our paper is available at <uri>https://github.com/lx970414/TKDE-DCMGNN</uri>.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3327-3341"},"PeriodicalIF":8.9,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPU-Accelerated Structural Diversity Search in Graphs","authors":"Jinbin Huang;Xin Huang;Jianliang Xu;Byron Choi;Yun Peng","doi":"10.1109/TKDE.2025.3547443","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3547443","url":null,"abstract":"The problem of structural diversity search has been widely studied recently, which aims to find the users with the highest structural diversity in social networks. The structural diversity of a user is measured by the number of social contexts inside his/her contact neighborhood. Three structural diversity models based on cohesive subgraph models (e.g., k-sized component, k-core, and k-truss) have been proposed. Previous solutions focus only on CPU-based sequential computation and suffer from several key steps that cannot be highly parallelized. GPUs offer highly efficient parallel computing for solving many complex graph problems such as triangle counting, subgraph pattern matching, and graph decomposition. In this paper, we provide a unified framework to utilize multiple GPUs to accelerate the computation of structural diversity search under the mentioned three structural diversity models. We first propose a GPU-based lock-free method to efficiently extract ego-networks in CSR format in parallel. Second, we design detailed GPU-based solutions for computing <i>k</i>-sized component-based, <i>k</i>-core-based, and <i>k</i>-truss-based structural diversity scores by dynamically grouping GPU resources. To effectively optimize the workload balance among multiple GPUs, we propose a greedy work-packing scheme and a dynamic work-stealing strategy to fully utilize GPU resources. 
Extensive experiments on real-world datasets validate the superiority of our GPU-based structural diversity search solutions in terms of efficiency and effectiveness.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3413-3428"},"PeriodicalIF":8.9,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
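For reference, the k-sized-component notion of structural diversity used above can be sketched serially (plain Python, not the paper's GPU implementation; `adj` is an assumed adjacency-list dict): count the connected components with at least k vertices in a user's ego-network, i.e. the subgraph induced on the user's neighbors with the user removed.

```python
def structural_diversity(adj, u, k=2):
    """Component-based structural diversity of user u: the number of connected
    components with at least k vertices in u's ego-network (u itself excluded)."""
    ego = set(adj[u])
    seen, score = set(), 0
    for v in ego:
        if v in seen:
            continue
        # traversal restricted to u's neighborhood
        stack, comp = [v], 0
        seen.add(v)
        while stack:
            w = stack.pop()
            comp += 1
            for x in adj[w]:
                if x in ego and x not in seen:
                    seen.add(x)
                    stack.append(x)
        if comp >= k:
            score += 1
    return score
```

A user whose neighbors split into many separate social contexts (components) scores higher than one whose neighbors form a single clique, even at equal degree.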
{"title":"Summary Graph Induced Invariant Learning for Generalizable Graph Learning","authors":"Xuecheng Ning;Yujie Wang;Kui Yu;Jiali Miao;Fuyuan Cao;Jiye Liang","doi":"10.1109/TKDE.2025.3547226","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3547226","url":null,"abstract":"As a promising strategy to achieve generalizable graph learning tasks, graph invariant learning emphasizes identifying invariant subgraphs for stable predictions on biased unknown distributions by selecting the important edges/nodes based on their contributions to the predictive tasks (i.e., subgraph predictivity). However, the existing approaches solely relying on subgraph predictivity face a challenge: the learned invariant subgraph often contains numerous spurious nodes and shows poor connectivity, undermining the generalization power of Graph Neural Networks (GNNs). To tackle this issue, we propose a summary graph-induced Invariant Learning (SIL) model that innovatively adopts a summary graph to leverage both the subgraph connectivity and predictivity for learning strongly connected and accurate invariant subgraphs. Specifically, SIL first learns a summary graph containing multiple strongly connected supernodes while maintaining structural consistency with the original graph. Second, the learned summary graph is disentangled into an invariant supernode and spurious counterparts to eliminate the interference of highly predictive edges and nodes. Finally, SIL identifies a potential invariant subgraph from the invariant supernode to accomplish generalization tasks. Additionally, we provide a theoretical analysis of the summary graph learning mechanism, guaranteeing that the learned summary graph is consistent with the original graph. 
Experimental results validate the effectiveness of the SIL model.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3739-3752"},"PeriodicalIF":8.9,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Elevating Knowledge-Enhanced Entity and Relationship Understanding for Sarcasm Detection","authors":"Xiaobao Wang;Yujing Wang;Dongxiao He;Zhe Yu;Yawen Li;Longbiao Wang;Jianwu Dang;Di Jin","doi":"10.1109/TKDE.2025.3547055","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3547055","url":null,"abstract":"Sarcasm thrives on popular social media platforms such as Twitter and Reddit, where users frequently employ it to convey emotions in an ironic or satirical manner. The ability to detect sarcasm plays a pivotal role in comprehending individuals’ true sentiments. To achieve a comprehensive grasp of sentence semantics, it is crucial to integrate external knowledge that can aid in deciphering entities and their intricate relationships within a sentence. Although some efforts have been made in this regard, their use of external knowledge is still relatively superficial. Specifically, knowledge-enhanced entity and relationship understanding still faces significant challenges. In this paper, we propose the Knowledge Enhanced Sentiment Dependency Graph Convolutional Network (KSDGCN) framework, which constructs a commonsense-augmented sentiment graph and a commonsense-replaced dependency graph for each text to explicitly capture the role of external knowledge for sarcasm detection. Furthermore, we capture the incongruous relationships between co-occurring entity pairs within sentences and background knowledge via a signed attention mechanism. 
We conduct experiments on four benchmark datasets, and the results show that KSDGCN outperforms existing state-of-the-art methods and is highly interpretable.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3356-3371"},"PeriodicalIF":8.9,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FlexIM: Efficient and Verifiable Index Management in Blockchain","authors":"Binhong Li;Licheng Lin;Shijie Zhang;Jianliang Xu;Jiang Xiao;Bo Li;Hai Jin","doi":"10.1109/TKDE.2025.3546997","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3546997","url":null,"abstract":"Blockchain-based queries, with their traceability and data provenance, have become increasingly popular and widely adopted in numerous applications. Yet existing index-based query approaches are only efficient under static blockchain query workloads where the query attribute or type must be fixed. It turns out to be particularly challenging to construct an efficient index for dynamic workloads due to prohibitively long construction time and excessive storage consumption. In this paper, we present FlexIM, the first efficient and verifiable index management system for blockchain dynamic queries. The key innovation in FlexIM is to uncover the inherent characteristics of blockchain, i.e., data distribution and block access frequency, and then to optimally choose the index by utilizing reinforcement learning techniques under varying workloads. In addition, we enhance and facilitate verifiability with low storage overhead by leveraging Root Merkle Tree (RMT) and Bloom Filter Merkle Tree (BMT). 
Our comprehensive evaluations demonstrate that FlexIM outperforms the state-of-the-art blockchain query mechanism, vChain+, by achieving a 26.5% speedup while consuming 94.2% less storage, on average, over real-world Bitcoin datasets.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3399-3412"},"PeriodicalIF":8.9,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908875","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
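FlexIM's verifiability rests on Merkle-tree authentication (the RMT/BMT structures named above). A generic binary Merkle tree with membership proofs, sketched below, illustrates the underlying idea; the RMT/BMT specifics are not reproduced here, and the duplicate-last-node padding rule is an assumption of this sketch.

```python
import hashlib

def _h(b):
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Root hash of a binary Merkle tree over raw byte-string leaves."""
    level = [_h(x) for x in leaves]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:              # duplicate last node on odd-sized levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes needed to verify leaves[index] against the root."""
    level = [_h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1
        proof.append((level[sib], sib < index))  # (hash, sibling-is-on-the-left)
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(root, leaf, proof):
    """Recompute the root from a leaf and its sibling path."""
    node = _h(leaf)
    for sib, is_left in proof:
        node = _h(sib + node) if is_left else _h(node + sib)
    return node == root
```

A light client holding only the root can thus check that a returned record belongs to a block by recomputing O(log n) hashes, which is what makes query results verifiable without storing the full index.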
{"title":"G-thinkerQ: A General Subgraph Querying System With a Unified Task-Based Programming Model","authors":"Lyuheng Yuan;Guimu Guo;Da Yan;Saugat Adhikari;Jalal Khalil;Cheng Long;Lei Zou","doi":"10.1109/TKDE.2025.3537964","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3537964","url":null,"abstract":"Given a large graph <inline-formula><tex-math>$G$</tex-math></inline-formula>, a subgraph query <inline-formula><tex-math>$Q$</tex-math></inline-formula> finds the set of all subgraphs of <inline-formula><tex-math>$G$</tex-math></inline-formula> that satisfy certain conditions specified by <inline-formula><tex-math>$Q$</tex-math></inline-formula>. Examples of subgraph queries include finding a community containing designated members to organize an event, and subgraph matching. To overcome the weakness of existing graph-parallel systems that underutilize CPU cores when finding subgraphs, we proposed our prior system, G-thinker, which adopts a novel think-like-a-task (TLAT) parallel programming model. However, G-thinker targets offline analytics and cannot support interactive online querying where users continually submit subgraph queries with different query contents. The challenges here are (i) how to maintain fairness that queries are answered in the order that they are received: a later query is processed only if earlier queries cannot saturate the available computation resources; (ii) how to track the progress of active queries (each with many tasks under computation) so that users can be promptly notified as soon as a query completes; and (iii) how to maintain memory boundedness and high task concurrency as in G-thinker. In this article, we propose a novel TLAT programming framework, called G-thinkerQ, for answering online subgraph queries. G-thinkerQ inherits the memory boundedness and high task concurrency of G-thinker by organizing the tasks of each query using a “task capsule” structure, and designs a novel task-capsule list to ensure fairness among queries. 
A novel lineage-based mechanism is also designed to keep track of when the last task of a query is completed. Parallel counterparts of the state-of-the-art algorithms for four recent advanced subgraph queries are implemented on G-thinkerQ to demonstrate its CPU scalability.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3429-3444"},"PeriodicalIF":8.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
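The fairness policy described above, where a later query is processed only if earlier queries cannot saturate the available computation resources, can be illustrated with a toy task-capsule list (a hypothetical sketch, not G-thinkerQ's actual implementation): workers always draw tasks from the earliest-arrived query that still has pending work.

```python
from collections import deque

class TaskCapsuleList:
    """Toy fairness policy: dispatch tasks from the earliest-arrived query first,
    so a later query runs only when earlier queries cannot fill all workers."""
    def __init__(self):
        self.capsules = deque()  # (query_id, deque of tasks), in arrival order

    def submit(self, query_id, tasks):
        self.capsules.append((query_id, deque(tasks)))

    def next_batch(self, n_workers):
        """Hand out up to n_workers (query_id, task) pairs, earliest query first."""
        batch = []
        for qid, tasks in self.capsules:
            while tasks and len(batch) < n_workers:
                batch.append((qid, tasks.popleft()))
            if len(batch) == n_workers:
                break
        # drop queries whose last task has been handed out
        self.capsules = deque((q, t) for q, t in self.capsules if t)
        return batch
```

With two workers free and an earlier query holding only one pending task, the second slot falls through to the next query, matching the saturation rule stated in the abstract.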
{"title":"Learning Causal Representations Based on a GAE Embedded Autoencoder","authors":"Kuang Zhou;Ming Jiang;Bogdan Gabrys;Yong Xu","doi":"10.1109/TKDE.2025.3546607","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3546607","url":null,"abstract":"Traditional machine-learning approaches face limitations when confronted with insufficient data. Transfer learning addresses this by leveraging knowledge from closely related domains. The key in transfer learning is to find a transferable feature representation to enhance cross-domain classification models. However, in some scenarios, some features correlated with samples in the source domain may not be relevant to those in the target. Causal inference enables us to uncover the underlying patterns and mechanisms within the data, mitigating the impact of confounding factors. Nevertheless, most existing causal inference algorithms have limitations when applied to high-dimensional datasets with nonlinear causal relationships. In this work, a new causal representation method based on a Graph autoencoder embedded AutoEncoder, named GeAE, is introduced to learn invariant representations across domains. The proposed approach employs a causal structure learning module, similar to a graph autoencoder, to account for nonlinear causal relationships present in the data. Moreover, the cross-entropy loss as well as the causal structure learning loss and the reconstruction loss are incorporated in the objective function designed in a unified autoencoder. This method allows for the handling of high-dimensional data and can provide effective representations for cross-domain classification tasks. 
Experimental results on generated and real-world datasets demonstrate the effectiveness of GeAE compared with the state-of-the-art methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3472-3484"},"PeriodicalIF":8.9,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143896227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Valuing Training Data via Causal Inference for In-Context Learning","authors":"Xiaoling Zhou;Wei Ye;Zhemg Lee;Lei Zou;Shikun Zhang","doi":"10.1109/TKDE.2025.3546761","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3546761","url":null,"abstract":"In-context learning (ICL) empowers large pre-trained language models (PLMs) to predict outcomes for unseen inputs without parameter updates. However, the efficacy of ICL heavily relies on the choice of demonstration examples. Randomly selecting from the training set frequently leads to inconsistent performance. Addressing this challenge, this study takes a novel approach by focusing on training data valuation through causal inference. Specifically, we introduce the concept of average marginal effect (AME) to quantify the contribution of individual training samples to ICL performance, encompassing both its generalization and robustness. Drawing inspiration from multiple treatment effects and randomized experiments, we initially sample diverse training subsets to construct prompts and evaluate the ICL performance based on these prompts. Subsequently, we employ Elastic Net regression to collectively estimate the AME values for all training data, considering subset compositions and inference performance. Ultimately, we prioritize samples with the highest values to prompt the inference of the test data. Across various tasks and with seven PLMs ranging in size from 0.8B to 33B, our approach consistently achieves state-of-the-art performance. Particularly, it outperforms Vanilla ICL and the best-performing baseline by an average of 14.1% and 5.2%, respectively. Moreover, prioritizing the most valuable samples for prompting leads to a significant enhancement in performance stability and robustness across various learning scenarios. 
Impressively, the valuable samples exhibit transferability across diverse PLMs and generalize well to out-of-distribution tasks.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3824-3840"},"PeriodicalIF":8.9,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143902653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
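The AME estimation pipeline described above can be sketched as follows (illustrative code: names like `estimate_ame` and `toy_utility` are invented here, and plain least squares stands in for the paper's Elastic Net regression): sample random training subsets, record the ICL utility of prompts built from each subset, then regress utility on subset-membership indicators so the learned coefficients approximate each sample's average marginal effect.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_ame(n_train, n_subsets, utility, subset_size):
    """Regress observed ICL utility on subset-membership indicators.
    The coefficients approximate each training sample's average marginal effect.
    (The paper fits an Elastic Net; plain least squares is used here for brevity.)"""
    X = np.zeros((n_subsets, n_train))
    y = np.zeros(n_subsets)
    for i in range(n_subsets):
        subset = rng.choice(n_train, size=subset_size, replace=False)
        X[i, subset] = 1.0
        y[i] = utility(subset)  # e.g., dev-set accuracy when prompting with `subset`
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # rank training samples by coef to pick demonstrations

def toy_utility(subset):
    """Synthetic stand-in for ICL performance: samples 0 and 1 genuinely help."""
    return 0.5 + 0.2 * (0 in subset) + 0.1 * (1 in subset)
```

On this synthetic utility, the regression correctly ranks samples 0 and 1 as the most valuable demonstrations, mirroring how the estimated AME values prioritize samples for prompting.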