{"title":"Enhancing Precision Drug Recommendations via In-Depth Exploration of Motif Relationships","authors":"Chuang Zhao;Hongke Zhao;Xiaofang Zhou;Xiaomeng Li","doi":"10.1109/TKDE.2024.3437775","DOIUrl":"10.1109/TKDE.2024.3437775","url":null,"abstract":"Making accurate and safe clinical decisions for patients has long been a challenging task. With the proliferation of electronic health records and the rapid advancement of technology, drug recommender systems have emerged as invaluable aids for healthcare professionals, offering precise and secure prescriptions. Among prevailing methods, the exploration of motifs, defined as substructures with specific biological functions, has largely been overlooked. Nevertheless, the substantial impact of the motifs on drug efficacy and patient diseases implies that a more extensive incorporation could potentially improve the recommender systems. In light of this, we introduce DEPOT, an innovative drug recommendation framework developed from a motif-aware perspective. In our approach, we employ chemical decomposition to partition drug molecules into semantic motif-trees and design a structure-aware graph transformer to capture motif collaboration. This innovative practice preserves the topology knowledge and facilitates perception of drug functionality. To delve into the dynamic correlation between motifs and disease progression, we conduct a meticulous investigation from two perspectives: repetition and exploration. This comprehensive analysis allows us to gain valuable insights into the drug turnover, with the former focusing on reusability and the latter on discovering new requirements. We further formulate a historical weighting strategy for the drug-drug interaction (DDI) objective, enabling adaptive control over the trade-off between accuracy and safety criteria throughout the training process.
Extensive experiments conducted on four data sets validate the effectiveness and robustness of DEPOT.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8164-8178"},"PeriodicalIF":8.9,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141969044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Learning of Temporal Association Rule on Dynamic Multivariate Time Series Data","authors":"Guoliang He;Dawei Jin;Lifang Dai;Xin Xin;Zhiwen Yu;C. L. Philip Chen","doi":"10.1109/TKDE.2024.3438259","DOIUrl":"10.1109/TKDE.2024.3438259","url":null,"abstract":"Recently, rule-based classification on multivariate time series (MTS) data has gained considerable attention because it can improve the interpretability of classification. However, state-of-the-art approaches suffer from three major issues. 1) Few existing studies consider temporal relations among features in a rule, so the rules cannot adequately express the essential characteristics of MTS data. 2) Due to the concept drift and time warping of MTS data, traditional methods cannot mine essential characteristics of MTS data. 3) Existing online learning algorithms cannot effectively update shapelet-based temporal association rules of MTS data because of the temporal relationships among features of different variables. To handle these issues, we propose an online learning method for temporal association rules on dynamically collected MTS data (OTARL). First, a new type of rule named temporal association rule is defined and mined to represent temporal relationships among features in a rule. Second, an online learning mechanism with a probability correlation-based evaluation criterion is proposed to realize the online learning of temporal association rules on dynamically collected MTS data. Finally, an ensemble classification approach based on maximum-likelihood estimation is advanced to further enhance the classification performance.
We conduct experiments on ten real-world datasets to verify the effectiveness and efficiency of our approach.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8954-8966"},"PeriodicalIF":8.9,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shapley Value Approximation Based on Complementary Contribution","authors":"Qiheng Sun;Jiayao Zhang;Jinfei Liu;Li Xiong;Jian Pei;Kui Ren","doi":"10.1109/TKDE.2024.3438213","DOIUrl":"10.1109/TKDE.2024.3438213","url":null,"abstract":"Shapley value provides a unique way to fairly assess each player's contribution in a coalition and has enjoyed many applications. However, the exact computation of Shapley value is #P-hard due to the combinatoric nature of Shapley value. Many existing applications of Shapley value are based on Monte-Carlo approximation, which requires a large number of samples and the assessment of utility on many coalitions to reach high-quality approximation, and thus is still far from being efficient. Can we achieve an efficient approximation of Shapley value by smartly obtaining samples? In this paper, we treat the sampling approach to Shapley value approximation as a stratified sampling problem. Our main technical contributions are a novel stratification design and a sampling method based on Neyman allocation. Moreover, computing the Shapley value in a dynamic setting, where new players may join the game and others may leave it poses an additional challenge due to the considerable cost of recomputing from scratch. To tackle this issue, we propose to capture changes in Shapley value, making our approaches applicable to scenarios with dynamic players. 
Experimental results on several real data sets and synthetic data sets demonstrate the effectiveness and efficiency of our approaches.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9263-9281"},"PeriodicalIF":8.9,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustness-Reinforced Knowledge Distillation With Correlation Distance and Network Pruning","authors":"Seonghak Kim;Gyeongdo Ham;Yucheol Cho;Daeshik Kim","doi":"10.1109/TKDE.2024.3438074","DOIUrl":"10.1109/TKDE.2024.3438074","url":null,"abstract":"The improvement in the performance of efficient and lightweight models (i.e., the student model) is achieved through knowledge distillation (KD), which involves transferring knowledge from more complex models (i.e., the teacher model). However, most existing KD techniques rely on Kullback-Leibler (KL) divergence, which has certain limitations. First, if the teacher distribution has high entropy, the KL divergence's mode-averaging nature hinders the transfer of sufficient target information. Second, when the teacher distribution has low entropy, the KL divergence tends to excessively focus on specific modes, which fails to convey an abundant amount of valuable knowledge to the student. Consequently, when dealing with datasets that contain numerous confounding or challenging samples, student models may struggle to acquire sufficient knowledge, resulting in subpar performance. Furthermore, in previous KD approaches, we observed that data augmentation, a technique aimed at enhancing a model's generalization, can have an adverse impact. Therefore, we propose a Robustness-Reinforced Knowledge Distillation (R2KD) that leverages correlation distance and network pruning. This approach enables KD to effectively incorporate data augmentation for performance improvement. 
Extensive experiments on various datasets, including CIFAR-100, FGVR, TinyImagenet, and ImageNet, demonstrate our method's superiority over current state-of-the-art methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9163-9175"},"PeriodicalIF":8.9,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial Attack and Defense on Discrete Time Dynamic Graphs","authors":"Ziwei Zhao;Yu Yang;Zikai Yin;Tong Xu;Xi Zhu;Fake Lin;Xueying Li;Enhong Chen","doi":"10.1109/TKDE.2024.3438238","DOIUrl":"10.1109/TKDE.2024.3438238","url":null,"abstract":"Graph learning methods have achieved remarkable performance in various domains such as social recommendation and financial fraud detection. In real applications, the underlying graph is often dynamically evolving, and thus some recent studies focus on integrating the temporal topology information of graphs into the GNN for learning graph embedding. However, the robustness of training GNNs for dynamic graphs has not been discussed so far. The major reason is that how to attack dynamic graph embedding remains largely unexplored, let alone how to defend against such attacks. To enable robust training of GNNs for dynamic graphs, in this paper, we investigate the problem of how to generate attacks and defend against attacks for dynamic graph embedding. Attacking dynamic graph embedding is more challenging than attacking static graph embedding, as we need to understand the temporal dynamics of graphs as well as their impact on the embedding, and the injected perturbations should be distinguished from the natural evolution. In addition, the defense is very challenging as the perturbations may be hidden within the natural evolution. To tackle these technical challenges, in this paper, we first develop a novel gradient-based attack method from an optimization perspective to generate perturbations to fool dynamic graph learning methods, where a key idea is to use gradient dynamics to attack the natural dynamics of the graph. Further, we borrow the idea of the attack method and integrate it with adversarial training to train a more robust dynamic graph learning method to defend against hand-crafted attacks.
Finally, extensive experiments on two real-world datasets demonstrate the effectiveness of the proposed attack and defense method, where our defense method not only achieves comparable performance on clean graphs but also significantly increases the defense performance on attacked graphs.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7600-7611"},"PeriodicalIF":8.9,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mixed-Modality Clustering via Generative Graph Structure Matching","authors":"Xiaxia He;Boyue Wang;Junbin Gao;Qianqian Wang;Yongli Hu;Baocai Yin","doi":"10.1109/TKDE.2024.3434556","DOIUrl":"10.1109/TKDE.2024.3434556","url":null,"abstract":"The goal of mixed-modality clustering, which differs from typical multi-modality/view clustering, is to divide samples derived from various modalities into several clusters. This task has to solve two critical semantic gap problems: i) how to generate the missing modalities without the pairwise-modality data; and ii) how to align the representations of heterogeneous modalities. To tackle the above problems, this paper proposes a novel mixed-modality clustering model, which integrates the missing-modality generation and the heterogeneous modality alignment into a unified framework. During the missing-modality generation process, a bidirectional mapping is established between different modalities, enabling generation of preliminary representations for the missing-modality using information from another modality. Then the intra-modality bipartite graphs are constructed to help generate better missing-modality representations by weighted aggregating existing intra-modality neighbors. In this way, a pairwise-modality representation for each sample can be obtained. In the process of heterogeneous modality alignment, each modality is modelled as a graph to capture the global structure among intra-modality samples and is aligned against the heterogeneous modality representations through the adaptive heterogeneous graph matching module. 
Experimental results on three public datasets show the effectiveness of the proposed model compared to multiple state-of-the-art multi-modality/view clustering methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8773-8786"},"PeriodicalIF":8.9,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10623373","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141969014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OPF-Miner: Order-Preserving Pattern Mining With Forgetting Mechanism for Time Series","authors":"Yan Li;Chenyu Ma;Rong Gao;Youxi Wu;Jinyan Li;Wenjian Wang;Xindong Wu","doi":"10.1109/TKDE.2024.3438274","DOIUrl":"10.1109/TKDE.2024.3438274","url":null,"abstract":"Order-preserving pattern (OPP) mining is a type of sequential pattern mining method in which a group of ranks of time series is used to represent an OPP. This approach can discover frequent trends in time series. Existing OPP mining algorithms consider data points at different times to be equally important; however, newer data usually have a more significant impact, while older data have a weaker impact. We therefore introduce a forgetting mechanism into OPP mining to reduce the importance of older data. This paper explores the mining of OPPs with a forgetting mechanism (OPF) and proposes an algorithm called OPF-Miner that can discover frequent OPFs. OPF-Miner performs two tasks: candidate pattern generation and support calculation. In candidate pattern generation, OPF-Miner employs a maximal support priority strategy and a group pattern fusion strategy to avoid redundant pattern fusions. For support calculation, we propose an algorithm called support calculation with forgetting mechanism, which uses prefix and suffix pattern pruning strategies to avoid redundant support calculations. Experiments are conducted on nine datasets with 12 alternative algorithms. The results verify that OPF-Miner is superior to other competitive algorithms.
More importantly, OPF-Miner yields good clustering performance for time series, since the forgetting mechanism is employed.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8981-8995"},"PeriodicalIF":8.9,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Contraction Hierarchies Construction on Road Networks","authors":"Zi Chen;Xinyu Ji;Long Yuan;Xuemin Lin;Wenjie Zhang;Shan Huang","doi":"10.1109/TKDE.2024.3437243","DOIUrl":"10.1109/TKDE.2024.3437243","url":null,"abstract":"Shortest path query on road networks is a fundamental problem that supports many location-based services and a wide variety of applications. Contraction Hierarchies (CH) is widely adopted to accelerate the shortest path query by leveraging shortcuts among vertices. However, the state-of-the-art CH construction method named $\\mathsf{VCHCons}$ suffers from inefficiencies due to its strong reliance on a pre-determined vertex order. This leads to the generation of a large number of invalid shortcuts and limits parallel processing capability. Motivated by this, in this paper, an innovative CH construction algorithm called $\\mathsf{ECHCons}$ is devised following an edge-centric paradigm, which addresses the issue of invalid shortcut production by introducing a novel edge-ordering strategy. Furthermore, it optimizes shortcut calculation within a dynamically constructed optimal subgraph, which is significantly smaller than the original network, thus shrinking the traversal space during index construction. To further enhance efficiency and overcome the limitations in parallelism inherent to $\\mathsf{VCHCons}$, our approach leverages batch contraction of edges and introduces a well-defined lower bound technique to unlock more efficient parallel computation resources. Our approach provides both a theoretical guarantee and a practical advancement in CH construction. Extensive and comprehensive experiments are conducted on real road networks.
The experimental results demonstrate the effectiveness and efficiency of our proposed approach.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9011-9024"},"PeriodicalIF":8.9,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Prioritized Node-Wise Message Propagation in Graph Neural Networks","authors":"Yao Cheng;Minjie Chen;Caihua Shan;Xiang Li","doi":"10.1109/TKDE.2024.3436909","DOIUrl":"10.1109/TKDE.2024.3436909","url":null,"abstract":"Graph neural networks (GNNs) have recently received significant attention. Learning node-wise message propagation in GNNs aims to set personalized propagation steps for different nodes in the graph. Despite the success, existing methods ignore node priority that can be reflected by node influence and heterophily. In this paper, we propose a versatile framework PriPro, which can be integrated with most existing GNN models and aims to learn prioritized node-wise message propagation in GNNs. Specifically, the framework consists of three components: a backbone GNN model, a propagation controller to determine the optimal propagation steps for nodes, and a weight controller to compute the priority scores for nodes. We design a mutually enhanced mechanism to compute node priority, optimal propagation step and label prediction. We also propose an alternative optimization strategy to learn the parameters in the backbone GNN model and two parametric controllers. We conduct extensive experiments to compare our framework with 12 other state-of-the-art competitors on 10 benchmark datasets.
Experimental results show that our framework can lead to superior performance in terms of propagation strategies and node representations.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8670-8681"},"PeriodicalIF":8.9,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Representation Learning Based on Cognitive Spreading Activations","authors":"Jie Bai;Kang Zhao;Linjing Li;Daniel Zeng;Qiudan Li;Fan Yang;Quannan Zu","doi":"10.1109/TKDE.2024.3437781","DOIUrl":"10.1109/TKDE.2024.3437781","url":null,"abstract":"Graph representation learning is an emerging area for graph analysis and inference. However, existing approaches for large-scale graphs either sample nodes in sequential walks or manipulate the adjacency matrices of graphs. The former approach can cause sampling bias against less-connected nodes, whereas the latter may suffer from sparsity that exists in many real-world graphs. To learn from structural information in a graph more efficiently and comprehensively, this paper proposes a new graph representation learning approach inspired by the cognitive model of spreading-activation mechanisms in human memory. This approach learns node embeddings by adopting a graph activation model that allows nodes to “activate” their neighbors and spread their own structural information to other nodes through the paths simultaneously. Comprehensive experiments demonstrate that the proposed model performs better than existing methods on several empirical datasets for multiple graph inference tasks. Meanwhile, the spreading-activation-based model is computationally more efficient than existing approaches–the training process converges after only a small number of iterations, and the training time is linear in the number of edges in a graph. 
The proposed method works for both homogeneous and heterogeneous graphs.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8408-8420"},"PeriodicalIF":8.9,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}