Yizhou Wang;Can Qin;Rongzhe Wei;Yi Xu;Yue Bai;Yun Fu
{"title":"SLA$^{2}$P: Self-Supervised Anomaly Detection With Adversarial Perturbation","authors":"Yizhou Wang;Can Qin;Rongzhe Wei;Yi Xu;Yue Bai;Yun Fu","doi":"10.1109/TKDE.2024.3448473","DOIUrl":"10.1109/TKDE.2024.3448473","url":null,"abstract":"Anomaly detection is a foundational yet difficult problem in machine learning. In this work, we propose a new and effective framework, dubbed SLA<sup>2</sup>P, for unsupervised anomaly detection. After extracting representative embeddings from raw data, we apply random projections to the features and treat features transformed by different projections as belonging to separate pseudo-classes. We then train a neural network classifier on these transformed features to conduct self-supervised learning. Subsequently, we add adversarial perturbations to the transformed features and derive anomaly scores from the classifier's predictive uncertainties on these perturbed features. Our approach is motivated by the fact that, because anomalies are relatively rare and decentralized, 1) the training of the pseudo-label classifier concentrates more on acquiring the semantic knowledge of normal data than of anomalous data; 2) the transformed features of the normal data are more robust to perturbations than those of the anomalous data. Therefore, the perturbed transformed features of anomalies cannot be well classified and correspondingly tend to attain lower anomaly scores. 
The results of experiments on various benchmark datasets for images, text, and inherently tabular data demonstrate that SLA<sup>2</sup>P achieves state-of-the-art performance consistently.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9282-9293"},"PeriodicalIF":8.9,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xinyin Zhang;Ran Wang;Shuyue Chen;Yuheng Jia;Debby D. Wang
{"title":"AME-LSIFT: Attention-Aware Multi-Label Ensemble With Label Subset-SpecIfic FeaTures","authors":"Xinyin Zhang;Ran Wang;Shuyue Chen;Yuheng Jia;Debby D. Wang","doi":"10.1109/TKDE.2024.3447878","DOIUrl":"10.1109/TKDE.2024.3447878","url":null,"abstract":"Multi-label ensemble can achieve superior performance on multi-label learning problems by integrating a number of base classifiers. In existing multi-label ensemble methods, the base classifiers are usually trained with the same original features, so it is difficult for each base classifier to capture label-relevant or label subset-relevant information. Meanwhile, the manually designed integrating strategies cannot automatically distinguish the importance of the base classifiers and also lack flexibility and scalability. To resolve these problems, this paper proposes a new multi-label ensemble framework, named Attention-aware Multi-label Ensemble with Label Subset-specIfic FeaTures (AME-LSIFT). It utilizes $c$-means clustering to produce Label Subset-specIfic FeaTures (LSIFT), constructs a neural network based model for each label subset, and integrates the base models with a dynamic and automatic attention-aware mechanism. Moreover, an objective function that considers both the label subset accuracy and ensemble accuracy is developed for training the proposed AME-LSIFT. 
Experiments conducted on ten benchmark datasets demonstrate the superior performance of the proposed method compared with state-of-the-art approaches.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7627-7642"},"PeriodicalIF":8.9,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiangxiang Dai;Zhiyong Wang;Jize Xie;Tong Yu;John C. S. Lui
{"title":"Online Learning and Detecting Corrupted Users for Conversational Recommendation Systems","authors":"Xiangxiang Dai;Zhiyong Wang;Jize Xie;Tong Yu;John C. S. Lui","doi":"10.1109/TKDE.2024.3448250","DOIUrl":"10.1109/TKDE.2024.3448250","url":null,"abstract":"Conversational recommendation systems (CRSs) are increasingly prevalent, but they are susceptible to the influence of corrupted user behaviors, such as deceptive click ratings. These behaviors can skew the recommendation process, resulting in suboptimal results. Traditional bandit algorithms, which are typically oriented to single users, do not capitalize on implicit social connections between users, which could otherwise enhance learning efficiency. Furthermore, they cannot identify corrupted users in a real-time, multi-user environment. In this paper, we propose a novel bandit problem, Online Learning and Detecting Corrupted Users (OLDCU), to learn and utilize unknown user relations from disrupted behaviors to speed up learning and detect corrupted users in an online setting. This problem is non-trivial due to the dynamic nature of user behaviors and the difficulty of online detection. To robustly learn and leverage the unknown relations among potentially corrupted users, we propose a novel bandit algorithm RCLUB-WCU, incorporating a conversational mechanism. This algorithm is designed to handle the complexities of disrupted behaviors and to make accurate user relation inferences. To detect corrupted users with bandit feedback, we further devise a novel online detection algorithm, OCCUD, which is based on RCLUB-WCU’s inferred user relations and designed to adapt over time. We prove a sub-linear regret bound for RCLUB-WCU, demonstrating its efficiency. We also analyze the detection accuracy of OCCUD, showing its effectiveness in identifying corrupted users. Through extensive experiments, we validate the performance of our methods. 
Our results show that RCLUB-WCU and OCCUD outperform previous bandit algorithms and achieve high corrupted user detection accuracy, providing robust and efficient solutions in the field of CRSs.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8939-8953"},"PeriodicalIF":8.9,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10643701","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Heterogeneous Graph Learning via Random Projection","authors":"Jun Hu;Bryan Hooi;Bingsheng He","doi":"10.1109/TKDE.2024.3434956","DOIUrl":"10.1109/TKDE.2024.3434956","url":null,"abstract":"Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs. Typical HGNNs require repetitive message passing during training, limiting efficiency for large-scale real-world graphs. Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors, enabling efficient mini-batch training. Existing pre-computation-based HGNNs can be mainly categorized into two styles, which differ in how much information loss they allow and in their efficiency. We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN), which combines the benefits of one style's efficiency with the low information loss of the other style. To achieve efficiency, the main framework of RpHGNN consists of propagate-then-update iterations, where we introduce a Random Projection Squashing step to ensure that complexity increases only linearly. To achieve low information loss, we introduce a Relation-wise Neighbor Collection component with an Even-odd Propagation Scheme, which aims to collect information from neighbors in a finer-grained way. Experimental results indicate that our approach achieves state-of-the-art results on seven small and large benchmark datasets while also being 230% faster compared to the most effective baseline. 
Surprisingly, our approach not only surpasses pre-processing-based baselines but also outperforms end-to-end methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8093-8107"},"PeriodicalIF":8.9,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Li Yang;Zhipeng Luo;Shiming Zhang;Fei Teng;Tianrui Li
{"title":"Continual Learning for Smart City: A Survey","authors":"Li Yang;Zhipeng Luo;Shiming Zhang;Fei Teng;Tianrui Li","doi":"10.1109/TKDE.2024.3447123","DOIUrl":"10.1109/TKDE.2024.3447123","url":null,"abstract":"With the digitization of modern cities, large data volumes and powerful computational resources facilitate the rapid update of intelligent models deployed in smart cities. Continual learning (CL) is a novel machine learning paradigm that constantly updates models to adapt to changing environments, where the learning tasks, data, and distributions can vary over time. Our survey provides a comprehensive review of continual learning methods that are widely used in smart city development. The content consists of three parts: 1) Methodology-wise. We categorize a large number of basic CL methods and advanced CL frameworks in combination with other learning paradigms including graph learning, spatial-temporal learning, multi-modal learning, and federated learning. 2) Application-wise. We present numerous CL applications covering transportation, environment, public health, safety, networks, and associated datasets related to urban computing. 3) Challenges. We discuss current problems and challenges and envision several promising research directions. 
We believe this survey can help relevant researchers quickly familiarize themselves with the current state of continual learning research used in smart city development and direct them to future research trends.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7805-7824"},"PeriodicalIF":8.9,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neighbor Distribution Learning for Minority Class Augmentation","authors":"Mengting Zhou;Zhiguo Gong","doi":"10.1109/TKDE.2024.3447014","DOIUrl":"10.1109/TKDE.2024.3447014","url":null,"abstract":"Graph Neural Networks (GNNs) have achieved remarkable success in graph-based tasks. However, learning unbiased node representations under class-imbalanced training data remains challenging. Existing solutions may face overfitting due to extensive reuse of those limited labeled data in minority classes. Furthermore, many works address the class-imbalanced issue based on the embeddings generated from the biased GNNs, which make models intrinsically biased towards majority classes. In this paper, we propose a novel data augmentation strategy GraphGLS for semi-supervised class-imbalanced node classification, which aims to select informative unlabeled nodes to augment minority classes with consideration of both global and local information. Specifically, we first design a Global Selection module to learn global information (pseudo-labels) for unlabeled nodes and then select potential ones from them for minority classes. The Local Selection module further conducts filtering over those potential nodes by comparing their neighbor distributions with minority classes. To achieve this, we further design a neighbor distribution auto-encoder to learn a robust node-level neighbor distribution for each node. Then, we define class-level neighbor distribution to capture the overall neighbor characteristics of nodes within the same class. 
We conduct extensive experiments on multiple datasets, and the results demonstrate the superiority of GraphGLS over state-of-the-art baselines.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8901-8913"},"PeriodicalIF":8.9,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PrivFusion: Privacy-Preserving Model Fusion via Decentralized Federated Graph Matching","authors":"Qian Chen;Yiqiang Chen;Xinlong Jiang;Teng Zhang;Weiwei Dai;Wuliang Huang;Bingjie Yan;Zhen Yan;Wang Lu;Bo Ye","doi":"10.1109/TKDE.2024.3430819","DOIUrl":"10.1109/TKDE.2024.3430819","url":null,"abstract":"Model fusion is becoming a crucial component in the context of model-as-a-service scenarios, enabling the delivery of high-quality model services to local users. However, this approach introduces privacy risks and imposes certain limitations on its applications. Ensuring secure model exchange and knowledge fusion among users becomes a significant challenge in this setting. To tackle this issue, we propose PrivFusion, a novel architecture that preserves privacy while facilitating model fusion under the constraints of local differential privacy. PrivFusion leverages a graph-based structure, enabling the fusion of models from multiple parties without additional training. By employing randomized mechanisms, PrivFusion ensures privacy guarantees throughout the fusion process. To enhance model privacy, our approach incorporates a hybrid local differentially private mechanism and decentralized federated graph matching, effectively protecting both activation values and weights. Additionally, we introduce a perturbation filter adapter to alleviate the impact of randomized noise, thereby recovering the utility of the fused model. Through extensive experiments conducted on diverse image datasets and real-world healthcare applications, we provide empirical evidence showcasing the effectiveness of PrivFusion in maintaining model performance while preserving privacy. 
Our contributions offer valuable insights and practical solutions for secure and collaborative data analysis within the domain of privacy-preserving model fusion.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9051-9064"},"PeriodicalIF":8.9,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cheng Liu;Rui Li;Hangjun Che;Man-Fai Leung;Si Wu;Zhiwen Yu;Hau-San Wong
{"title":"Latent Structure-Aware View Recovery for Incomplete Multi-View Clustering","authors":"Cheng Liu;Rui Li;Hangjun Che;Man-Fai Leung;Si Wu;Zhiwen Yu;Hau-San Wong","doi":"10.1109/TKDE.2024.3445992","DOIUrl":"10.1109/TKDE.2024.3445992","url":null,"abstract":"Incomplete multi-view clustering (IMVC) presents a significant challenge due to the need for effectively exploring complementary and consistent information within the context of missing views. One promising strategy to tackle this challenge is to recover missing views by inferring the missing samples. However, such approaches often fail to fully utilize discriminative structural information or adequately address consistency, as they require such information to be known or learnable in advance, which contradicts the incomplete data setting. In this study, we propose a novel approach called <b>La</b>tent <b>S</b>tructure-<b>A</b>ware view recovery (LaSA) for the IMVC task. Our objective is to recover missing views through discriminative latent representations by leveraging structural information. Specifically, our method offers a unified closed-form formulation that simultaneously performs missing data inference and latent representation learning, using a learned intrinsic graph as structural information. This formulation, incorporating graph structure information, enhances the inference of missing data while facilitating discriminative feature learning. Even when the intrinsic graph is initially unknown due to incomplete data, our formulation allows for effective view recovery and intrinsic graph learning through an iterative optimization process. To further enhance performance, we introduce an iterative consistency diffusion process, which effectively leverages the consistency and complementary information across multiple views. 
Extensive experiments demonstrate the effectiveness of the proposed method compared to state-of-the-art approaches.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8655-8669"},"PeriodicalIF":8.9,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering the Representation Bottleneck of Graph Neural Networks","authors":"Fang Wu;Siyuan Li;Stan Z. Li","doi":"10.1109/TKDE.2024.3446584","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3446584","url":null,"abstract":"Graph neural networks (GNNs) rely mainly on the message-passing paradigm to propagate node features and build interactions, and different graph learning problems require different ranges of node interactions. In this work, we explore the capacity of GNNs to capture node interactions under contexts of different complexities. We discover that <i>GNNs usually fail to capture the most informative kinds of interaction styles for diverse graph learning tasks</i>, and thus name this phenomenon GNNs’ representation bottleneck. As a response, we demonstrate that the inductive bias introduced by existing graph construction mechanisms can result in this representation bottleneck, i.e., preventing GNNs from learning interactions of the most appropriate complexity. To address that limitation, we propose a novel graph rewiring approach based on interaction patterns learned by GNNs to adjust each node's receptive fields dynamically. 
Extensive experiments on both real-world and synthetic datasets prove the effectiveness of our algorithm in alleviating the representation bottleneck and its superiority in enhancing the performance of GNNs over state-of-the-art graph rewiring baselines.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7998-8008"},"PeriodicalIF":8.9,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10640313","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Budget-Constrained Ego Network Extraction With Maximized Willingness","authors":"Bay-Yuan Hsu;Chia-Hsun Lu;Ming-Yi Chang;Chih-Ying Tseng;Chih-Ya Shen","doi":"10.1109/TKDE.2024.3446169","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3446169","url":null,"abstract":"Many large-scale machine learning approaches and graph algorithms have recently been proposed to address a variety of problems in online social networks (OSNs). To evaluate and validate these algorithms and models, the data of ego-centric networks (ego networks) are widely adopted. Therefore, effectively extracting large-scale ego networks from OSNs becomes an important issue, particularly as privacy policies become increasingly strict. In this paper, we study the problem of extracting ego network data by jointly considering the user willingness, crawling cost, and structure of the network. We formulate a new research problem, named <i>Structure and Willingness Aware Ego Network Extraction (SWAN)</i>, and analyze its NP-hardness. We first propose a $(1-\\frac{1}{e})$-approximation algorithm, named <i>Tristar-Optimized Ego Network Identification with Maximum Willingness (TOMW)</i>. In addition to the deterministic approximation algorithm, we also propose to automatically <i>learn</i> an effective heuristic approach with machine learning, to avoid the substantial human effort needed to devise a good algorithm. The learning approach is named <i>Willingness-maximized and Structure-aware Ego Network Extraction with Reinforcement Learning (WSRL)</i>, in which we propose a novel contrastive learning strategy, named <i>Contrastive Learning with Performance-boosting Graph Augmentation</i>. We recruited 1,810 real-world participants and conducted an evaluation study to validate our problem formulation and proposed approaches. 
Moreover, experimental results on real social network datasets show that the proposed approaches outperform the other baselines significantly.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7692-7707"},"PeriodicalIF":8.9,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}