{"title":"Efficient Learning for Billion-Scale Heterogeneous Information Networks","authors":"Ruize Shi;Hong Huang;Xue Lin;Kehan Yin;Wei Zhou;Hai Jin","doi":"10.1109/TBDATA.2024.3428331","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3428331","url":null,"abstract":"<i>Heterogeneous graph neural networks (HGNNs)</i> excel at understanding <i>heterogeneous information networks</i> (HINs) and have demonstrated state-of-the-art performance across numerous tasks. However, previous works tend to study small datasets, which deviate significantly from real-world scenarios. More specifically, their heterogeneous message passing results in substantial memory and time overheads, as it requires aggregating heterogeneous neighbor features multiple times. To address this, we propose an <i>Efficient Heterogeneous Graph Neural Network</i> (EHGNN) that leverages <i>heterogeneous personalized PageRank</i> (HPPR) to preserve the influence between all nodes, then approximates message passing and selectively loads neighbor information for one aggregation, significantly reducing memory and time usage. In addition, we employ some lightweight techniques to ensure the performance of EHGNN. Evaluations on various HIN benchmarks in node classification and link prediction tasks unequivocally establish the superiority of EHGNN, surpassing the state of the art by 11<inline-formula><tex-math>$\\%$</tex-math></inline-formula> in terms of performance. In addition, EHGNN achieves a remarkable 400<inline-formula><tex-math>$\\%$</tex-math></inline-formula> boost in training and inference speed while utilizing less memory. 
Notably, EHGNN can handle a 200-million-node, 1-billion-link HIN within 18 hours on a single machine, using only 170 GB of memory, which is much lower than the previous minimum requirement of 600 GB.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"748-760"},"PeriodicalIF":7.5,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10598347","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Joint Graph Learning for Multi-View Clustering","authors":"Yanfang He;Umi Kalsom Yusof","doi":"10.1109/TBDATA.2024.3426277","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3426277","url":null,"abstract":"In real-world applications, multi-view datasets often comprise diverse data sources or views, inevitably accompanied by noise. However, most existing graph-based multi-view clustering methods utilize fixed graph similarity matrices to handle noisy multi-view data, necessitating additional clustering steps for obtaining the final clustering. This paper proposes a Robust Joint Graph learning for Multi-view Clustering (RJGMC) method based on the <inline-formula><tex-math>$\\ell_{1}$</tex-math></inline-formula>-norm to address these problems. RJGMC integrates the learning processes of the graph similarity matrix and the unified graph matrix to improve mutual reinforcement between these graph matrices. Simultaneously, employing the <inline-formula><tex-math>$\\ell_{1}$</tex-math></inline-formula>-norm to generate the unified graph matrix enhances the algorithm's robustness. A rank constraint is imposed on the graph Laplacian matrix of the unified graph matrix, so that the clusters can be obtained directly without additional processing. We also introduce a method for automatically assigning optimal weights to each view. The objective function is optimized with an alternating optimization approach. 
Experimental results on synthetic and real-world datasets demonstrate that the proposed method outperforms other state-of-the-art techniques regarding clustering performance and robustness.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"722-734"},"PeriodicalIF":7.5,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Worker Similarity-Based Label Completion for Crowdsourcing","authors":"Xue Wu;Liangxiao Jiang;Wenjun Zhang;Chaoqun Li","doi":"10.1109/TBDATA.2024.3426310","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3426310","url":null,"abstract":"In real-world crowdsourcing scenarios, it is a common phenomenon that each worker only annotates a few instances, resulting in a significantly sparse crowdsourcing label matrix. Consequently, only a small number of workers influence the inferred integrated label of each instance, which may weaken the performance of label integration algorithms. To address this problem, we propose a novel label completion algorithm called Worker Similarity-based Label Completion (WSLC). WSLC is grounded on the assumption that workers with similar cognitive abilities will annotate similar labels on the same instances. Specifically, we first construct a data set for each worker that includes all instances annotated by this worker and learn a feature vector for each worker. Then, we define a metric based on cosine similarity to estimate worker similarity based on the learned feature vectors. Finally, we complete the labels for each worker on unannotated instances based on the worker similarity and the annotations of similar workers. 
The experimental results on one real-world and 34 simulated crowdsourced data sets consistently show that WSLC effectively addresses the problem of the sparse crowdsourcing label matrix and enhances the integration accuracies of label integration algorithms.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"710-721"},"PeriodicalIF":7.5,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Convolutional Networks With Collaborative Feature Fusion for Sequential Recommendation","authors":"Jianping Gou;Youhui Cheng;Yibing Zhan;Baosheng Yu;Weihua Ou;Yi Zhang","doi":"10.1109/TBDATA.2024.3426355","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3426355","url":null,"abstract":"Sequential recommendation seeks to understand user preferences based on their past actions and predict future interactions with items. Recently, several techniques for sequential recommendation have emerged, primarily leveraging graph convolutional networks (GCNs) for their ability to model relationships effectively. However, real-world scenarios often involve sparse interactions, where early and recent short-term preferences play distinct roles in the recommendation process. Consequently, vanilla GCNs struggle to effectively capture the explicit correlations between these early and recent short-term preferences. To address these challenges, we introduce a novel approach termed Graph Convolutional Networks with Collaborative Feature Fusion (COFF). Specifically, our method addresses the issue by initially dividing each user interaction sequence into two segments. We then construct two separate graphs for these segments, aiming to capture the user's early and recent short-term preferences independently. To obtain robust prediction, we employ multiple GCNs in a collaborative distillation manner, incorporating a feature fusion module to establish connections between the early and recent short-term preferences. This approach enables a more precise representation of user preferences. 
Experimental evaluations conducted on five popular sequential recommendation datasets demonstrate that our COFF model outperforms recent state-of-the-art methods in terms of recommendation accuracy.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"735-747"},"PeriodicalIF":7.5,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143627852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DCLCSE: Dynamic Curriculum Learning Based Contrastive Learning of Sentence Embeddings","authors":"Chang Liu;Dacao Zhang;Meng Wang","doi":"10.1109/TBDATA.2024.3423650","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3423650","url":null,"abstract":"Recently, Contrastive Learning (CL) has made impressive progress in natural language processing, especially in sentence representation learning. Plenty of data augmentation methods have been proposed for the generation of positive samples. However, due to the highly abstract nature of natural language, these augmentations cannot maintain the quality of generated positive samples, e.g., they may produce samples that are too easy or too hard. To this end, we propose to improve the quality of positive examples from a data arrangement perspective and develop a novel model-agnostic approach: <i>Dynamic Curriculum Learning based Contrastive Sentence Embedding framework</i> (<i>DCLCSE</i>) for sentence embeddings. Specifically, we propose to incorporate a curriculum learning strategy to control the positive example usage. At the early learning stage, easy samples are selected to optimize the CL-based model. As the model's capability increases, we gradually select harder samples for model training, ensuring the learning efficiency of the model. Furthermore, we design a novel difficulty measurement module to calculate the difficulty of generated positives, in which the model's capability is considered for the accurate sample difficulty measurement. Based on this, we develop multiple arrangement strategies to facilitate the model learning process based on learned difficulties. Finally, extensive experiments over multiple representative models demonstrate the superiority of <i>DCLCSE</i>. 
As a byproduct, we have released the code to facilitate future research.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"635-647"},"PeriodicalIF":7.5,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143611762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Symbolic Knowledge Reasoning on Hyper-Relational Knowledge Graphs","authors":"Zikang Wang;Linjing Li;Daniel Dajun Zeng","doi":"10.1109/TBDATA.2024.3423670","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3423670","url":null,"abstract":"Knowledge reasoning has been widely researched in knowledge graphs (KGs), but there has been relatively less research on hyper-relational KGs, which also play an important role in downstream tasks. Existing reasoning methods on hyper-relational KGs are based on representation learning. Though this approach is effective, it lacks interpretability and ignores the graph structure information. In this paper, we make the first attempt at symbolic reasoning on hyper-relational KGs. We introduce rule extraction methods based on both individual facts and paths, and propose a rule-based symbolic reasoning approach, HyperPath. This approach is simple and interpretable, and it can serve as a baseline model for symbolic reasoning in hyper-relational KGs. We provide experimental results on almost all datasets, including five large-scale datasets and seven of their sub-datasets. Experiments show that the expressive power of the proposed model is similar to that of simple neural networks like convolutional networks, but not as advanced as that of more complex networks such as Transformer and graph convolutional networks, which is consistent with the performance of symbolic methods on KGs. 
Furthermore, we also analyze the impact of rule length and hyperparameters on the model's performance, which can provide insights for future research in hypergraph symbolic reasoning.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"578-590"},"PeriodicalIF":7.5,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143611889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HEART: Historically Information Embedding and Subspace Re-Weighting Transformer-Based Tracking","authors":"Tianpeng Liu;Jing Li;Amin Beheshti;Jia Wu;Jun Chang;Beihang Song;Lezhi Lian","doi":"10.1109/TBDATA.2024.3423672","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3423672","url":null,"abstract":"Transformer-based trackers offer significant potential for integrating semantic interdependence between template and search features in tracking tasks. Transformers possess inherent capabilities for processing long sequences and extracting correlations within them. Several researchers have explored the feasibility of incorporating Transformers to model continuously changing search areas in tracking tasks. However, these approaches have substantially increased the computational cost of an already resource-intensive Transformer. Additionally, existing Transformer-based trackers rely solely on mechanically employing multi-head attention to obtain representations in different subspaces, without any inherent bias. To address these challenges, we propose HEART (Historical Information Embedding And Subspace Re-weighting Tracker). Our method embeds historical information into the queries in a lightweight and Markovian manner to extract discriminative attention maps for robust tracking. Furthermore, we develop a multi-head attention distribution mechanism to retrieve the most promising subspace weights for tracking tasks. 
HEART has demonstrated its effectiveness on five datasets, including OTB-100, LaSOT, UAV123, TrackingNet, and GOT-10k.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"566-577"},"PeriodicalIF":7.5,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143611925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual Graph Convolutional Networks for Social Network Alignment","authors":"Xiaoyu Guo;Yan Liu;Daofu Gong;Fenlin Liu","doi":"10.1109/TBDATA.2024.3423699","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3423699","url":null,"abstract":"Social network alignment aims to discover the potential correspondence between users across different social platforms. Recent advances in graph representation learning have brought a new upsurge to network alignment. Most existing representation-based methods extract local structural information of social networks from users’ neighborhoods, but the global structural information has not been fully exploited. Therefore, this manuscript proposes a dual graph convolutional networks-based method (DualNA) for social network alignment, which combines user representation learning and user alignment in a unified framework. Specifically, we design dual graph convolutional networks as feature extractors to capture the local and global structural information of social networks, and apply a two-part constraint mechanism, including reconstruction loss and contrastive loss, to jointly optimize the graph representation learning process. As a result, the learned user representations can not only preserve the local and global features of original networks, but also be distinguishable and suitable for the downstream task of social network alignment. Extensive experiments on three real-world datasets show that our proposed method outperforms all baselines. 
The ablation studies further illustrate the rationality and effectiveness of our method.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"684-695"},"PeriodicalIF":7.5,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143627828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Denoising Implicit Feedback for Graph Collaborative Filtering via Causal Intervention","authors":"Huiting Liu;Huaxiu Zhang;Peipei Li;Peng Zhao;Xindong Wu","doi":"10.1109/TBDATA.2024.3423727","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3423727","url":null,"abstract":"The performance of graph collaborative filtering (GCF) models could be affected by noisy user-item interactions. Existing studies on data denoising either ignore the nature of noise in implicit feedback or seldom consider the long-tail distribution of historical interaction data. For the first challenge, we analyze the role of noise from a causal perspective: noise is an unobservable confounder. Therefore, we use the instrumental variable for causal intervention without requiring confounder observation. For the second challenge, we consider the degree distribution of nodes in the course of causal intervention. We then propose a model named causal graph collaborative filtering (CausalGCF) to denoise implicit feedback for GCF. Specifically, we design a degree augmentation strategy as the instrumental variable. First, we divide nodes into head and tail nodes according to their degree. Then, we purify the interactions of the head nodes and enrich those of the tail nodes based on similarity. We apply the degree augmentation strategy from the user and item sides to obtain two different graph structures, which are trained together with self-supervised learning. 
Empirical studies on four real and four synthetic datasets demonstrate the effectiveness of CausalGCF, which is more robust against noisy interactions in implicit feedback than the baselines.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"696-709"},"PeriodicalIF":7.5,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DBNetVizor: Visual Analysis of Dynamic Basketball Player Networks","authors":"Baofeng Chang;Guodao Sun;Sujia Zhu;Qi Jiang;Wang Xia;Jingwei Tang;Ronghua Liang","doi":"10.1109/TBDATA.2024.3423721","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3423721","url":null,"abstract":"Visual analysis has been increasingly integrated into the exploration of temporal networks, as visualization methods have the capability to present time-varying attributes and relationships of entities in an easy-to-read manner. Visualization techniques have been employed in a variety of dynamic network datasets, including social media networks, academic citation networks, and financial transaction networks. However, effectively visualizing dynamic basketball player network data, which consists of numerical networks, intensive timestamps, and subtle changes, remains a challenge for analysts. To address this issue, we propose a snapshot extraction algorithm that involves human-in-the-loop methodology to help users divide a series of networks into hierarchical snapshots for subsequent network analysis tasks, such as node exploration and network pattern analysis. Furthermore, we design and implement a prototype system, called DBNetVizor, for dynamic basketball player network data visualization. DBNetVizor integrates a graphical user interface to help users extract snapshots visually and interactively, as well as multiple linked visualization charts to display macro- and micro-level information of dynamic basketball player network data. To demonstrate the usability and efficiency of our proposed methods, we present two case studies based on dynamic basketball player network data in a competition. 
Additionally, we conduct an evaluation and receive positive feedback.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"591-605"},"PeriodicalIF":7.5,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143611823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}