{"title":"ALD-GCN: Graph Convolutional Networks With Attribute-Level Defense","authors":"Yihui Li;Yuanfang Guo;Junfu Wang;Shihao Nie;Liang Yang;Di Huang;Yunhong Wang","doi":"10.1109/TBDATA.2024.3433553","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3433553","url":null,"abstract":"Graph Neural Networks (GNNs), such as the Graph Convolutional Network, have exhibited impressive performance on various real-world datasets. However, many studies have confirmed that deliberately designed adversarial attacks can easily confuse GNNs in the classification of target nodes (targeted attacks) or of all nodes (global attacks). According to our observations, different attributes tend to be treated differently when the graph is attacked. Unfortunately, most existing defense methods can only defend at the graph or node level, which ignores the diversity of attributes within each node. To address this limitation, we propose to leverage a new property, named Attribute-level Smoothness (ALS), which is defined based on the local differences of the graph. We then propose a novel defense method, named GCN with Attribute-level Defense (ALD-GCN), which utilizes the ALS property to provide attribute-level protection to each attribute. Extensive experiments on real-world graphs demonstrate the superiority of the proposed work and the potential of our ALS property in the context of attacks.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"788-799"},"PeriodicalIF":7.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AFS-FCM With Memory: A Model for Air Quality Multi-Dimensional Prediction With Interpretability","authors":"Zhen Peng;Wanquan Liu;Sung-Kwun Oh","doi":"10.1109/TBDATA.2024.3433467","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3433467","url":null,"abstract":"In order to represent the influences of different semantics on targets and to improve interpretable prediction for multi-dimensional time series, we integrate Axiomatic Fuzzy Set (AFS) and Fuzzy Cognitive Map (FCM) with memory for fuzzy knowledge representation and prediction in this paper. The AFS is used to extract semantics of concepts for fuzzy representation using data distribution. The FCM with memory is trained to model the influence relationships between different semantics of concepts and multiple targets based on multi-dimensional time series data. A multi-dimensional learning algorithm of AFS-FCM with memory based on gradient descent is developed to investigate the influences of different semantics of concepts on multiple targets. Finally, we validate our model by comparing it with other FCMs, intrinsically interpretable models and machine learning methods for prediction of air quality multi-dimensional time series data, and discuss the performance of AFS-FCM with different transformation functions. The model can not only predict air quality accurately, but also explicitly reveal the specific quantitative relationships between different semantics of meteorology and air quality.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"810-820"},"PeriodicalIF":7.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local High-Order Graph Learning for Multi-View Clustering","authors":"Zhi Wang;Qiang Lin;Yaxiong Ma;Xiaoke Ma","doi":"10.1109/TBDATA.2024.3433525","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3433525","url":null,"abstract":"As the accumulation of multi-view data continues to grow, multi-view clustering has become increasingly important in research fields like data mining. However, current methods have been criticized for their shortcomings, such as insufficient exploration of intra-view high-order relationships and poor characterization of inter-view diverse features. To overcome these challenges, we propose a novel approach called Local High-order Graph Learning for Multi-View Clustering (LHGL_MVC). Our method aims to explore high-order relationships within a view while also considering diverse information between views. In LHGL_MVC, we learn the initial graphs of each view through self-representation, which are decomposed into consistent and diverse parts to better capture the diversity of different views. Based on the consistent parts, we propose a novel local high-order graph learning approach to more effectively explore high-order relationships between samples within each view. At the same time, we leverage high-order relationships between views using the rotated tensor nuclear norm. Finally, we obtain a unified graph for clustering by fusing all consistent affinity graphs and their high-order graphs with adaptive weights. All procedures are integrated into an overall objective function, so that they mutually promote one another during the optimization process. Comprehensive experiments conducted on eleven real-world datasets demonstrate that LHGL_MVC significantly outperforms existing algorithms across various metrics, highlighting the superiority of the proposed method.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"761-773"},"PeriodicalIF":7.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Systemic Pipeline of Identifying lncRNA-Disease Associations to the Prognosis and Treatment of Hepatocellular Carcinoma","authors":"Wenxiang Zhang;Ye Yuan;Hang Wei;Wenjing Zhang;Bin Liu","doi":"10.1109/TBDATA.2024.3433380","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3433380","url":null,"abstract":"Exploring disease mechanisms at the lncRNA level provides valuable guidance for disease prognosis and treatment. Recently, there has been a surge of interest in exploring disease mechanisms via computational methods to overcome the challenge of the tremendous manpower and material resources required by biological experiments. However, current computational methods suffer from two main limitations: simple data structures that do not consider the close association between multiple types of data, and the lack of a systematic pathogenesis analysis, in that identified disease-associated lncRNAs are not applied to downstream disease prognosis and therapeutic analysis from the perspective of data analysis. To this end, we present a systemic pipeline including disease-associated lncRNA identification and downstream pathogenesis analysis of how the predicted lncRNAs are involved in disease prognosis and therapy. Due to the importance of identifying disease-associated lncRNAs and the weak interpretability of existing computational identification methods, we propose a novel approach named iLncDA-PT to identify disease-associated lncRNAs, which considers the interactions between various bio-entities and outperforms other state-of-the-art methods, and we then conduct a systematic subsequent analysis of prognosis and therapy for a specific disease, hepatocellular carcinoma (HCC), as an example. Finally, we reveal a significant association between immune checkpoint expression, tumor microenvironment, and drug treatment.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"800-809"},"PeriodicalIF":7.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secret Specification Based Personalized Privacy-Preserving Analysis in Big Data","authors":"Jiajun Chen;Chunqiang Hu;Zewei Liu;Tao Xiang;Pengfei Hu;Jiguo Yu","doi":"10.1109/TBDATA.2024.3433433","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3433433","url":null,"abstract":"The pursuit of refined data analysis and the preservation of privacy in Big Data pose significant concerns. Among the paramount paradigms for addressing these challenges, differential privacy stands out as a vital area of research. However, traditional differential privacy tends to be excessively restrictive when it comes to individuals’ control over their own data. It often treats all data as inherently sensitive, whereas in reality, not all information related to individuals is sensitive or requires an identical level of protection. In this paper, we define secret specification-based differential privacy (SSDP), where the term “secret specification” refers to enabling users to decide which aspects of their information are sensitive and which are not, prior to data generation or processing. By allowing individuals to independently define their secret specifications, SSDP achieves personalized privacy protection and facilitates effective data analysis. To enable the targeted application of SSDP, we further present task-specific mechanisms designed for database and graph data scenarios. Finally, we assess the trade-offs between privacy and utility inherent in the proposed mechanisms through comparative experiments conducted on real datasets, demonstrating the utility enhancements offered by SSDP mechanisms in practical applications.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"774-787"},"PeriodicalIF":7.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Deep Learning for Semisupervised Classification of Hyperspectral and LiDAR Data","authors":"Chunyu Pu;Yingxu Liu;Shuai Lin;Xu Shi;Zhengying Li;Hong Huang","doi":"10.1109/TBDATA.2024.3433494","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3433494","url":null,"abstract":"Deep learning (DL) has emerged as a competitive method in single-modality-dominated remote sensing (RS) data classification tasks, but its classification performance inevitably encounters a bottleneck due to the lack of representation diversity in complicated spatial structures with various land cover types. Therefore, the RS community has been actively researching multimodal feature learning techniques for the same scene. However, expert annotation of multisource data consumes a significant amount of time and cost. This article proposes an end-to-end method called the semisupervised multimodal dual-path network (SMDN). This method simultaneously explores spatial-spectral features contained in hyperspectral images (HSI) and elevation information provided by light detection and ranging (LiDAR). SMDN exploits a novel unsupervised encoder-decoder structure as the backbone network to construct a multimodal DL architecture by jointly training with a data-specific branch. To obtain discriminative multimodal representations, SMDN guides the collaborative training of two different unsupervised features mapped in the latent subspace with limited labeled training samples. Furthermore, after a simple modification of its fusion strategy, SMDN can be applied to unsupervised classification problems. Experimental results on benchmark RS datasets validate the effectiveness of the developed SMDN compared with many state-of-the-art methods.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"821-834"},"PeriodicalIF":7.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Learning for Billion-Scale Heterogeneous Information Networks","authors":"Ruize Shi;Hong Huang;Xue Lin;Kehan Yin;Wei Zhou;Hai Jin","doi":"10.1109/TBDATA.2024.3428331","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3428331","url":null,"abstract":"Heterogeneous graph neural networks (HGNNs) excel at understanding heterogeneous information networks (HINs) and have demonstrated state-of-the-art performance across numerous tasks. However, previous works tend to study small datasets, which deviate significantly from real-world scenarios. More specifically, their heterogeneous message passing results in substantial memory and time overheads, as it requires aggregating heterogeneous neighbor features multiple times. To address this, we propose an Efficient Heterogeneous Graph Neural Network (EHGNN) that leverages heterogeneous personalized PageRank (HPPR) to preserve the influence between all nodes, then approximates message passing and selectively loads neighbor information for a single aggregation, significantly reducing memory and time usage. In addition, we employ some lightweight techniques to ensure the performance of EHGNN. Evaluations on various HIN benchmarks in node classification and link prediction tasks establish the superiority of EHGNN, surpassing the state of the art by 11% in terms of performance. In addition, EHGNN achieves a 400% boost in training and inference speed while utilizing less memory. Notably, EHGNN can handle a 200-million-node, 1-billion-link HIN within 18 hours on a single machine, using only 170 GB of memory, which is much lower than the previous minimum requirement of 600 GB.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"748-760"},"PeriodicalIF":7.5,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10598347","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Joint Graph Learning for Multi-View Clustering","authors":"Yanfang He;Umi Kalsom Yusof","doi":"10.1109/TBDATA.2024.3426277","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3426277","url":null,"abstract":"In real-world applications, multi-view datasets often comprise diverse data sources or views, inevitably accompanied by noise. However, most existing graph-based multi-view clustering methods utilize fixed graph similarity matrices to handle noisy multi-view data, necessitating additional clustering steps to obtain the final clustering. This paper proposes Robust Joint Graph learning for Multi-view Clustering (RJGMC) based on the <inline-formula><tex-math>$\\ell_{1}$</tex-math></inline-formula>-norm to address these problems. RJGMC integrates the learning processes of the graph similarity matrix and the unified graph matrix to improve mutual reinforcement between these graph matrices. Simultaneously, employing the <inline-formula><tex-math>$\\ell_{1}$</tex-math></inline-formula>-norm to generate the unified graph matrix enhances the algorithm's robustness. A rank constraint is imposed on the graph Laplacian matrix of the unified graph matrix, so that the clusters can be obtained directly without additional processing. In addition, we introduce a method for automatically assigning optimal weights to each view. The objective function is optimized with an alternating optimization approach. Experimental results on synthetic and real-world datasets demonstrate that the proposed method outperforms other state-of-the-art techniques in clustering performance and robustness.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"722-734"},"PeriodicalIF":7.5,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Worker Similarity-Based Label Completion for Crowdsourcing","authors":"Xue Wu;Liangxiao Jiang;Wenjun Zhang;Chaoqun Li","doi":"10.1109/TBDATA.2024.3426310","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3426310","url":null,"abstract":"In real-world crowdsourcing scenarios, it is common for each worker to annotate only a few instances, resulting in a highly sparse crowdsourcing label matrix. Consequently, only a small number of workers influence the inferred integrated label of each instance, which may weaken the performance of label integration algorithms. To address this problem, we propose a novel label completion algorithm called Worker Similarity-based Label Completion (WSLC). WSLC is grounded in the assumption that workers with similar cognitive abilities will annotate similar labels on the same instances. Specifically, we first construct a data set for each worker that includes all instances annotated by this worker and learn a feature vector for each worker. Then, we define a metric based on cosine similarity to estimate worker similarity from the learned feature vectors. Finally, we complete the labels for each worker on unannotated instances based on the worker similarity and the annotations of similar workers. The experimental results on one real-world and 34 simulated crowdsourced data sets consistently show that WSLC effectively addresses the problem of the sparse crowdsourcing label matrix and improves the integration accuracies of label integration algorithms.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"710-721"},"PeriodicalIF":7.5,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Convolutional Networks With Collaborative Feature Fusion for Sequential Recommendation","authors":"Jianping Gou;Youhui Cheng;Yibing Zhan;Baosheng Yu;Weihua Ou;Yi Zhang","doi":"10.1109/TBDATA.2024.3426355","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3426355","url":null,"abstract":"Sequential recommendation seeks to understand user preferences based on their past actions and to predict future interactions with items. Recently, several techniques for sequential recommendation have emerged, primarily leveraging graph convolutional networks (GCNs) for their ability to model relationships effectively. However, real-world scenarios often involve sparse interactions, where early and recent short-term preferences play distinct roles in the recommendation process. Consequently, vanilla GCNs struggle to capture the explicit correlations between these early and recent short-term preferences. To address these challenges, we introduce a novel approach termed Graph Convolutional Networks with Collaborative Feature Fusion (COFF). Specifically, our method first divides each user interaction sequence into two segments. We then construct two separate graphs for these segments, aiming to capture the user's early and recent short-term preferences independently. To obtain robust predictions, we employ multiple GCNs in a collaborative distillation manner, incorporating a feature fusion module to establish connections between the early and recent short-term preferences. This approach enables a more precise representation of user preferences. Experimental evaluations conducted on five popular sequential recommendation datasets demonstrate that our COFF model outperforms recent state-of-the-art methods in terms of recommendation accuracy.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"735-747"},"PeriodicalIF":7.5,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143627852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}