{"title":"Streaming Local Community Detection Through Approximate Conductance","authors":"Meng Wang;Yanhao Yang;David Bindel;Kun He","doi":"10.1109/TBDATA.2023.3310251","DOIUrl":"10.1109/TBDATA.2023.3310251","url":null,"abstract":"Community is a universal structure in various complex networks, and community detection is a fundamental task for network analysis. With the rapid growth of network scale, networks have become massive and change rapidly, so they can naturally be modeled as graph streams. Due to limited memory and access constraints in graph streams, existing non-streaming community detection methods are no longer applicable. This raises an emerging need for online approaches. In this work, we consider the problem of uncovering the local community containing a few query nodes in graph streams, termed streaming local community detection. This recently raised problem is more challenging, and only a few works address this online setting. Accordingly, we design an online single-pass streaming local community detection approach. Inspired by the local property of communities, our method samples the local structure around the query nodes in graph streams and extracts the target community on the sampled subgraph using our proposed metric called approximate conductance. 
Comprehensive experiments show that our method remarkably outperforms the streaming baseline in both effectiveness and efficiency, and even achieves accuracy comparable to that of state-of-the-art non-streaming local community detection methods that use static and complete graphs.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 1","pages":"12-22"},"PeriodicalIF":7.2,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89772759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer Learning With Document-Level Data Augmentation for Aspect-Level Sentiment Classification","authors":"Xiaosai Huang;Jing Li;Jia Wu;Jun Chang;Donghua Liu","doi":"10.1109/TBDATA.2023.3310267","DOIUrl":"10.1109/TBDATA.2023.3310267","url":null,"abstract":"Aspect-level sentiment classification (ASC) seeks to reveal the emotional tendency toward a designated aspect of a text. Some researchers have recently tried to exploit the large amounts of available document-level sentiment classification (DSC) data to improve the performance of ASC models through transfer learning. However, these studies often ignore the difference in sentiment distribution between document-level and aspect-level data and do not preprocess the document-level knowledge. Our study provides a transfer learning with document-level data augmentation (TL-DDA) framework that transfers more accurate document-level knowledge to the ASC model by means of document-level data augmentation and attention fusion. First, we use document data selection and text concatenation to produce document-level data with various sentiment distributions. The augmented document data is then used to pre-train a well-designed DSC model. Finally, after attention adjustment, we fuse the word attention obtained from this DSC model into the ASC model. 
Experimental results on two publicly available datasets suggest that TL-DDA is reliable.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1643-1657"},"PeriodicalIF":7.2,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TS-RTPM-Net: Data-Driven Tensor Sketching for Efficient CP Decomposition","authors":"Xingyu Cao;Xiangtao Zhang;Ce Zhu;Jiani Liu;Yipeng Liu","doi":"10.1109/TBDATA.2023.3310254","DOIUrl":"10.1109/TBDATA.2023.3310254","url":null,"abstract":"Tensor decomposition is widely used in feature extraction, data analysis, and other fields. As a means of tensor decomposition, the robust tensor power method based on tensor sketch (TS-RTPM) can quickly mine the latent features of a tensor, but in some cases its approximation performance is limited. In this paper, we propose a data-driven framework called TS-RTPM-Net, which improves the estimation accuracy of TS-RTPM by jointly training the TS value matrices with the RTPM initial matrices. It also uses two greedy initialization algorithms to optimize the TS location matrices. In addition, TS-RTPM-Net accelerates TS-RTPM by using fast power iteration modules. Comparative experiments on real-world datasets verify that TS-RTPM-Net outperforms TS-RTPM in terms of estimation accuracy, running speed, and memory consumption.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 1","pages":"1-11"},"PeriodicalIF":7.2,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Box Embeddings for Fine-Grained Entity Typing","authors":"Yixiu Qin;Yizhao Wang;Jiawei Li;Shun Mao;He Wang;Yuncheng Jiang","doi":"10.1109/TBDATA.2023.3310239","DOIUrl":"10.1109/TBDATA.2023.3310239","url":null,"abstract":"Unlike traditional vector-based fine-grained entity typing methods, the box-based method is more effective at capturing the complex relationships between entity mentions and entity types. The box-based fine-grained entity typing method projects entity types and entity mentions into a high-dimensional box space, where both are embedded as d-dimensional hyperrectangles. However, the impacts of entity types are not considered during classification in the high-dimensional box space, and the model cannot be optimized precisely when two boxes are completely separated or overlapped. To address these shortcomings, an Improved Box Embeddings (IBE) method for fine-grained entity typing is proposed in this work. IBE not only introduces the impacts of entity types during classification in the high-dimensional box space, but also proposes a distance-based module to optimize the model precisely when two boxes are completely separated or overlapped. 
Experimental results on four fine-grained entity typing datasets verify the effectiveness of the proposed IBE, demonstrating that IBE is a state-of-the-art method for fine-grained entity typing.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1631-1642"},"PeriodicalIF":7.2,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PredLife: Predicting Fine-Grained Future Activity Patterns","authors":"Wenjing Li;Xiaodan Shi;Dou Huang;Xudong Shen;Jinyu Chen;Hill Hiroki Kobayashi;Haoran Zhang;Xuan Song;Ryosuke Shibasaki","doi":"10.1109/TBDATA.2023.3310241","DOIUrl":"10.1109/TBDATA.2023.3310241","url":null,"abstract":"Activity pattern prediction is a critical part of urban computing, urban planning, intelligent transportation, and related fields. Based on a dataset of more than 10 million GPS trajectory records collected by mobile sensors, this research proposes a CNN-BiLSTM-VAE-ATT-based encoder-decoder model for fine-grained individual activity sequence prediction. The model combines long-term and short-term dependencies crosswise and also considers the randomness, diversity, and uncertainty of individual activity patterns. The proposed model achieves higher accuracy than ten baselines. The model can generate highly diverse results while approximating the original distribution of activity patterns. Moreover, the model offers interpretability by revealing the importance of time dependencies in activity pattern prediction.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1658-1669"},"PeriodicalIF":7.2,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cosine Multilinear Principal Component Analysis for Recognition","authors":"Feng Han;Chengcai Leng;Bing Li;Anup Basu;Licheng Jiao","doi":"10.1109/TBDATA.2023.3301389","DOIUrl":"10.1109/TBDATA.2023.3301389","url":null,"abstract":"Existing two-dimensional principal component analysis methods can only handle second-order tensors (i.e., matrices). However, with the advancement of technology, tensors of order three and higher are becoming increasingly common. This brings new challenges to dimensionality reduction. Thus, a multilinear method called MPCA was proposed. Although MPCA can be applied to tensors of any order, its use of the squared F-norm makes it very sensitive to outliers. Several two-dimensional methods, such as Angle 2DPCA, are robust but cannot be applied to all tensors. We extend the robust Angle 2DPCA method to a multilinear method and propose Cosine Multilinear Principal Component Analysis (CosMPCA) for tensor representation. Our CosMPCA method considers the relationship between the reconstruction error and projection scatter and selects the cosine metric. In addition, our method naturally uses the F-norm to reduce the impact of outliers. We introduce an iterative algorithm to solve CosMPCA and provide a detailed theoretical analysis of both the proposed method and the algorithm. 
Experiments show that our method is robust to outliers and is suitable for tensors of any order.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1620-1630"},"PeriodicalIF":7.2,"publicationDate":"2023-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Powerball Stochastic Conjugate Gradient for Large-Scale Learning","authors":"Zhuang Yang","doi":"10.1109/TBDATA.2023.3300546","DOIUrl":"10.1109/TBDATA.2023.3300546","url":null,"abstract":"The great success of stochastic optimization (SO) in large-scale machine learning problems, information retrieval, bioinformatics, etc., has been widely reported, especially in recent years. As an effective tactic, conjugate gradient (CG) has been gaining popularity for accelerating SO algorithms. This paper develops a novel class of stochastic conjugate gradient descent (SCG) algorithms from the perspective of the Powerball strategy and the hypergradient descent (HD) technique. The crucial idea behind the resulting methods is inspired by pursuing the equilibrium of ordinary differential equations (ODEs). We elucidate the effect of the Powerball strategy in SCG algorithms. The introduction of HD, on the other hand, lets the resulting methods work with an online learning rate. Meanwhile, we provide a comprehensive theoretical analysis of the resulting algorithms under non-convex assumptions. As a byproduct, we bridge the gap between the learning rate and powered stochastic optimization (PSO) algorithms, which is still an open problem. 
Through numerical experiments on numerous benchmark datasets, we test the parameter sensitivity of the proposed methods and demonstrate the superior performance of our new algorithms over state-of-the-art algorithms.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1598-1606"},"PeriodicalIF":7.2,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation","authors":"Chenxu Wang;Fuli Feng;Yang Zhang;Qifan Wang;Xunhan Hu;Xiangnan He","doi":"10.1109/TBDATA.2023.3300547","DOIUrl":"https://doi.org/10.1109/TBDATA.2023.3300547","url":null,"abstract":"Historical interactions, the default choice for recommender model training, typically exhibit high sparsity, i.e., most user-item pairs are unobserved missing data. A standard choice is to treat the missing data as negative training samples and estimate the interaction likelihood between user-item pairs along with the observed interactions. In this way, some potential interactions are inevitably mislabeled during training, which hurts model fidelity and hinders the model from recalling the mislabeled items, especially long-tail ones. In this work, we investigate the mislabeling issue from the new perspective of aleatoric uncertainty, which describes the inherent randomness of missing data. This randomness pushes us to go beyond the interaction likelihood alone and embrace aleatoric uncertainty modeling. To this end, we propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework that consists of a new uncertainty estimator along with a normal recommender model. Based on the theory of aleatoric uncertainty, we derive a new recommendation objective to learn the estimator. As the chance of mislabeling reflects the potential of a pair, AUR makes recommendations according to the uncertainty, which is shown to improve the recommendation performance of less popular items without sacrificing overall performance. We instantiate AUR on three representative recommender models from mainstream architectures: Matrix Factorization (MF), LightGCN, and VAE. 
Extensive results on four real-world datasets validate the effectiveness of AUR in terms of better recommendation results, especially on long-tail items.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1607-1619"},"PeriodicalIF":7.2,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138138211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Black-Box Adversarial Attack Method via Nesterov Accelerated Gradient and Rewiring Towards Attacking Graph Neural Networks","authors":"Shu Zhao;Wenyu Wang;Ziwei Du;Jie Chen;Zhen Duan","doi":"10.1109/TBDATA.2023.3296936","DOIUrl":"10.1109/TBDATA.2023.3296936","url":null,"abstract":"Recent studies have shown that Graph Neural Networks (GNNs) are vulnerable to well-designed and imperceptible adversarial attacks. Attacks that utilize gradient information are widely used due to their simplicity and efficiency. However, gradient-based attacks face several challenges: 1) they generate perturbations with white-box attacks (i.e., requiring full knowledge of the model), which is impractical in the real world; 2) they easily fall into local optima; and 3) the perturbation budget is not limited, and perturbations might be detected even when the number of modified edges is small. To address these challenges, this article proposes a black-box adversarial attack method, named NAG-R, which consists of two modules: a Nesterov Accelerated Gradient attack module and a Rewiring optimization module. Specifically, inspired by adversarial attacks on images, the first module generates perturbations by introducing Nesterov Accelerated Gradient (NAG) to avoid falling into local optima. The second module keeps the fundamental properties of the graph (e.g., the total degree of the graph) unchanged through a rewiring operation, thus ensuring that perturbations are imperceptible. 
Extensive experiments show that our method achieves significantly better attack success and transferability than existing state-of-the-art gradient-based attack methods.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1586-1597"},"PeriodicalIF":7.2,"publicationDate":"2023-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Discriminator Active Adversarial Network for Multi-Center Brain Disease Diagnosis","authors":"Qi Zhu;Qiming Yang;Mingming Wang;Xiangyu Xu;Yuwu Lu;Wei Shao;Daoqiang Zhang","doi":"10.1109/TBDATA.2023.3294000","DOIUrl":"10.1109/TBDATA.2023.3294000","url":null,"abstract":"Multi-center analysis has attracted increasing attention in brain disease diagnosis because it provides an effective way to improve diagnostic performance by making use of information from different centers. However, in practical multi-center applications, data uncertainty is more common than in a single center, which makes robust diagnostic modeling challenging. In this article, we propose a multi-discriminator active adversarial network (MDAAN) to alleviate uncertainties at the center, feature, and label levels for multi-center brain disease diagnosis. First, we use an adversarial learning strategy to extract latent invariant representations of the source and target centers, reducing domain shift. Second, the proposed method adaptively evaluates the contribution of different source centers in fusion by measuring the data distribution difference between the source and target centers. Moreover, only the hard-to-learn samples in the target center are selected for labeling, which keeps the sample annotation cost low. Finally, we treat the selected samples as an auxiliary domain to alleviate negative transfer and improve the robustness of the multi-center model. 
We extensively compare the proposed approach with several state-of-the-art multi-center methods on a five-center schizophrenia dataset, and the results demonstrate that our method outperforms previous methods in identifying brain disease.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 6","pages":"1575-1585"},"PeriodicalIF":7.2,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62972246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}