Ren Li;Huazhong Liu;Xiaotong Zhou;Jiawei Wang;Jihong Ding;Laurence T. Yang;Hua Li;Yunfan Zhang
{"title":"Tucker-Based High-Accuracy Multi-Modal Clustering for Social Information Network","authors":"Ren Li;Huazhong Liu;Xiaotong Zhou;Jiawei Wang;Jihong Ding;Laurence T. Yang;Hua Li;Yunfan Zhang","doi":"10.1109/TBDATA.2024.3524830","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3524830","url":null,"abstract":"With the explosion of social media platforms, a substantial amount of data is generated from social information network. Tensor-based multi-modal clustering methods have been widely applied in various scenarios of social information network by mining potential correlative relationships from large-scale heterogeneous data. Nevertheless, the accuracy and efficiency of tensor-based multi-modal clustering methods are seriously restricted by noise data and the curse of dimensionality. Therefore, this paper presents a Tucker-based multi-modal clustering (TuMC) and an improved TuMC (ITuMC) to enhance the accuracy and efficiency of multi-modal clustering. First, we propose two Tucker-based attribute weight ranking learning approaches to calculate weight tensor efficiently. Then, we present a calculation approach for Tucker-based selective weighted tensor distance (SWTD) and a TuMC method. Meanwhile, an ITuMC method is explored by optimizing the calculation efficiency of the SWTD to further improve clustering speed. Finally, we present a Tucker-based multi-modal clustering and service framework for social information network. 
Extensive experimental results based on social Geolife GPS trajectory and electricity consumption datasets demonstrate that the TuMC and ITuMC methods can cluster multi-source heterogeneous data with both higher accuracy and efficiency under complex social information network by DVI, AR and execution time measurement.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"1677-1691"},"PeriodicalIF":7.5,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144597724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tun Li;Di Lei;Qian Li;Rong Wang;Chaolong Jia;Yunpeng Xiao
{"title":"A Marketing Topic Traceability Model Based on Domain Preference and Heterogeneous Network","authors":"Tun Li;Di Lei;Qian Li;Rong Wang;Chaolong Jia;Yunpeng Xiao","doi":"10.1109/TBDATA.2024.3524831","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3524831","url":null,"abstract":"The development of social networks has prompted a shift in marketing strategies, with a surging demand for marketing in vertical domains characterized by high user stickiness and specialization. To address this, we propose a traceability model based on domain preference and heterogeneous networks. First, to measure the vertical-domain features of marketing topics and account for the influence of users’ domain preferences on topic propagation, the domains are treated as latent semantics, and the sparse user-topic association matrix is densified using a latent factor model to mine the domain preference information efficiently. Second, considering the complexity of the association between multi-type elements in marketing topics, the HLN2vec (Heterogeneous Layer-wise Networks) model is proposed. This model uses heterogeneous network representation learning and incorporates multi-layer attention networks to learn representations that portray a marketing topic’s key elements and their relationships. Finally, this paper proposes the DP-Rank (Domain Preference-based) algorithm, which uses domain preference features and an adaptive random walking strategy to quantify element influence. 
Based on experiments, the proposed model robustly applies in social networks and exhibits clear advantages in measuring vertical domain features of marketing topics, constructing multi-type element relationship networks, and discovering core element influence.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"1692-1706"},"PeriodicalIF":7.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144598063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qingxi Peng;Zhenjie Weng;Wei Wang;Xinyi Wang;Lan You
{"title":"A Collaborative Network-Based Retrieval Model for Open Source Domain Experts","authors":"Qingxi Peng;Zhenjie Weng;Wei Wang;Xinyi Wang;Lan You","doi":"10.1109/TBDATA.2024.3524829","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3524829","url":null,"abstract":"Aiming at the problem that the GitHub platform only supports the retrieval of developers through their usernames and it is difficult to directly obtain developers' expertise information, this paper proposes an open source domain expert retrieval model (OSDERM) based on the network representation learning algorithm OSC2vec (Open Source Collaboration to Vector). The model mainly consists of two core parts: Expert Profiling and Expert Finding. Expert Profiling aims to enrich the expertise information in the search results by labeling the expertise of developers, while Expert Finding achieves rapid location of the most suitable domain experts through keyword matching, which greatly saves the time and effort of searching for experts in the open source community. Experiments using the GitHub ecological dataset show that the model outperforms existing comparative algorithms in discovering open source domain experts, and can provide an effective reference for enterprise recruitment.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"1720-1732"},"PeriodicalIF":7.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144598066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Casformer: Information Popularity Prediction With Adaptive Cascade Sampling and Graph Transformer in Social Networks","authors":"Biao Wang;Zhao Li;Zenghui Xu;Ji Zhang","doi":"10.1109/TBDATA.2024.3524839","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3524839","url":null,"abstract":"Predicting the popularity of information in social networks is crucial for effective social marketing and recommendation systems. However, accurately comprehending the complex dynamics of information diffusion remains a challenging task. Existing methods, including feature-based approaches, point process models, and deep learning techniques, often fail to capture the fine-grained features of information cascades, such as dynamic diffusion patterns, cascade statistics, and the interplay between spatial and temporal information. To address these limitations, we propose Casformer, a novel graph-based Transformer architecture that effectively learns both micro-level time-aware structural information and macro-level long-term influence along the information propagation process. Casformer employs a cascade attention network (CAT) to capture the micro-level features and a Transformer model to learn the macro-level influence. Furthermore, we introduce an adaptive cascade graph sampling strategy based on the temporal diffusion pattern and cascade statistics of information to obtain the most informative cascade graph sequence. By leveraging multi-level fine-grained evolving features of information cascades, Casformer achieves high accuracy in information popularity prediction. 
Experimental results on real-world social network and scientific citation network datasets demonstrate the effectiveness and superiority of Casformer compared to state-of-the-art methods in information popularity prediction.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"1652-1663"},"PeriodicalIF":7.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144597734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing Re-Indexing for Top-k Personalized PageRank Computation on Dynamic Graphs","authors":"Tsuyoshi Yamashita;Naoki Matsumoto;Kunitake Kaneko","doi":"10.1109/TBDATA.2024.3524833","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3524833","url":null,"abstract":"Top-k Personalized PageRank (PPR) is a graph analysis method used to determine the <inline-formula><tex-math>$k$</tex-math></inline-formula> most important nodes with respect to a source node. To realize fast Top-k PPR computation, indexing for each node is effective. When we apply the index-based Top-k PPR methods to dynamic graphs, the index becomes stale with edge updates, and index correction is required. Although the existing methods perform index correction for every update to guarantee Top-k PPR accuracy, they involve heavy re-indexing computation or significant memory overhead. This paper proposes a method that achieves comparable accuracy to guaranteed methods while significantly reducing re-indexing by focusing on the fact that index references are concentrated on the nodes whose index is unlikely to change due to edge updates. In particular, our method omits re-indexing as long as we achieve comparable accuracy. Furthermore, our method involves the minimum memory overhead among the existing index-based methods. The space complexity of the index is <inline-formula><tex-math>$\\Theta (n + m)$</tex-math></inline-formula>, where <inline-formula><tex-math>$n$</tex-math></inline-formula> and <inline-formula><tex-math>$m$</tex-math></inline-formula> are the number of nodes and edges of the graph, respectively. 
The evaluation results using real-world datasets show that our method achieves more than 0.999 Normalized Discounted Cumulative Gain until 20% of edges are updated from index generation.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"1707-1719"},"PeriodicalIF":7.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10819623","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144598067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Khondhaker Al Momin;Arif Mohaimin Sadri;Kristin Olofsson;K.K. Muraleetharan;Hugh Gladwin
{"title":"Information Switching Patterns of Risk Communication in Social Media During Disasters","authors":"Khondhaker Al Momin;Arif Mohaimin Sadri;Kristin Olofsson;K.K. Muraleetharan;Hugh Gladwin","doi":"10.1109/TBDATA.2024.3524828","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3524828","url":null,"abstract":"In an era increasingly affected by natural and human-caused disasters, the role of social media in disaster communication has become ever more critical. Despite substantial research on social media use during crises, a significant gap remains in detecting crisis-related misinformation. Detecting deviations in information is fundamental for identifying and curbing the spread of misinformation. This study introduces a novel <italic>Information Switching Pattern Model</italic> to identify dynamic shifts in perspectives among users who mention each other in crisis-related narratives on social media. These shifts serve as evidence of crisis misinformation affecting user-mention network interactions. The study utilizes advanced natural language processing, network science, and census data to analyze geotagged tweets related to compound disaster events in Oklahoma in 2022. The impact of misinformation is revealed by distinct engagement patterns among various user types, such as bots, private organizations, non-profits, government agencies, and news media throughout different disaster stages. These patterns show how different disasters influence public sentiment, highlight the heightened vulnerability of mobile home communities, and underscore the importance of education and transportation access in crisis response. 
Understanding these engagement patterns is crucial for detecting misinformation and leveraging social media as an effective tool for risk communication during disasters.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"1733-1744"},"PeriodicalIF":7.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10820023","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144606221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jinsong Chen;Chang Liu;Kaiyuan Gao;Gaichao Li;Kun He
{"title":"NAGphormer+: A Tokenized Graph Transformer With Neighborhood Augmentation for Node Classification in Large Graphs","authors":"Jinsong Chen;Chang Liu;Kaiyuan Gao;Gaichao Li;Kun He","doi":"10.1109/TBDATA.2024.3524081","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3524081","url":null,"abstract":"Graph Transformers, emerging as a new architecture for graph representation learning, suffer from the quadratic complexity and can only handle graphs with at most thousands of nodes. To this end, we propose a Neighborhood Aggregation Graph Transformer (NAGphormer) that treats each node as a sequence containing a series of tokens constructed by our proposed Hop2Token module. For each node, Hop2Token aggregates the neighborhood features from different hops into different representations, producing a sequence of token vectors as one input. In this way, NAGphormer could be trained in a mini-batch manner and thus could scale to large graphs with millions of nodes. To further enhance the model's generalization, we propose NAGphormer+, an extended model of NAGphormer with a novel data augmentation method called Neighborhood Augmentation (NrAug). Based on the output of Hop2Token, NrAug simultaneously augments the features of neighborhoods from global as well as local views. In this way, NAGphormer+ can fully utilize the neighborhood information of multiple nodes, thereby undergoing more comprehensive training and improving the model's generalization capability. 
Extensive experiments on benchmark datasets from small to large demonstrate the superiority of NAGphormer+ against existing graph Transformers and mainstream GNNs, as well as the original NAGphormer.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"2085-2098"},"PeriodicalIF":7.5,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144598062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated Multi-View Multi-Label Classification","authors":"Hongdao Meng;Yongjian Deng;Qiyu Zhong;Yipeng Wang;Zhen Yang;Gengyu Lyu","doi":"10.1109/TBDATA.2024.3522812","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3522812","url":null,"abstract":"Multi-view multi-label classification is a crucial machine learning paradigm aimed at building robust multi-label predictors by integrating heterogeneous features from various sources while addressing multiple correlated labels. However, in real-world applications, concerns over data confidentiality and security often prevent data exchange or fusion across different sources, leading to the challenging issue of data islands. To tackle this problem, we propose a general federated multi-view multi-label classification method, FMVML, which integrates a novel multi-view multi-label classification technique into a federated learning framework. This approach enables cross-view feature fusion and multi-label semantic classification while preserving the data privacy of each independent source. Within this federated framework, we first extract view-specific information from each individual client to capture unique characteristics and then consolidate consensus information from different views on the global server to represent shared features. Unlike previous methods, our approach enhances cross-view fusion and semantic expression by jointly capturing both feature and semantic aspects of specificity and commonality. The final label predictions are generated by combining the view-specific predictions from individual clients and the consensus predictions from the global server. 
Extensive experiments across various applications demonstrate that FMVML fully leverages multi-view data in a privacy-preserving manner and consistently outperforms state-of-the-art methods.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"2072-2084"},"PeriodicalIF":7.5,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144598078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unlocking Large Language Model Power in Industry: Privacy-Preserving Collaborative Creation of Knowledge Graph","authors":"Liqiao Xia;Junming Fan;Ajith Parlikad;Xiao Huang;Pai Zheng","doi":"10.1109/TBDATA.2024.3522814","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3522814","url":null,"abstract":"Semantic expertise remains a reliable foundation for industrial decision-making, while Large Language Models (LLMs) can augment the often limited empirical knowledge by generating domain-specific insights, though the quality of this generative knowledge is uncertain. Integrating LLMs with the collective wisdom of multiple stakeholders could enhance the quality and scale of knowledge, yet this integration might inadvertently raise privacy concerns for stakeholders. In response to this challenge, Federated Learning (FL) is harnessed to improve the knowledge base quality by cryptographically leveraging other stakeholders’ knowledge, where the knowledge base is represented in Knowledge Graph (KG) form. Initially, a multi-field hyperbolic (MFH) graph embedding method vectorizes entities, furnishing mathematical representations in lieu of solely semantic meanings. The FL framework subsequently identifies and fuses common entities in encrypted form, whereby the updated entities’ embeddings can refine other private entities’ embeddings locally, thus enhancing the overall KG quality. Finally, the KG complement method refines and clarifies triplets to improve the overall quality of the KG. 
An experiment assesses the proposed approach across different industrial KGs, confirming its effectiveness as a viable solution for collaborative KG creation, all while maintaining data security.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"2046-2060"},"PeriodicalIF":7.5,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144597812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Non-Stationary Pricing Incentives for Budget-Limited Crowdsensing","authors":"Jiajun Sun;Dianliang Wu","doi":"10.1109/TBDATA.2024.3522804","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3522804","url":null,"abstract":"The promising applications of mobile crowdsensing (MCS) have attracted much research interest recently, especially for the posted-pricing scenes. However, existing works mainly focus on the stationary MCS, no matter whether in a stochastic or adversarial environment, where each price (or arm) remains identical over time. Yet in many realistic MCS applications such as environment monitoring and recommendation systems, stationary bandits do not model the posted-pricing sequential decision problems where the reward distributions of each price (arm) and cost distribution vary over time due to the changes in light intensity and mobile devices’ remnant energy. In this paper, we study a more general submodular crowdsensing scene to address the non-stationary sequential pricing problems, and construct a monotonic submodular function merging the marginal reward and temporal difference errors (TD-errors) of deep reinforcement learning (DRL). Moreover, we explore a weighted budget-limited non-stationary pricing mechanism by using the deep deterministic policy gradient (DDPG) method for submodular MCS from the perspectives of the hard-drop and soft-drop weights. Our mechanism can readily be extended to non-submodular MCS or other MCS scenes. 
Extensive simulations demonstrate that our mechanism outperforms existing benchmarks.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"2025-2035"},"PeriodicalIF":7.5,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144597723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}