{"title":"2025 Reviewers List*","authors":"","doi":"10.1109/TBDATA.2026.3652336","DOIUrl":"https://doi.org/10.1109/TBDATA.2026.3652336","url":null,"abstract":"","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"12 1","pages":"301-306"},"PeriodicalIF":5.7,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11357242","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145982289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-Channel Learning Framework for miRNA-Drug Interaction Prediction Based on Structural Features and Signed Bipartite Graph Neural Network","authors":"Xiaoxuan Zhang;Xiujuan Lei;Ling Guo;Ming Chen;Fang-Xiang Wu;Yi Pan","doi":"10.1109/TBDATA.2025.3639954","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3639954","url":null,"abstract":"MicroRNAs (miRNAs) play a vital role in regulating a wide range of biological functions and are key players in the development of many complex human diseases, making them novel therapeutic targets for drug development. Given the high expenses and time demands of traditional experimental methods, it is essential to develop efficient computational approaches for predicting miRNA-drug interactions (MDIs). This article presents a dual-channel learning framework, SSMDI, based on structural features and a Signed Bipartite Graph Neural Network (SBGNN) for predicting MDIs. Firstly, a Graph Isomorphism Network (GIN) is employed to extract molecular graph features of drugs. Meanwhile, a combined framework of a Convolutional Neural Network (CNN), a Bidirectional Long Short-Term Memory (BiLSTM) network and a Self-attention Mechanism is utilized to capture sequence features of miRNAs. Compared with traditional networks, signed networks can deliver richer semantic information about drugs and miRNAs. Therefore, an SBGNN is then used to aggregate and update the signed topological features of miRNAs and drugs. Finally, structural and signed topological features are integrated to predict MDIs. The predictive performance of the model is evaluated using 5-fold cross-validation (CV), achieving an AUC of 0.9447 and an AUPR of 0.9238. The case study further demonstrates the effectiveness of SSMDI in predicting MDIs. 
In summary, the SSMDI model proves to be an accurate tool for predicting MDIs, which holds significant implications for drug development and miRNA-based therapeutic research.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"12 2","pages":"688-701"},"PeriodicalIF":5.7,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147440665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
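The signed-bipartite aggregation step this abstract describes can be illustrated with a minimal NumPy sketch: drug neighbours reached via positive and negative edges are aggregated through separate weight matrices before being combined with each miRNA's own embedding. The dimensions, the mean aggregator, and the tanh nonlinearity are illustrative assumptions, not the SSMDI architecture.

```python
import numpy as np

def signed_bipartite_layer(H_mi, H_drug, A_pos, A_neg, W_pos, W_neg):
    """One aggregation step on a signed bipartite miRNA-drug graph:
    messages from positive and negative edges pass through separate
    weight matrices, then are concatenated with the miRNA's own
    embedding. Toy sketch only, not the paper's exact layer."""
    def mean_agg(A, H):
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
        return (A @ H) / deg
    pos = mean_agg(A_pos, H_drug) @ W_pos   # positive-edge message
    neg = mean_agg(A_neg, H_drug) @ W_neg   # negative-edge message
    return np.tanh(np.concatenate([H_mi, pos, neg], axis=1))

rng = np.random.default_rng(0)
A_pos = np.array([[1, 0, 1], [0, 0, 0]], dtype=float)  # 2 miRNAs x 3 drugs
A_neg = np.array([[0, 1, 0], [1, 0, 1]], dtype=float)
H_mi, H_drug = rng.normal(size=(2, 4)), rng.normal(size=(3, 4))
W_pos, W_neg = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
H_out = signed_bipartite_layer(H_mi, H_drug, A_pos, A_neg, W_pos, W_neg)
```

Keeping the two edge signs in separate channels is what lets the layer use disagreement (negative edges) as signal rather than averaging it away.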
{"title":"Lafa: Unlocking Superior Memory Efficiency via Adaptive Metadata Strategy for Scalable Large-Scale Dataset Loading","authors":"Cong Wang;Yang Luo;Ke Wang;Hui Zhang;Naijie Gu;Ran Zhang;Wenzhuo Du;Fan Yu;Jun Yu","doi":"10.1109/TBDATA.2025.3640011","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3640011","url":null,"abstract":"The rapid growth of deep learning models and the increasing demand for large-scale datasets have posed unprecedented challenges for data loading and memory management. Existing frameworks (e.g., PyTorch, TensorFlow) often encounter performance bottlenecks when handling large datasets, resulting in inefficiencies and excessive memory usage. To address these issues, we propose Lafa, a dynamic metadata loading mechanism optimized for efficient large-scale dataset processing. Lafa introduces the Lafa format and an adaptive loading strategy with three modes to balance memory usage and loading performance, along with a local shuffle approach that reduces memory overhead and computational complexity while preserving data randomness. Experimental results on GPU (RTX 3090) and Ascend (910A) platforms demonstrate that Lafa significantly improves memory efficiency compared to existing frameworks. 
Specifically, for every 10 million samples loaded, Lafa reduces additional memory consumption by a factor of 1.33× to 31.34× across various dataset types, relative to the most memory-efficient baseline among PyTorch, TensorFlow, and MindSpore.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"12 2","pages":"674-687"},"PeriodicalIF":5.7,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147440667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
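The abstract does not describe the Lafa format's on-disk layout, but the general idea of trading full in-memory metadata for a compact per-sample offset index can be sketched as follows. The layout below (length-prefixed records, trailing offset index, 16-byte footer) is entirely hypothetical.

```python
import os
import struct
import tempfile

def write_indexed(path, records):
    """Write length-prefixed records followed by a trailing offset index,
    so a loader can keep only 8 bytes of metadata per sample in memory
    and seek to any record on demand. Hypothetical layout, not the
    actual Lafa format."""
    with open(path, "wb") as f:
        offsets = []
        for rec in records:
            offsets.append(f.tell())
            f.write(struct.pack("<I", len(rec)))
            f.write(rec)
        index_pos = f.tell()
        for off in offsets:
            f.write(struct.pack("<Q", off))
        # Footer: where the index starts, and how many records exist.
        f.write(struct.pack("<QQ", index_pos, len(offsets)))

def read_record(path, i):
    """Fetch record i via the footer, one index slot, and one seek."""
    with open(path, "rb") as f:
        f.seek(-16, os.SEEK_END)
        index_pos, n = struct.unpack("<QQ", f.read(16))
        assert 0 <= i < n
        f.seek(index_pos + 8 * i)
        (off,) = struct.unpack("<Q", f.read(8))
        f.seek(off)
        (length,) = struct.unpack("<I", f.read(4))
        return f.read(length)

path = os.path.join(tempfile.mkdtemp(), "toy.bin")
write_indexed(path, [b"alpha", b"beta", b"gamma"])
```

With such an index, resident metadata grows by a fixed 8 bytes per sample regardless of record size, which is the kind of bounded overhead the paper's per-10-million-sample comparison measures.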
{"title":"A Fast Linearithmic Graph Clustering Approach for Big Data Using Gravitational Attraction Principle","authors":"Mohammad Maksood Akhter;Abdul Atif Khan;Rashmi Maheshwari;Sraban Kumar Mohanty","doi":"10.1109/TBDATA.2025.3639917","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3639917","url":null,"abstract":"With the exponential growth of Big Data in domains such as healthcare, genomics, and sensor networks, computationally efficient and effective clustering techniques have become essential for uncovering meaningful patterns. Traditional clustering methods face fundamental limitations in Big Data analysis. K-means is among the fastest known approaches, but it fails to capture non-spherical clusters. Hierarchical clustering can detect arbitrary shapes but suffers from cubic complexity, while many state-of-the-art methods still incur quadratic complexity. Moreover, most existing approaches fail to capture the intrinsic structure of data. In this context, graph-based clustering has emerged as a powerful alternative due to its ability to model geometric relationships and reveal underlying structures. However, existing graph-based techniques typically incur quadratic complexity, limiting their scalability. The objective of this work is to develop a scalable graph-based clustering framework that reduces complexity while preserving clustering quality in large, noisy, and high-dimensional datasets. To achieve this, we propose a fast graph clustering framework with an overall complexity of <inline-formula><tex-math>$\\mathcal{O}(N \\lg N)$</tex-math></inline-formula>, where <inline-formula><tex-math>$N$</tex-math></inline-formula> denotes the number of data points. The method employs a two-stage dispersion-based partitioning to generate cohesive sub-clusters, followed by the construction of a sparse graph on sub-cluster centers to efficiently capture adjacency. 
Sub-clusters are then merged iteratively using a gravitational-force-inspired attraction model, enabling the discovery of coherent structures with reduced computation. Extensive experiments on 41 multi-scale datasets demonstrate that our method consistently outperforms traditional and state-of-the-art approaches, achieving 27.33% higher clustering accuracy and reducing runtime by more than 86.64%, both on average. These results highlight both the innovation and the effectiveness of the proposed approach, making it highly suitable for Big Data analytics.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"12 2","pages":"661-673"},"PeriodicalIF":5.7,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147440666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
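A gravitational-attraction merge over sub-cluster centers can be sketched in a few lines. The pairwise force form F = m_i * m_j / d^2, the mass-weighted centroid update, and the stopping threshold are assumptions for illustration; the paper's exact model and its linearithmic bookkeeping will differ.

```python
import numpy as np

def gravitational_merge(centers, sizes, threshold):
    """Iteratively merge the pair of sub-clusters with the strongest
    gravitational attraction F = m_i * m_j / d^2 until no pair exceeds
    `threshold`. Quadratic toy sketch of the merging idea only."""
    centers = [np.asarray(c, dtype=float) for c in centers]
    sizes = list(sizes)
    labels = [[i] for i in range(len(centers))]  # sub-cluster ids per cluster
    while len(centers) > 1:
        best, best_f = None, threshold
        for i in range(len(centers)):
            for j in range(i + 1, len(centers)):
                d2 = np.sum((centers[i] - centers[j]) ** 2)
                f = sizes[i] * sizes[j] / max(d2, 1e-12)
                if f > best_f:
                    best, best_f = (i, j), f
        if best is None:      # no attraction above threshold: done
            break
        i, j = best
        total = sizes[i] + sizes[j]
        # Merge j into i: mass-weighted centroid, combined mass.
        centers[i] = (sizes[i] * centers[i] + sizes[j] * centers[j]) / total
        sizes[i] = total
        labels[i].extend(labels[j])
        del centers[j], sizes[j], labels[j]
    return labels

# Two tight groups of sub-clusters, far apart: they should end as 2 clusters.
clusters = gravitational_merge(
    centers=[(0, 0), (0.5, 0), (10, 10), (10.5, 10)],
    sizes=[5, 5, 5, 5],
    threshold=1.0,
)
```

Because force grows with mass and shrinks with squared distance, nearby dense sub-clusters merge first while distant groups stay separate, which is the intuition behind the paper's attraction model.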
{"title":"Temporal Recommendation Based on Adaptive Deep Matrix Factorization","authors":"Yali Feng;Zhifeng Hao;Wen Wen;Ruichu Cai","doi":"10.1109/TBDATA.2025.3621144","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3621144","url":null,"abstract":"Temporal recommendation is an important class of tasks in recommender systems, which focuses on modeling and capturing temporal patterns in user behavior to achieve finer-grained and higher-quality recommendations. In real-world scenarios, users’ temporal behaviors are not only characterized by sequential dependencies among consecutive items, but also by periodic correlations of different items and time-varying similarity of different users. In this paper, we propose an Adaptive Temporal Recommendation (AdaTR) algorithm to capture the inherent features of temporal behaviors and dynamic collaborative signals. Firstly, based on the periodic characteristics of user behaviors, the user-item interactions are counted and aggregated in different time segments across multiple periods, which forms the temporal user-item interaction matrix. Then, in order to capture the time-varying collaborative signals between different users, a deep spectral clustering (DSC) method is implemented on the temporal user-item interaction matrix, where the original representation of user-item interaction is projected into a latent space, and users’ temporal behaviors are clustered into different groups. Furthermore, an Adaptive Deep Matrix Factorization (AdaDMF) module is designed to learn the time-varying representations of user preferences on each cluster of temporal user behaviors, which incorporate dynamic collaborative signals among different users. Finally, we combine users’ short-term and long-term preferences to generate personalized temporal recommendations. 
Extensive experiments on four datasets demonstrate that AdaTR performs significantly better than the state-of-the-art baselines.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"12 1","pages":"288-300"},"PeriodicalIF":5.7,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145982272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
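The first step the abstract describes, counting interactions into time segments across multiple periods, can be sketched directly. Bucketing by hour-of-day (period of 24 hours, 24 segments) is an illustrative choice; the paper's segmentation may differ.

```python
import numpy as np

def temporal_interaction_tensor(events, n_users, n_items, n_segments, period):
    """Aggregate (user, item, timestamp) events into a tensor whose last
    axis is the time segment within a period. Interactions from different
    periods that land in the same segment are summed, exposing periodic
    behaviour. Illustrative sketch of the aggregation step only."""
    tensor = np.zeros((n_users, n_items, n_segments), dtype=int)
    seg_len = period / n_segments
    for user, item, ts in events:
        seg = int((ts % period) // seg_len)   # fold timestamp into a segment
        tensor[user, item, seg] += 1
    return tensor

# Two weeks of events: user 0 hits item 1 every day at hour 9.
events = [(0, 1, day * 24 + 9) for day in range(14)]
t = temporal_interaction_tensor(events, n_users=2, n_items=3,
                                n_segments=24, period=24)
```

All 14 daily interactions collapse into the single hour-9 segment, which is exactly the periodic regularity the downstream clustering and factorization modules consume.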
{"title":"Bridging User Dynamic Preferences: A Unified Bridge-Based Diffusion Model for Next POI Recommendation","authors":"Jiankai Zuo;Zihao Yao;Yaying Zhang","doi":"10.1109/TBDATA.2025.3618453","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3618453","url":null,"abstract":"Next POI recommendation plays a crucial role in delivering personalized location-based services, but it faces significant challenges in capturing complex user behavior and adapting to dynamic interest distributions. Most methods often provide insufficient modeling of implicit features in user trajectories, such as directional transitions and latent edge relationships, which are essential for understanding user behavior. Moreover, existing diffusion models, constrained by Gaussian priors, struggle to handle the diverse and evolving nature of user preferences. The lack of unified scheduling for noise and sampling also limits the flexibility of diffusion models. In this paper, we propose a Unified Bridge-based Diffusion model (UB-Diff) for next POI recommendation. UB-Diff incorporates direction-aware POI transition graph learning, which jointly captures spatio-temporal and directional features. To overcome the limitations of Gaussian priors, we introduce a bridge-based diffusion POI generative model. It achieves distribution translation from the user’s historical distribution to the target distribution by learning a bridge that associates user behavior with POI recommendation, adapting to dynamic user interests. Finally, we design a novel intermediate function to unify the diffusion process, enabling precise control over noise scheduling and modular optimization. 
Extensive experiments on five real-world datasets demonstrate the superiority of UB-Diff over advanced baseline methods.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"12 1","pages":"261-275"},"PeriodicalIF":5.7,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145982375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-Step Nyström Sampling for Large-Scale Kernel Approximation","authors":"Li He;Hong Zhang","doi":"10.1109/TBDATA.2025.3618472","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3618472","url":null,"abstract":"Nyström approximation is one of the most popular approximation methods to accelerate kernel analysis on large-scale data sets. Nyström employs a single landmark set to obtain eigenvectors (low-rank decomposition) and projects the entire data set onto the eigenvectors (embedding). Most existing methods focus on accelerating landmark selection. For extremely large-scale data sets, however, the embedding time cost, rather than that of low-rank decomposition, is critical. In addition, both accuracy and embedding time cost are dominated by the landmark set size. As a result, using more landmarks is the <italic>only</italic> way to improve accuracy, at the cost of an extremely high embedding cost. In this paper, we propose, for the first time, a method to decouple the embedding cost from that of the low-rank decomposition. We first obtain the eigenvectors from a large landmark set for a low error, and then optimize a small landmark set that minimizes the landmark-set-embedding error to ensure a low embedding cost. As a result, our accuracy is close to that of the large landmark set, while the small set dominates the embedding time cost. Our method can deal with popular kernels and be plugged into most existing methods. 
Experimental results demonstrate the superiority of the proposed method.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"12 1","pages":"249-260"},"PeriodicalIF":5.7,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145982347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
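The baseline the paper improves on, standard single-landmark-set Nyström embedding, can be sketched as follows; the per-point embedding cost is O(m) for m landmarks, which is the term the two-step scheme decouples from the decomposition. The RBF kernel, landmark count, and rank below are illustrative assumptions, and this is the classic method, not the paper's two-step variant.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """RBF kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_embed(X, landmarks, rank):
    """Standard Nystrom embedding: eigendecompose the small
    landmark-landmark kernel W = V diag(vals) V^T, then project every
    point through its kernel similarities to the landmarks, so that
    Z @ Z.T approximates the full kernel matrix."""
    W = rbf(landmarks, landmarks)
    vals, vecs = np.linalg.eigh(W)
    idx = np.argsort(vals)[::-1][:rank]        # keep the top eigenpairs
    vals, vecs = vals[idx], vecs[:, idx]
    C = rbf(X, landmarks)                      # n x m cross-kernel
    return C @ vecs / np.sqrt(np.maximum(vals, 1e-12))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
landmarks = X[rng.choice(200, size=20, replace=False)]
Z = nystrom_embed(X, landmarks, rank=10)
K_approx = Z @ Z.T                             # approximates rbf(X, X)
```

Note that computing each row of `C` touches all 20 landmarks; the paper's contribution is to keep a large set for the eigendecomposition while optimizing a much smaller set for this projection step.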
{"title":"Differential Encoding for Improved Representation Learning Over Graphs","authors":"Haimin Zhang;Jiaohao Xia;Min Xu","doi":"10.1109/TBDATA.2025.3618447","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3618447","url":null,"abstract":"Combining the message-passing paradigm with the global attention mechanism has emerged as an effective framework for learning over graphs. The message-passing paradigm and the global attention mechanism basically generate embeddings of nodes by taking the sum of information from a node’s local neighbourhood and from the entire graph, respectively. However, this simple summation aggregation approach fails to distinguish between the information from a node itself and that from the node’s neighbours. Therefore, information is lost at each layer of embedding generation, and this loss can accumulate and become more severe in deeper model layers. In this paper, we present a differential encoding method to address the issue of information loss. Instead of simply taking the sum to aggregate local or global information, we explicitly encode the difference between the information from a node itself and that from the node’s local neighbours (or from the rest of the graph’s nodes). The obtained differential encoding is then combined with the original aggregated representation to generate the updated node embedding. Combining differential encodings improves the representational ability of the generated node embeddings and, in turn, model performance. The differential encoding method is empirically evaluated on different graph tasks on seven benchmark datasets. 
The results show that it is a general method that improves the message-passing update and the global attention update, advancing the state-of-the-art performance for graph representation learning on these benchmark datasets.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"12 1","pages":"276-287"},"PeriodicalIF":5.7,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145982266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
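The differential-encoding idea can be sketched as one message-passing layer: alongside the usual neighbour aggregation, the layer explicitly encodes the difference between each node's own embedding and its aggregated neighbourhood, then combines both. The mean aggregator, the two weight matrices, and the ReLU are illustrative choices, not the paper's exact layer.

```python
import numpy as np

def differential_message_passing(H, A, W_agg, W_diff):
    """One message-passing layer with differential encoding: the usual
    neighbour aggregate `agg` is complemented by `diff = H - agg`, so the
    layer can tell a node's own information apart from its neighbours'.
    Minimal sketch of the technique, not the published architecture."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    agg = (A @ H) / deg                 # mean of neighbour embeddings
    diff = H - agg                      # differential encoding
    return np.maximum(agg @ W_agg + diff @ W_diff, 0.0)  # ReLU

rng = np.random.default_rng(1)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # toy graph
H = rng.normal(size=(3, 4))
H1 = differential_message_passing(H, A, rng.normal(size=(4, 4)),
                                  rng.normal(size=(4, 4)))
```

Without the `diff` term, two nodes with identical neighbourhood aggregates but different own features can collapse to the same update; the explicit difference keeps that distinction in the representation.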
{"title":"Fast Convergent Federated Learning via Decaying SGD Updates","authors":"Md Palash Uddin;Yong Xiang;Mahmudul Hasan;Yao Zhao;Youyang Qu;Longxiang Gao","doi":"10.1109/TBDATA.2025.3618454","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3618454","url":null,"abstract":"Federated Learning (FL), a groundbreaking approach for collaborative model training across decentralized devices, maintains data privacy while constructing a strong global machine learning model. Conventional FL methods typically demand more communication rounds to achieve convergence in non-Independent and non-Identically Distributed (non-IID) data scenarios due to their reliance on fixed Stochastic Gradient Descent (SGD) updates at each Communication Round (CR). In this paper, we introduce a novel strategy to expedite the convergence of FL models, inspired by the insights from McMahan et al.’s seminal work. We focus on FL convergence via traditional SGD decay by introducing a dynamic adjustment mechanism for the local epochs and local batch size. Our method adapts the decay of SGD updates during the training process, akin to decaying learning rates in classical optimization. In particular, by adaptively reducing the local epochs and increasing the local batch size based on their current values and the CR as training progresses, our method enhances convergence speed without compromising accuracy, effectively addressing the challenges posed by non-IID data. We provide a theoretical analysis of the benefits of the dynamic decay of SGD updates in FL scenarios. 
Through comprehensive experiments, we demonstrate that our method consistently outperforms existing approaches in terms of the global model’s communication speedup and convergence behavior.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"12 1","pages":"186-199"},"PeriodicalIF":5.7,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145982210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
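A schedule that decays the number of local SGD updates per communication round, shrinking local epochs while growing local batch size, can be sketched as below. The 1/(1 + decay*t) form mirrors classic learning-rate decay; the paper's exact schedule is an assumption here.

```python
def decayed_local_config(round_idx, init_epochs, init_batch, max_batch,
                         decay=0.1):
    """Per-round local-training configuration: local epochs decay and local
    batch size grows with the communication round, so each client performs
    fewer SGD updates per round as the global model matures. Illustrative
    schedule only, not the paper's exact rule."""
    scale = 1.0 + decay * round_idx
    epochs = max(1, round(init_epochs / scale))       # never below 1 epoch
    batch = min(max_batch, int(init_batch * scale))   # capped batch growth
    return epochs, batch

# Fewer local updates per round as training progresses.
schedule = [decayed_local_config(t, init_epochs=5, init_batch=32,
                                 max_batch=256) for t in (0, 10, 50)]
```

Since the number of SGD steps per round is roughly epochs * (dataset_size / batch), both knobs push the same way: later rounds take fewer, lower-variance local steps, which is the decay the abstract compares to decaying learning rates.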
{"title":"STGym: A Modular Benchmark for Spatio-Temporal Networks With a Survey and Case Study on Traffic Forecasting","authors":"Chun-Wei Shen;Jia-Wei Jiang;Hsun-Ping Hsieh","doi":"10.1109/TBDATA.2025.3618482","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3618482","url":null,"abstract":"The rapid advancement of the spatio-temporal domain has led to a surge of novel models. These models can typically be decomposed into different modules, such as various types of graph neural networks and temporal networks. Notably, many of these models share identical or similar modules. However, the existing literature often relies on fragmented and self-constructed experimental frameworks. This fragmentation hinders a comprehensive understanding of model interrelationships and makes fair comparisons difficult due to inconsistent training and evaluation processes. To address these issues, we introduce Spatio-Temporal Gym (STGym), an innovative modular benchmark that provides a platform for exploring various spatio-temporal models and supports research for developers. The modular design of STGym facilitates an in-depth analysis of model components and promotes the seamless adoption and extension of existing methods. By standardizing the training and evaluation processes, STGym ensures reproducibility and scalability, enabling fair comparisons across different models. In this paper, we use traffic forecasting, a popular research topic in the spatio-temporal domain, as a case study to demonstrate the capabilities of STGym. Our detailed survey systematically utilizes the modular framework of STGym to organize key modules into various models, thereby facilitating deeper insights into their structures and mechanisms. We also evaluate 18 models on six widely used traffic forecasting datasets and analyze critical hyperparameters to reveal their impact on performance. 
This study provides valuable resources and insights for developers and researchers.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"12 1","pages":"15-33"},"PeriodicalIF":5.7,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145982291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}