{"title":"Incorporating Confused Phraseological Knowledge Based on Pinyin Input Method for Chinese Spelling Correction","authors":"Weidong Zhao;Xiaoyu Wang;Liqing Qiu","doi":"10.1109/TBDATA.2025.3552344","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3552344","url":null,"abstract":"Chinese Spelling Correction (CSC) aims to detect and correct spelling errors in Chinese text. In real life, most keyboard input scenarios use the pinyin input method, so studying spelling errors in this scenario is practical and valuable. However, no existing work has proposed a model tailored to this scenario. To address this gap, this paper proposes IPCK-IME, a model that incorporates confused phraseological knowledge based on the pinyin input method. The model integrates its own phonetic features with external similarity knowledge to guide the model to output more correct characters. Furthermore, to mitigate the influence of spelling errors on sentence semantics, a Gaussian bias is introduced into the model's self-attention network, which reduces the focus on typos and improves attention to local context. Empirical evidence indicates that our method surpasses existing models in correcting spelling errors generated by the pinyin input method, and that it is better suited to correcting Chinese spelling errors in real input scenarios.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2724-2735"},"PeriodicalIF":5.7,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Graph Structure Learning Neural Rough Differential Equations for Multivariate Time Series Forecasting","authors":"Yuming Su;Tinghuai Ma;Huan Rong;Mohamed Magdy Abdel Wahab","doi":"10.1109/TBDATA.2025.3552334","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3552334","url":null,"abstract":"Multivariate time series forecasting has extensive applications in urban computing, such as financial analysis, weather prediction, and traffic forecasting. Using graph structures to model the complex correlations among variables in time series, and leveraging graph neural networks and recurrent neural networks for the temporal aggregation and spatial propagation stages, has shown promise. However, the node-level graph structure learning and discrete neural architectures of traditional methods are not sensitive to sudden changes, time variance, and irregular sampling, which are common in real-world data. To address these challenges, we propose a method called <underline>A</u>daptive <underline>G</u>raph structure <underline>L</u>earning neural <underline>R</u>ough <underline>D</u>ifferential <underline>E</u>quations (AGLRDE). Specifically, we combine dynamic and static graph structure learning to adaptively generate a more robust graph representation. We then employ a spatio-temporal encoder-decoder based on Neural Rough Differential Equations (Neural RDE) to model spatio-temporal dependencies. Additionally, we introduce a path reconstruction loss to constrain the path generation stage. We conduct experiments on six benchmark datasets, demonstrating that our proposed method outperforms existing state-of-the-art methods. The results show that AGLRDE effectively handles the aforementioned challenges, significantly improving the accuracy of multivariate time series forecasting.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2710-2723"},"PeriodicalIF":5.7,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Objective Graph Contrastive Learning for Recommendation","authors":"Lei Zhang;Mingren Ke;Likang Wu;Wuji Zhang;Zihao Chen;Hongke Zhao","doi":"10.1109/TBDATA.2025.3552341","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3552341","url":null,"abstract":"Recently, numerous studies have integrated self-supervised contrastive learning with Graph Convolutional Networks (GCNs) to address data sparsity and popularity bias and thereby enhance recommendation performance. While such studies have made breakthroughs in accuracy metrics, they often neglect non-accuracy objectives such as diversity, novelty, and the percentage of long-tail items, which greatly degrades the user experience in real-world applications. To this end, we propose a novel graph collaborative filtering model named Multi-Objective Graph Contrastive Learning for recommendation (MOGCL), designed to provide more comprehensive recommendations by considering multiple objectives. Specifically, MOGCL comprises three modules: a multi-objective embedding generation module, an embedding fusion module, and a transfer learning module. In the multi-objective embedding generation module, we employ two GCN encoders with different goal orientations to generate node embeddings targeting accuracy and non-accuracy objectives, respectively. These embeddings are then effectively fused with complementary weights in the embedding fusion module. In the transfer learning module, we introduce an auxiliary self-supervised task that maximizes the mutual information between the two sets of embeddings, so that the final embeddings are more stable and comprehensive. Experimental results on three real-world datasets show that MOGCL achieves a better trade-off among multiple objectives compared to state-of-the-art methods.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2696-2709"},"PeriodicalIF":5.7,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Guided Graph Refinement With Progressive Fusion for Multiplex Graph Contrastive Representation Learning","authors":"Qi Dai;Yu Gu;Xiaofeng Zhu;Xiaohua Li;Fangfang Li;Ge Yu","doi":"10.1109/TBDATA.2025.3552331","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3552331","url":null,"abstract":"Multiplex Graph Contrastive Learning (MGCL) has attracted significant attention. However, existing MGCL methods often struggle with suboptimal graph structures and fail to fully capture intricate interdependencies across multiplex views. To address these issues, we propose a novel self-supervised framework, Multiplex Graph Refinement with progressive fusion (MGRefine), for multiplex graph contrastive representation learning. Specifically, MGRefine introduces a multi-view learning module to extract a structural guidance matrix by exploring the underlying relationships between nodes. Then, a progressive fusion module is employed to progressively enhance and fuse representations from different views, capturing and leveraging nuanced interdependencies and comprehensive information across the multiplex graphs. The fused representation is then used to construct a consensus guidance matrix. A self-enhanced refinement module continuously refines the multiplex graphs using these guidance matrices while providing effective supervision signals. MGRefine achieves mutual reinforcement between graph structures and representations, ensuring continuous optimization of the model throughout the learning process in a self-enhanced manner. Extensive experiments on several benchmark datasets demonstrate that MGRefine outperforms state-of-the-art methods and verify its effectiveness across various downstream tasks.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2669-2680"},"PeriodicalIF":5.7,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSST: Multi-Scale Spatial-Temporal Representation Learning for Trajectory Similarity Computation","authors":"Li Li;Junjun Si;Jinna Lv;Junting Lu;Jianyu Zhang;Shuaifu Dai","doi":"10.1109/TBDATA.2025.3552340","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3552340","url":null,"abstract":"Computing trajectory similarity is a fundamental task in trajectory analysis. Traditional heuristic methods suffer from quadratic computational complexity, which limits their scalability to large datasets. Recently, Trajectory Representation Learning (TRL) has been extensively studied to address this limitation. However, most existing TRL algorithms face two key challenges. First, they prioritize spatial similarity while neglecting the intricate spatio-temporal dynamics of trajectories, particularly temporal regularities. Second, these methods are often constrained by predefined single spatial or temporal scales, which can significantly impact performance, since the measurement of trajectory similarity depends on spatial and temporal resolution. To address these issues, we propose MSST, a Multi-Scale Self-supervised Trajectory representation learning framework. MSST simultaneously processes spatial and temporal information by generating 3D spatial-temporal tokens, thereby capturing the spatio-temporal characteristics of trajectories more effectively. Further, MSST explores the multi-scale characteristics of trajectories; to the best of our knowledge, this is the first effort to do so in the TRL literature. Finally, self-supervised contrastive learning is employed to enhance the consistency between trajectory representations from different views. Experimental results on three real-world datasets for trajectory similarity computation provide insight into the design properties of our approach and demonstrate its superiority over existing TRL methods: MSST significantly surpasses all state-of-the-art competitors in terms of effectiveness, efficiency, and robustness. Compared to previous TRL research, the proposed method balances the noise and the details of trajectories, enabling a more comprehensive analysis by accounting for the variability inherent in trajectory data across different scales.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2657-2668"},"PeriodicalIF":5.7,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MHT-Net: A Matching-Based Hierarchical Transfer Network for Glaucoma Detection From Fundus Images","authors":"Linna Zhao;Jianqiang Li;Li Li;Xi Xu","doi":"10.1109/TBDATA.2025.3552342","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3552342","url":null,"abstract":"Glaucoma is a chronic and irreversible eye disease, and early detection and treatment can effectively prevent severe consequences. Deep transfer learning is widely used in fundus image analysis to remedy the shortage of glaucoma training data. However, a model trained on the source domain may struggle to predict glaucoma in the target domain due to distribution differences, and two limitations cannot be ignored: (1) image matching: global and local image consistency should be enhanced through bidirectional matching; (2) hierarchical transfer: a strategy is needed for transferring features at different hierarchical levels. To this end, we propose a novel Matching-based Hierarchical Transfer Network (MHT-Net) for automatic glaucoma detection. We first create a fundus structure detector that matches global and local images using the intermediate layers of a diagnostic model pre-trained on source domain data. Next, a hierarchical transfer network is implemented, sharing parameters for general features and using a domain discriminator for specific features. By integrating adversarial and classification losses, the model acquires domain-invariant features, facilitating precise and seamless transfer of fundus information from the source to the target domain. Extensive experiments demonstrate the effectiveness of our proposed method, which outperforms existing glaucoma detection methods. These advantages make our algorithm a promising and efficient assistive tool for glaucoma screening.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2681-2695"},"PeriodicalIF":5.7,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Weak Supervision for Concept Prerequisite Relation Learning","authors":"Miao Zhang;Jiawei Wang;Kui Xiao;Zhifang Huang;Zhifei Li;Yan Zhang","doi":"10.1109/TBDATA.2025.3552330","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3552330","url":null,"abstract":"Concept prerequisite relation learning identifies dependency relations between knowledge concepts, which helps learners choose effective learning paths. Currently, most mainstream methods utilize deep learning algorithms to capture the prerequisite relations between concepts through supervised or semi-supervised learning. However, these methods are highly dependent on labeled data, which is scarce and costly to annotate in practice. To address this problem, we propose a framework called <underline>W</u>eakly <underline>S</u>upervised <underline>E</u>nhanced <underline>C</u>oncept <underline>P</u>rerequisite <underline>R</u>elation <underline>L</u>earning (WSECPRL). Specifically, we first generate an enhanced concept pseudo-relation graph without labeled data, using a pre-trained language model and a large knowledge base as auxiliary information. Second, we propose an improved variational graph auto-encoder model to correctly determine the concept prerequisite relations, incorporating a multi-head attention mechanism to enhance the representation learning capability of weakly supervised learning. The model reconstructs a directed graph into multiple undirected graphs by splitting the adjacency matrix and determines the direction of a concept prerequisite relation based on the strength of the dependency relation between concepts. Finally, experimental results on several publicly available datasets demonstrate the effectiveness of our proposed framework, with WSECPRL outperforming existing baseline models in terms of F1 score and AUC.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2643-2656"},"PeriodicalIF":5.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Antagonistic $k$-Plex Enumeration in Signed Graphs","authors":"Lantian Xu;Rong-Hua Li;Dong Wen;Qiangqiang Dai;Guoren Wang","doi":"10.1109/TBDATA.2025.3552335","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3552335","url":null,"abstract":"A signed graph is a graph where each edge receives a sign, positive or negative. The signed graph model has been used in many real applications, such as protein complex discovery and social network analysis. Finding cohesive subgraphs in signed graphs is a fundamental problem. A <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex is a common model for cohesive subgraphs in which every vertex is adjacent to all but at most <inline-formula><tex-math>$k$</tex-math></inline-formula> vertices within the subgraph. In this paper, we propose the model of the size-constrained antagonistic <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex in a signed graph. The proposed model guarantees that the resulting subgraph is a <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex and can be divided into two sub-<inline-formula><tex-math>$k$</tex-math></inline-formula>-plexes, both of which have positive inner edges and negative outer edges. This paper aims to identify all maximal antagonistic <inline-formula><tex-math>$k$</tex-math></inline-formula>-plexes in a signed graph. Through rigorous analysis, we show that the problem is NP-hard. We propose a novel set-enumeration-based framework for enumerating maximal antagonistic <inline-formula><tex-math>$k$</tex-math></inline-formula>-plexes. Efficiency is improved through pivot pruning and early termination based on the color bound. Preprocessing techniques based on degree and dichromatic graphs effectively narrow the search space before enumeration. Extensive experiments on real-world datasets demonstrate our algorithm’s efficiency, effectiveness, and scalability.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2587-2600"},"PeriodicalIF":5.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144998253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Residual Learning for Self-Knowledge Distillation: Enhancing Neural Networks Through Consistency Across Layers","authors":"Hanpeng Liu;Shuoxi Zhang;Kun He","doi":"10.1109/TBDATA.2025.3552326","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3552326","url":null,"abstract":"Knowledge distillation is a widely used technique for transferring knowledge from a large pretrained teacher network to a small student network. However, training complex teacher models requires significant computational resources and storage. To address this, a growing area of research, known as self-knowledge distillation (Self-KD), aims to enhance the performance of a neural network by leveraging its own latent knowledge. Despite its potential, existing Self-KD methods often struggle to effectively extract and utilize the model's dark knowledge. In this work, we identify a consistency problem between the feature layers and the output layer, and propose a novel Self-KD approach called <bold>R</b>esidual Learning for <bold>S</b>elf-<bold>K</b>nowledge <bold>D</b>istillation (<bold>RSKD</b>). Our method addresses this issue by enabling the last feature layer of the student model to learn the residual gap between the outputs of the pseudo-teacher and the student. Additionally, we extend RSKD by allowing each intermediate feature layer of the student model to learn the residual gap between the corresponding deeper features of the pseudo-teacher and the student. Extensive experiments on various visual datasets demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art baselines.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2615-2627"},"PeriodicalIF":5.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Subdata Selection for Prediction Based on the Distribution of the Covariates","authors":"Alvaro Cia-Mina;Jesus Lopez-Fidalgo;Weng Kee Wong","doi":"10.1109/TBDATA.2025.3552343","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3552343","url":null,"abstract":"Huge data sets are widely available now, and there is growing interest in selecting an optimal subsample from the full data set to improve inference efficiency and reduce labeling costs. We propose a new criterion, called J–optimality, which builds upon a popular optimal selection criterion that minimizes the Random–X prediction error, by additionally incorporating the joint distribution of the covariates. A key advantage of our approach is that we can relate the subsampling selection problem to that of finding an optimal approximate design under a convex criterion, for which analytical tools are already available. Consequently, the J–optimal subsampling method comes with theoretical results and theory-based algorithms for finding such subsamples. Simulation results and real data analysis show that our proposed methods outperform current subsampling methods, and that the proposed algorithms can also adapt efficiently to select an optimal subsample from streaming data.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2601-2614"},"PeriodicalIF":5.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10930599","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}