Xiaowen Chu;Wei Wang;Cong Wang;Yang Liu;Rongfei Zeng;Christopher G. Brinton
{"title":"Guest Editorial Special Issue on Federated Learning for Big Data Applications","authors":"Xiaowen Chu;Wei Wang;Cong Wang;Yang Liu;Rongfei Zeng;Christopher G. Brinton","doi":"10.1109/TBDATA.2024.3417057","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3417057","url":null,"abstract":"","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2099-2101"},"PeriodicalIF":5.7,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11149636","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144990054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuo Shang;Qi Liu;Renhe Jiang;Ryosuke Shibasaki;Panos Kalnis;Christian S. Jensen
{"title":"Editorial High-Performance Recommender Systems Based on Spatiotemporal Data","authors":"Shuo Shang;Qi Liu;Renhe Jiang;Ryosuke Shibasaki;Panos Kalnis;Christian S. Jensen","doi":"10.1109/TBDATA.2024.3451088","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3451088","url":null,"abstract":"","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"1588-1588"},"PeriodicalIF":7.5,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11077801","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144597786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial: Big Data Analytics in Complex Social Information Networks","authors":"Desheng Dash Wu;David L. Olson","doi":"10.1109/TBDATA.2024.3485316","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3485316","url":null,"abstract":"This special issue deals with research related to applications of and methods to support Big Data analytics in complex social information networks. The digital age and the rise of social media have sped up changes to social systems with unforeseen consequences. However, there are major challenges created.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"1650-1651"},"PeriodicalIF":7.5,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11077792","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144598007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GE-GNN: Gated Edge-Augmented Graph Neural Network for Fraud Detection","authors":"Wenxin Zhang;Cuicui Luo","doi":"10.1109/TBDATA.2025.3562486","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3562486","url":null,"abstract":"Graph Neural Networks (GNNs) play a significant role and have been widely applied in fraud detection tasks, exhibiting substantial improvements in detection performance compared to conventional methodologies. However, within the intricate structure of fraud graphs, fraudsters usually camouflage themselves among a large number of benign entities. An effective solution to address the camouflage problem involves the incorporation of complex and abundant edge information. Nevertheless, existing GNN-based methods frequently neglect to integrate this crucial information into the message passing process, thereby limiting their efficacy. To address the above issues, this study proposes a novel Gated Edge-augmented Graph Neural Network (GE-GNN). Our approach begins with an edge-based feature augmentation mechanism that leverages both node and edge features within a single relation. Subsequently, we apply the augmented representation to the message passing process to update the node embeddings. Furthermore, we design a gate logistic to regulate the expression of augmented information. Finally, we integrate node features across different relations to obtain a comprehensive representation. 
Extensive experimental results on two real-world datasets demonstrate that the proposed method outperforms several state-of-the-art methods.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 4","pages":"1664-1676"},"PeriodicalIF":7.5,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144598065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topology-Based Node-Level Membership Inference Attacks on Graph Neural Networks","authors":"Faqian Guan;Tianqing Zhu;Wanlei Zhou;Philip S. Yu","doi":"10.1109/TBDATA.2025.3558855","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3558855","url":null,"abstract":"Graph neural networks (GNNs) have obtained considerable attention due to their ability to leverage the inherent topological and node information present in graph data. While extensive research has been conducted on privacy attacks targeting machine learning models, the exploration of privacy risks associated with node-level membership inference attacks on GNNs remains relatively limited. GNNs learn representations that encapsulate valuable information about the nodes. These learned representations can be exploited by attackers to infer whether a specific node belongs to the training dataset, leading to the disclosure of sensitive information. The insidious nature of such privacy breaches often leads to an underestimation of the associated risks. Furthermore, the inherent challenges posed by node membership inference attacks make it difficult to develop effective attack models for GNNs that can successfully infer node membership. We propose a more efficient approach that specifically targets node-level membership inference attacks on GNNs. Initially, we combine nodes and their respective neighbors to carry out node membership inference attacks. To address the challenge of variable-length features arising from the differing number of neighboring nodes, we introduce an effective feature processing strategy. Furthermore, we propose two strategies: multiple training of shadow models and random selection of non-membership data, to enhance the performance of the attack model. We empirically evaluate the efficacy of our proposed method using three benchmark datasets. 
Additionally, we explore two potential defense mechanisms against node-level membership inference attacks.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2809-2826"},"PeriodicalIF":5.7,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hanqi Zhang;Yandong Zheng;Chang Xu;Liehuang Zhu;Jiayin Wang
{"title":"Revocable DSSE in Healthcare Systems With Range Query Support","authors":"Hanqi Zhang;Yandong Zheng;Chang Xu;Liehuang Zhu;Jiayin Wang","doi":"10.1109/TBDATA.2025.3556636","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3556636","url":null,"abstract":"With the rapid development of cloud computing, online health monitoring systems are becoming increasingly prevalent. To protect medical data privacy while supporting search operations, Dynamic Searchable Symmetric Encryption (DSSE) technology has been widely used in health monitoring systems. For better monitoring of patient status, keyword range query is also a necessary requirement for the DSSE scheme. Furthermore, in the multi-user setting, user revocation usually leads the owner to download and re-encrypt all indexes, resulting in significant computational overhead. In this paper, we propose a lightweight revocable DSSE scheme with range query support. First, we propose a novel and privacy-preserving range query algorithm that defends against plaintext inference attacks. Second, we design a singly linked list structure based on delegatable pseudorandom functions and key-updatable pseudorandom functions, which supports lightweight user revocation. Rigorous security analysis proves the security of our proposed range query scheme and demonstrates that our scheme can achieve forward and backward privacy. 
Experimental evaluations show that our scheme is highly efficient.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2764-2778"},"PeriodicalIF":5.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Utility-Driven Data Analytics Algorithm for Transaction Modifications Using Pre-Large Concept With Single Database Scan","authors":"Unil Yun;Hanju Kim;Myungha Cho;Taewoong Ryu;Seungwan Park;Doyoon Kim;Doyoung Kim;Chanhee Lee;Witold Pedrycz","doi":"10.1109/TBDATA.2025.3556615","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3556615","url":null,"abstract":"Utility-driven pattern analysis is a fundamental method for analyzing noteworthy patterns with high utility for diverse quantitative transactional databases. Recently, various approaches have emerged to handle large, dynamic database environments more efficiently by reducing the number of data scans and pattern expansion operations with the pre-large concept. However, existing pre-large-based high utility pattern mining methods either fail to handle real-time transaction modifications or require additional data scans to validate candidate patterns. In this paper, we propose a novel efficient utility-driven pattern mining algorithm using the pre-large concept for transaction modifications. Our method incorporates a single-scan-based framework through the management of actual utility values and discovers high utility patterns without candidate generation for efficient utility-driven dynamic data analysis in the modification environment. We compared the performance of the proposed method with state-of-the-art methods through extensive performance evaluation utilizing real and synthetic datasets. According to the evaluation results and a case study, the suggested method performs a minimum of 1.5 times faster than state-of-the-art methods alongside minimal compromise in memory, and it scaled well with increases in database size. 
Further statistical analyses indicate that the proposed method reduces the pattern search space compared to the previous method while delivering a complete set of accurate results without loss.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2792-2808"},"PeriodicalIF":5.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Concept-Cognitive Learning Model Oriented to Three-Way Concept for Knowledge Acquisition","authors":"Weihua Xu;Di Jiang","doi":"10.1109/TBDATA.2025.3556637","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3556637","url":null,"abstract":"Concept-cognitive learning (CCL) is the process of enabling machines to simulate the concept learning of the human brain. Existing CCL models focus on formal context while neglecting the importance of skill context. Furthermore, CCL models, which solely focus on positive information, restrict the learning capacity by neglecting negative information, greatly impeding the acquisition of knowledge. To overcome these issues, we propose a novel concept-cognitive learning model oriented to three-way concept for knowledge acquisition. First, this paper explains and investigates the relationship between skills and knowledge based on the three-way concept and its properties. Then, in order to simultaneously consider positive and negative information, describe more detailed information, learn more skills, and acquire accurate knowledge, a three-way information granule is described from the perspective of cognitive learning. Then, a transformation method is proposed to transform between different three-way information granules, allowing for the transformation of an arbitrary three-way information granule into necessary, sufficient, and sufficient and necessary three-way information granules. Finally, an algorithm corresponding to the transformation method is designed and subsequently tested across diverse UCI datasets. 
The experimental outcomes affirm the effectiveness and excellence of the suggested model and algorithm.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2779-2791"},"PeriodicalIF":5.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PViTGAtt-IP: Severity Quantification of Lung Infections in Chest X-Rays and CT Scans via Parallel and Cross-Attended Encoders","authors":"Bouthaina Slika;Fadi Dornaika;Fares Bougourzi;Karim Hammoudi","doi":"10.1109/TBDATA.2025.3556612","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3556612","url":null,"abstract":"The development of a robust and adaptive deep learning technique for the diagnosis of pneumonia and the assessment of its severity was a major challenge. Indeed, both chest X-rays (CXR) and CT scans have been widely studied for the diagnosis, detection and quantification of pneumonia. In this paper, a novel approach (PViTGAtt-IP) based on a parallel array of vision transformers is presented, in which the input image is divided into regions of interest. Each region is fed into an individual model and the collective output gives the severity score. Three parallel architectures were also derived and tested. The proposed models were subjected to rigorous tests on two different datasets: RALO CXRs and Per COVID-19 CT scans. The experimental results showed that the proposed models exhibited high performance in accurately predicting scores for both datasets. In particular, the parallel transformers with multi-gate attention proved to be the best performing model. Furthermore, a comparative analysis using state-of-the-art methods showed that our proposed approach consistently achieved competitive or even better performance in terms of the Mean Absolute Error (MAE) and the Pearson Correlation Coefficient (PC). 
This emphasizes the effectiveness and superiority of our models in the context of diagnosing and assessing the severity of pneumonia.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2736-2748"},"PeriodicalIF":5.7,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ling Hu;Tongqing Zhou;Zhihuang Liu;Fang Liu;Zhiping Cai
{"title":"Split Learning on Segmented Healthcare Data","authors":"Ling Hu;Tongqing Zhou;Zhihuang Liu;Fang Liu;Zhiping Cai","doi":"10.1109/TBDATA.2025.3556639","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3556639","url":null,"abstract":"Sequential data learning is vital to harnessing the encompassed rich knowledge for diverse downstream tasks, particularly in healthcare (e.g., disease prediction). Considering data sensitivity, privacy-preserving learning methods, based on federated learning (FL) and split learning (SL), have been widely investigated. Yet, this work identifies, for the first time, that existing methods overlook the fact that sequential data are generated by different patients at different times and stored in different hospitals, and thus fail to learn the sequential correlations between different temporal segments. To fill this void, a novel distributed learning framework STSL is proposed by training a model on the segments in order. Considering that patients have different visit sequences, STSL first implements privacy-preserving visit ordering based on a secure multi-party computation mechanism. Then, batch scheduling groups patients with similar visit (sub-)sequences into the same training batch, facilitating subsequent split learning on batches. The scheduling process is formulated as an NP-hard optimization problem that balances learning loss and efficiency, and a greedy-based solution is presented. Theoretical analysis proves the privacy preservation property of STSL. 
Experimental results on real-world eICU data show its superior performance compared with FL and SL ($5\\% \\sim 28\\%$ better accuracy) and effectiveness (a remarkable 75% reduction in communication costs).","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2749-2763"},"PeriodicalIF":5.7,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}