{"title":"AnesFormer: An End-to-End Framework for EEG-Based Anesthetic State Classification","authors":"Qihang Wang;Ying Chen;Qinge Xiao","doi":"10.1109/TBDATA.2024.3489419","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3489419","url":null,"abstract":"To determine the real-time changes in brain arousal introduced by anesthetics, Electroencephalogram (EEG) is often used as an objective neuroimaging evidence to link the neurobehavioral states of patients. However, EEG signals often suffer from a low signal-to-noise ratio due to environmental noise and artifacts, which limits its application for a reliable estimation of depth of anesthesia (DoA), especially under high cross-subject variability. In this study, we propose an end-to-end deep learning based framework, termed as AnesFormer, which contains a data selection model, a self-attention based classification model, and a baseline update mechanism. These three components are integrated in a dynamic and seamless manner to achieve the goal of improving the effectiveness and robustness of DoA estimation in a leave-one-out setting. In the experiment, we apply the proposed framework to an office-based dataset and a hospital-based dataset, and use seven existing models as benchmarks. In addition, we conduct an ablation experiment to show the significance of each component in AnesFormer. 
Our main results indicate that 1) the proposed framework generally performs better than the existing methods for DoA estimation in terms of effectiveness and robustness; 2) each designed component in AnesFormer is likely to contribute to the DoA classification improvement.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1357-1368"},"PeriodicalIF":7.5,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Knowledge-Guided Event-Relation Graph Learning Network for Patient Similarity With Chinese Electronic Medical Records","authors":"Zhichao Zhu;Jianqiang Li;Chun Xu;Jingchen Zou;Qing Zhao","doi":"10.1109/TBDATA.2024.3481955","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3481955","url":null,"abstract":"Feature sparsity is a common problem in patient similarity calculation with clinical data. To tackle it, some approaches use Graph Neural Networks (GNNs) to model the complex structural information in patient Electronic Medical Records (EMRs). These GNN-based approaches usually treat medical concepts (e.g., symptoms, diseases) as nodes to learn spatial features and adopt a Recurrent Neural Network (RNN) to learn the temporal sequence of these concepts. However, in many cases, several sequential concepts contained in EMR text are considered to occur simultaneously in the clinical diagnosis (e.g., some symptoms are detected simultaneously by a single test), so learning the temporal sequence of these concepts might introduce noise into patient similarity calculation. Furthermore, the limited discriminative capability of concepts cannot provide sufficient indicative information for similarity learning. To this end, we propose a Knowledge-guided Event-relation Graph Learning Network (KEGLN) for patient similarity calculation. Specifically, after event extraction, we first construct element-relation graphs and use the first Graph Convolutional Network (GCN) and Graph Attention Network (GAT) layer to aggregate features from each event and its involved elements, reducing the noise produced by the temporal sequence of concepts. Meanwhile, the entity description and attribute-value structure are extracted to supplement background knowledge of elements (concepts and trigger words). 
For the updated event nodes, we then design an event-relation graph and adopt the second GCN and GAT layer to aggregate information from events and their direct neighbors to extract spatial features of events at the current moment. Finally, the Bidirectional Long Short-Term Memory (BiLSTM) model is adopted to learn the temporal dependency of event nodes to capture the dynamic change of disease progress. In extensive experiments on diverse datasets, our KEGLN model outperforms all baselines for Chinese patient similarity calculation.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1475-1492"},"PeriodicalIF":7.5,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Enhancing Inter-Domain Routing Security With Visualization and Visual Analytics","authors":"Jingwei Tang;Guodao Sun;Jiahui Chen;Gefei Zhang;Qi Jiang;Yanbiao Li;Guangxing Zhang;Jian Liu;Haixia Wang;Ronghua Liang","doi":"10.1109/TBDATA.2024.3481899","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3481899","url":null,"abstract":"In the complex landscape of the Internet, inter-domain routing systems are essential for ensuring seamless connectivity and reachability across autonomous systems. However, the lack of dependable security validation mechanisms in these systems poses persistent challenges. Vulnerabilities such as prefix hijacking, path forgery, and route leakage not only compromise network operators and users, but also threaten the stability and accessibility of the Internet’s core infrastructure. To address this, visualization and visual analytics techniques are adept at identifying and detecting security threats, offering network administrators effective methods to monitor and maintain network operations. This paper presents a comprehensive survey of the state-of-the-art research in visualization and visual analytics for inter-domain routing security. We delineate four scenarios for tasks analysis in network visualization: monitoring, detection, verification, and discovery. Each category is explored in detail, focusing on the employed data sources and visualization techniques. Several key findings are presented at the end of each category, aimed at providing researchers and practitioners with research inspiration. 
Furthermore, we examine the trends of academic interest observed in recent decades and propose potential directions for future research in visual analytics pertaining to Internet infrastructure security.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1508-1527"},"PeriodicalIF":7.5,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Residual Coupled Prompt Learning for Zero-Shot Sketch-Based Image Retrieval","authors":"Guangyao Zhuo;Zhenqiu Shu;Zhengtao Yu","doi":"10.1109/TBDATA.2024.3481898","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3481898","url":null,"abstract":"Zero-shot sketch-based image retrieval (ZS-SBIR) aims to utilize freehand sketches for retrieving natural images with similar semantics in realistic zero-shot scenarios. Existing works focus on zero-shot semantic transfer using category word embedding and leveraging teacher-student networks to alleviate catastrophic forgetting of pre-trained models. They aim to retain rich discriminative features to achieve zero-shot semantic transfer. However, the category word embedding method is insufficient in flexibility, thereby limiting their retrieval performances in ZS-SBIR scenarios. In addition, the teacher network used for generating guidance signals results in computational redundancy, requiring repeated processing of mini-batch inputs. To address these issues, we propose a deep residual coupled prompt learning (DRCPL) for ZS-SBIR. Specifically, we leverage the text encoder of CLIP to generate category classification weights, thereby improving the flexibility and generality of zero-shot semantic transfer. To tune text and vision representations effectively, we introduce learnable prompts at the input and freeze the parameters of the CLIP encoder. This approach not only effectively prevents catastrophic forgetting, but also significantly reduces the computational complexity of the model. We also introduce the text-vision prompt coupling function to enhance the coordinated consistency between the text and vision representations, ensuring that the two branches can train collaboratively. Finally, we gradually establish stage feature relationships by learning prompts independently at different early stages to facilitate rich contextual learning. 
Comprehensive experimental results demonstrate that our DRCPL method achieves state-of-the-art performance in ZS-SBIR tasks.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1493-1507"},"PeriodicalIF":7.5,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online News-Based Economic Sentiment Index","authors":"Nathaniel Kang;Dongeun Min;Yonghun Cho;Dong-Whan Ko;Hyun Hak Kim;Joon Yeon Choeh;Jongho Im","doi":"10.1109/TBDATA.2024.3474211","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3474211","url":null,"abstract":"The accurate prediction of industry trends has become increasingly challenging because of unforeseen events. To address this challenge, this study proposes a deep learning approach to generate an economic sentiment index by integrating Natural Language Processing (NLP) models and image-clustering techniques. We first employ sampling techniques to create standardized online news datasets. Feature engineering techniques from the Korean Bidirectional Encoder Representations from Transformers (KoBERT) model are then used to generate relevance and sentiment scores for the textual data. Further, to enhance visualization and clustering, we transform the textual data into joint plot images, which are grouped into distinct clusters based on news categories. Finally, using Multi-criteria Decision Analysis, the various scores and cluster information are synthesized to generate the final economic sentiment index. This approach improves visualization and enhances the interpretability of the generated index. 
The proposed algorithm is applied to construct a new economic sentiment index for the Information and Communications Technology (ICT) industry in South Korea.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1464-1474"},"PeriodicalIF":7.5,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10705082","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-Scale Data Quality Challenges, Framework and Evaluation in Metro Systems","authors":"Tailan Yuan;Wen Xiong;Siyuan Liu","doi":"10.1109/TBDATA.2024.3474215","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3474215","url":null,"abstract":"Data quality is a fundamental challenge for downstream data mining tasks. While numerous studies have addressed data quality issues in various contexts, there is a notable lack of systematic research on data quality in metro systems. Metro systems generate a vast volume of multisource heterogeneous datasets daily, and many data mining tasks have been developed for operational and management purposes. Therefore, investigating data quality problems in metro systems is crucial. In this paper, we systematically explore data quality issues in metro systems. First, we present a comprehensive analysis method to examine data quality problems such as missing data, noise, and weak semantics. Second, we design five metrics to measure data quality and propose a set of quality improvement approaches. These approaches include a travel pattern-based missing value imputation method, a heuristic trajectory noise filtering method, and a data semantics enhancement method. Additionally, we develop an automated pipeline solution where the data quality enhancement algorithms are seamlessly integrated with the data processing pipeline. Finally, we provide a case study to illustrate the significant benefits of our data quality improvement methods. We conducted extensive experiments to validate our methods on a set of large-scale datasets collected from a metro system, which include, Wi-Fi signal data, and electronic fence data. 
The results indicate that 1) the proposed imputation method surpasses other baselines by 26.47% to 44.82%; 2) the proposed noise filtering method outperforms other baselines by an average of 12.22%; and 3) the proposed data semantics enrichment method exceeds the baseline method by 37.34% in terms of maximum accuracy.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1447-1463"},"PeriodicalIF":7.5,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Verifiable and Privacy-Preserving $k$-NN Query Scheme With Multiple Keys","authors":"Yunzhen Zhang;Baocang Wang;Zhen Zhao","doi":"10.1109/TBDATA.2024.3463543","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3463543","url":null,"abstract":"As a basic primitive in spatial and multimedia databases, the $k$-nearest neighbors ($k$-NN) query has been widely used in electronic medicine, location-based services and so on. With the boom in cloud computing, it is currently a trend to upload massive data to the cloud server to enjoy its powerful storage and computing resources. Recently, research communities and commercial applications have proposed many schemes to support $k$-NN query on cloud data. However, most of the existing schemes were designed under the assumption that the query users (QUs) are fully trusted and hold the key of the data owner (DO). In this case, even if the queries were encrypted, the QUs can capture the query content from each other, leading to the query privacy leakage. Unfortunately, to the best of our knowledge, few $k$-NN query schemes can ensure data security and result verification under the key confidentiality condition. In this paper, we propose a verifiable and privacy-preserving $k$-NN query scheme with multiple keys (VP$k$NN), in which each QU's partial private key can only decrypt the encrypted query results belonging to its own, but not the encrypted database, the encrypted query data and query results of other QUs. Moreover, our proposal not only answers the query efficiently, but also ensures the privacy of the data, the query and the result, and the verification of the correctness of the results. 
Finally, the complexity and security are theoretically analyzed, and the practicality and efficiency of our proposed scheme are compared by simulation experiments.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1434-1446"},"PeriodicalIF":7.5,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliable Data Augmented Contrastive Learning for Sequential Recommendation","authors":"Mankun Zhao;Aitong Sun;Jian Yu;Xuewei Li;Dongxiao He;Ruiguo Yu;Mei Yu","doi":"10.1109/TBDATA.2024.3453752","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3453752","url":null,"abstract":"Sequential recommendation aims to capture users’ dynamic preferences. Due to the limited information in the sequence and the uncertain user behavior, data sparsity has always been a key problem. Although data augmentation methods can alleviate this issue, unreliable data can affect the performance of such models. To solve the above problems, we propose a new framework, namely Reliable Data Augmented Contrastive Learning Recommender (RDCRec). Specifically, in order to generate more high-quality reliable items for data augmentation, we design a multi-attributes oriented sequence generator. It moves auxiliary information from the input layer to the attention layer for learning a better attention distribution. Then, we replace a percentage of items in the original sequence with reliable items generated by the generator as the augmented sequence, for creating a high-quality view for contrastive learning. In this way, RDCRec can extract more meaningful user patterns by using the self-supervised signals of the reliable items, thereby improving recommendation performance. Finally, we train a discriminator to identify unreplaced items in the augmented sequence thus we can update item embeddings selectively in order to increase the exposure of more reliable items and improve the accuracy of recommendation results. The discriminator, as an auxiliary model, is jointly trained with the generative task and the contrastive learning task. 
Extensive experiments on four commonly used datasets demonstrate the effectiveness of our method for sequential recommendation.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 6","pages":"694-705"},"PeriodicalIF":7.5,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distilling Fair Representations From Fair Teachers","authors":"Huan Tian;Bo Liu;Tianqing Zhu;Wanlei Zhou;Philip S. Yu","doi":"10.1109/TBDATA.2024.3460532","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3460532","url":null,"abstract":"As an increasing number of data-driven deep learning models are deployed in our daily lives, the issue of algorithmic fairness has become a major concern. These models are trained on data that inevitably contains various biases, leading them to learn unfair representations that differ across demographic subgroups, resulting in unfair predictions. Previous work on fairness has attempted to remove subgroup information from learned features, aiming to contribute to similar representations across subgroups and lead to fairer predictions. However, identifying and removing this information is extremely challenging due to the “black box” nature of neural networks. Moreover, removing desired features without affecting other features is difficult, as features are often correlated, potentially harming model prediction performance. This paper aims to learn fair representations without degrading model prediction performance. We adopt knowledge distillation, allowing unfair models to learn fair representations directly from a fair teacher. The proposed method provides a novel approach to obtaining fair representations while maintaining valid prediction performance. We evaluate the proposed method, FairDistill, on four datasets (CIFAR-10, UTKFace, CelebA, and Adult) under diverse settings. 
Extensive experiments demonstrate the effectiveness and robustness of the proposed method.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1419-1433"},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic NN-Descent: An Efficient k-NN Graph Construction Method","authors":"Jie-Feng Wang;Wan-Lei Zhao;Shihai Xiao;Jiajie Yao;Xuecang Zhang","doi":"10.1109/TBDATA.2024.3460534","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3460534","url":null,"abstract":"As a classic k-NN graph construction method, NN-Descent has been adopted in various applications for its simplicity, genericness, and efficiency. However, its memory consumption is high due to the employment of two extra supporting graph structures. In this paper, a novel k-NN graph construction method is proposed. Similar to NN-Descent, the k-NN graph is constructed by continuously cross-matching the sampled neighbors in each neighborhood. Unlike NN-Descent, however, the cross-matching is undertaken directly on the k-NN graph under construction, which makes the extra graph structures adopted to support the cross-matching unnecessary. Moreover, no synchronization between different threads is needed within one iteration. A high-quality graph is constructed at high speed and with considerably better memory efficiency than NN-Descent on both the multi-thread CPU and the GPU.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"879-886"},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143627853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}