{"title":"Small Object Detection Based on Microscale Perception and Enhancement-Location Feature Pyramid","authors":"Guang Han;Chenwei Guo;Ziyang Li;Haitao Zhao","doi":"10.1109/TCDS.2024.3397684","DOIUrl":"10.1109/TCDS.2024.3397684","url":null,"abstract":"Due to the large number of small objects, significant scale variation, and uneven distribution in images captured by unmanned aerial vehicles (UAVs), existing algorithms have high rates of missing and false detections of small objects in drone images. A new object detection algorithm based on microscale perception and enhancement-location feature pyramid is proposed in this article. The microscale perception module replaces the original convolution module in the backbone, changing the receptive field through two dilation branches with various dilation rates and an adjustment switch branch. To better match the size and shape of sampled targets, the weighted deformable convolution is employed. The enhancement-location feature pyramid module aggregates the features from each layer to obtain balanced semantic information and refines aggregated features to enhance their ability to represent features. Moreover, a bottom-up branch structure is added to utilize the property of lower layer features being beneficial to locating small objects to enhance the localization ability for small objects. Additionally, by using specific image cropping and combining techniques, the target distribution of the training data is altered to make the model more sensitive to small objects and improve its robustness. Finally, a sample balance strategy is used in combination with focal loss and a sample extraction control method to balance simple-hard sample imbalance and the long-tail distribution of interclass sample imbalance during training. 
Experimental results show that the proposed algorithm achieves a mean average precision of 35.9% on the VisDrone2019 dataset, which is a 14.2% improvement over the baseline Cascade RCNN and demonstrates better performance in detecting small objects in drone images. Compared with advanced algorithms in recent years, it also achieves state-of-the-art detection accuracy.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"1982-1996"},"PeriodicalIF":5.0,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140940720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LITE-SNN: Leveraging Inherent Dynamics to Train Energy-Efficient Spiking Neural Networks for Sequential Learning","authors":"Nitin Rathi;Kaushik Roy","doi":"10.1109/TCDS.2024.3396431","DOIUrl":"10.1109/TCDS.2024.3396431","url":null,"abstract":"Spiking neural networks (SNNs) are gaining popularity for their promise of low-power machine intelligence on event-driven neuromorphic hardware. SNNs have achieved comparable performance as artificial neural networks (ANNs) on static tasks (image classification) with lower compute energy. In this work, we explore the inherent dynamics of SNNs for sequential tasks such as gesture recognition, sentiment analysis, and sequence-to-sequence learning on data from dynamic vision sensors (DVSs) and natural language processing (NLP). Sequential data are generally processed with complex recurrent neural networks (RNNs) [long short-term memory/gated recurrent unit (LSTM/GRU)] with explicit feedback connections and internal states to handle the long-term dependencies. The neuron models in SNNs—integrate-and-fire (IF) or leaky-integrate-and-fire (LIF)—have internal states (membrane potential) that can be efficiently leveraged for sequential tasks. The membrane potential in the IF/LIF neuron integrates the incoming current and outputs an event (or spike) when the potential crosses a threshold value. Since SNNs compute with highly sparse spike-based spatiotemporal data, the energy/inference is lower than LSTMs/GRUs. We also show that SNNs require fewer parameters than LSTM/GRU resulting in smaller models and faster inference. We observe the problem of vanishing gradients in vanilla SNNs for longer sequences and implement a convolutional SNN with attention layers to perform sequence-to-sequence learning tasks. 
The inherent recurrence in SNNs, in addition to the fully parallelized convolutional operations, provides additional mechanisms to model sequential dependencies that lead to better accuracy than convolutional neural networks (CNNs) with ReLU activations. We evaluate SNN on gesture recognition from the IBM DVS dataset, sentiment analysis from the IMDB movie reviews dataset, and German-to-English translation from the Multi30k dataset.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"1905-1914"},"PeriodicalIF":5.0,"publicationDate":"2024-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Unlearning for Seizure Prediction","authors":"Chenghao Shao;Chang Li;Rencheng Song;Xiang Liu;Ruobing Qian;Xun Chen","doi":"10.1109/TCDS.2024.3395663","DOIUrl":"10.1109/TCDS.2024.3395663","url":null,"abstract":"In recent years, companies and organizations have been required to provide individuals with the right to be forgotten to alleviate privacy concerns. In machine learning, this requires researchers not only to delete data from databases but also to remove data information from trained models. Thus, machine unlearning is becoming an emerging research problem. In the seizure prediction field, prediction applications are mostly built on private electroencephalogram (EEG) signals. To provide the right to be forgotten, we propose a machine unlearning method for seizure prediction. Our proposed unlearning method is based on knowledge distillation using two teacher models to guide the student model toward achieving the model-level unlearning objective. One teacher model is used to induce the student model to forget data information of patients with unlearning requests (forgetting patients), while the other teacher model is used to enable the student model to retain data information of other patients (remaining patients). Experiments were conducted on the CHBMIT and Kaggle databases. Results show that our proposed unlearning method can effectively make trained ML models forget the information of forgetting patients and maintain satisfactory performance on remaining patients. 
To the best of our knowledge, this is the first work on machine unlearning in the seizure prediction field.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"1969-1981"},"PeriodicalIF":5.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140842228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Perception-Based Visual Simultaneous Localization and Tracking in Dynamic Environments","authors":"Song Peng;Teng Ran;Liang Yuan;Jianbo Zhang;Wendong Xiao","doi":"10.1109/TCDS.2024.3371073","DOIUrl":"10.1109/TCDS.2024.3371073","url":null,"abstract":"Visual simultaneous localization and mapping (SLAM) in dynamic scenes is a prerequisite for robot-related applications. Most of the existing SLAM algorithms mainly focus on dynamic object rejection, which makes part of the valuable information lost and prone to failure in complex environments. This article proposes a semantic visual SLAM system that incorporates rigid object tracking. A robust scene perception frame is designed, which gives autonomous robots the ability to perceive scenes similar to human cognition. Specifically, we propose a two-stage mask revision method to generate fine mask of the object. Based on the revised mask, we propose a semantic and geometric constraint (SAG) strategy, which provides a fast and robust way to perceive dynamic rigid objects. Then, the motion tracking of rigid objects is integrated into the SLAM pipeline, and a novel bundle adjustment is constructed to optimize camera localization and object six-degree of freedom (DoF) poses. Finally, the evaluation of the proposed algorithm is performed on publicly available KITTI dataset, Oxford Multimotion dataset, and real-world scenarios. The proposed algorithm achieves the comprehensive performance of $\\text{RPE}_{\\text{t}}$ less than 0.07 m per frame and $\\text{RPE}_{\\text{R}}$ about 0.03$^{\\circ}$ per frame in the KITTI dataset. 
The experimental results reveal that the proposed algorithm achieves more accurate localization and more robust tracking than state-of-the-art SLAM algorithms in challenging dynamic scenarios.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1507-1520"},"PeriodicalIF":5.0,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140002820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brain Connectivity Analysis for EEG-Based Face Perception Task","authors":"Debashis Das Chakladar;Nikhil R. Pal","doi":"10.1109/TCDS.2024.3370635","DOIUrl":"10.1109/TCDS.2024.3370635","url":null,"abstract":"Face perception is considered a highly developed visual recognition skill in human beings. Most face perception studies used functional magnetic resonance imaging to identify different brain cortices related to face perception. However, studying brain connectivity networks for face perception using electroencephalography (EEG) has not yet been done. In the proposed framework, initially, a correlation-tree traversal-based channel selection algorithm is developed to identify the “optimum” EEG channels by removing the highly correlated EEG channels from the input channel set. Next, the effective brain connectivity network among those “optimum” EEG channels is developed using multivariate transfer entropy (TE) while participants watched different face stimuli (i.e., famous, unfamiliar, and scrambled). We transform EEG channels into corresponding brain regions for generalization purposes and identify the active brain regions for each face stimulus. To find the stimuluswise brain dynamics, the information transfer among the identified brain regions is estimated using several graphical measures [global efficiency (GE) and transitivity]. Our model achieves the mean GE of 0.800, 0.695, and 0.581 for famous, unfamiliar, and scrambled faces, respectively. Identifying face perception-specific brain regions will enhance understanding of the EEG-based face-processing system. 
Understanding the brain networks of famous, unfamiliar, and scrambled faces can be useful in criminal investigation applications.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1494-1506"},"PeriodicalIF":5.0,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140002461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"D-FaST: Cognitive Signal Decoding With Disentangled Frequency–Spatial–Temporal Attention","authors":"WeiGuo Chen;Changjian Wang;Kele Xu;Yuan Yuan;Yanru Bai;Dongsong Zhang","doi":"10.1109/TCDS.2024.3370261","DOIUrl":"10.1109/TCDS.2024.3370261","url":null,"abstract":"Cognitive language processing (CLP), situated at the intersection of natural language processing (NLP) and cognitive science, plays a progressively pivotal role in the domains of artificial intelligence, cognitive intelligence, and brain science. Among the essential areas of investigation in CLP, cognitive signal decoding (CSD) has made remarkable achievements, yet there still exist challenges related to insufficient global dynamic representation capability and deficiencies in multidomain feature integration. In this article, we introduce a novel paradigm for CLP referred to as disentangled frequency–spatial–temporal attention (D-FaST). Specifically, we present a novel cognitive signal decoder that operates on disentangled frequency–space–time domain attention. This decoder encompasses three key components: frequency domain feature extraction employing multiview attention (MVA), spatial domain feature extraction utilizing dynamic brain connection graph attention, and temporal feature extraction relying on local time sliding window attention. These components are integrated within a novel disentangled framework. Additionally, to encourage advancements in this field, we have created a new CLP dataset, MNRED. Subsequently, we conducted an extensive series of experiments, evaluating D-FaST's performance on MNRED, as well as on publicly available datasets including ZuCo, BCIC IV-2A, and BCIC IV-2B. 
Our experimental results demonstrate that D-FaST outperforms existing methods significantly on both our datasets and traditional CSD datasets including establishing a state-of-the-art accuracy score 78.72% on MNRED, pushing the accuracy score on ZuCo to 78.35%, accuracy score on BCIC IV-2A to 74.85%, and accuracy score on BCIC IV-2B to 76.81%.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1476-1493"},"PeriodicalIF":5.0,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139979066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DTCM: Deep Transformer Capsule Mutual Distillation for Multivariate Time Series Classification","authors":"Zhiwen Xiao;Xin Xu;Huanlai Xing;Bowen Zhao;Xinhan Wang;Fuhong Song;Rong Qu;Li Feng","doi":"10.1109/TCDS.2024.3370219","DOIUrl":"10.1109/TCDS.2024.3370219","url":null,"abstract":"This article proposes a dual-network-based feature extractor, perceptive capsule network (PCapN), for multivariate time series classification (MTSC), including a local feature network (LFN) and a global relation network (GRN). The LFN has two heads (i.e., Head_A and Head_B), each containing two squash convolutional neural network (CNN) blocks and one dynamic routing block to extract the local features from the data and mine the connections among them. The GRN consists of two capsule-based transformer blocks and one dynamic routing block to capture the global patterns of each variable and correlate the useful information of multiple variables. Unfortunately, it is difficult to directly deploy PCapN on mobile devices due to its strict requirement for computing resources. So, this article designs a lightweight capsule network (LCapN) to mimic the cumbersome PCapN. To promote knowledge transfer from PCapN to LCapN, this article proposes a deep transformer capsule mutual (DTCM) distillation method. It is targeted and offline, using one- and two-way operations to supervise the knowledge distillation (KD) process for the dual-network-based student and teacher models. 
Experimental results show that the proposed PCapN and DTCM achieve excellent performance on University of East Anglia 2018 (UEA2018) datasets regarding top-1 accuracy.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1445-1461"},"PeriodicalIF":5.0,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139979417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Agree to Disagree: Exploring Partial Semantic Consistency Against Visual Deviation for Compositional Zero-Shot Learning","authors":"Xiangyu Li;Xu Yang;Xi Wang;Cheng Deng","doi":"10.1109/TCDS.2024.3367957","DOIUrl":"10.1109/TCDS.2024.3367957","url":null,"abstract":"Compositional zero-shot learning (CZSL) aims to recognize novel concepts from known subconcepts. However, it is still challenging since the intricate interaction between subconcepts is entangled with their corresponding visual features, which affects the recognition accuracy of concepts. Besides, the domain gap between training and testing data leads to poor model generalization. In this article, we tackle these problems by exploring partial semantic consistency (PSC) to eliminate visual deviation to guarantee the discrimination and generalization of representations. Considering the complicated interaction between subconcepts and their visual features, we decompose seen images into visual elements according to their labels and obtain the instance-level subdeviations from compositions, which are utilized to excavate the category-level primitives of subconcepts. Furthermore, we present a multiscale concept composition (MSCC) approach to produce virtual samples from two aspects, which augments the sufficiency and diversity of samples so that the proposed model can generalize to novel compositions. 
Extensive experiments indicate that our method significantly outperforms the state-of-the-art approaches on three benchmark datasets.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1433-1444"},"PeriodicalIF":5.0,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139954276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compressed Video Anomaly Detection of Human Behavior Based on Abnormal Region Determination","authors":"Lijun He;Miao Zhang;Hao Liu;Liejun Wang;Fan Li","doi":"10.1109/TCDS.2024.3367493","DOIUrl":"10.1109/TCDS.2024.3367493","url":null,"abstract":"Video anomaly detection has a wide range of applications in video monitoring-related scenarios. The existing image-domain-based anomaly detection algorithms usually require completely decoding the received videos, complex information extraction, and network structure, which makes them difficult to be implemented directly. In this article, we focus on anomaly detection directly for compressed videos. The compressed videos need not be fully decoded and auxiliary information can be obtained directly, which have low computational complexity. We propose a compressed video anomaly detection algorithm based on accurate abnormal region determination (ARD-VAD), which is suitable to be deployed on edge servers. First, to ensure the overall low complexity and save storage space, we sparsely sample the prior knowledge of I-frame representing the appearance information and motion vector (MV) representing the motion information from compressed videos. Based on the sampled information, a two-branch network structure, which consists of MV reconstruction branch and future I-frame prediction branch, is designed. Specifically, the two branches are connected by an attention network based on the MV residuals to guide the prediction network to focus on the abnormal regions. Furthermore, to emphasize the abnormal regions, we develop an adaptive sensing of abnormal regions determination module based on motion intensity represented by the second derivative of MV. This module can enhance the difference of the real anomaly region between the generated frame and the current frame. 
The experiments show that our algorithm can achieve a good balance between performance and complexity.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1462-1475"},"PeriodicalIF":5.0,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139954150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Reinforcement Learning With Multicritic TD3 for Decentralized Multirobot Path Planning","authors":"Heqing Yin;Chang Wang;Chao Yan;Xiaojia Xiang;Boliang Cai;Changyun Wei","doi":"10.1109/TCDS.2024.3368055","DOIUrl":"10.1109/TCDS.2024.3368055","url":null,"abstract":"Centralized multirobot path planning is a prevalent approach involving a global planner computing feasible paths for each robot using shared information. Nonetheless, this approach encounters limitations due to communication constraints and computational complexity. To address these challenges, we introduce a novel decentralized multirobot path planning approach that eliminates the need for sharing the states and intentions of robots. Our approach harnesses deep reinforcement learning and features an asynchronous multicritic twin delayed deep deterministic policy gradient (AMC-TD3) algorithm, which enhances the original gate recurrent unit (GRU)-attention-based TD3 algorithm by incorporating a multicritic network and employing an asynchronous training mechanism. By training each critic with a unique reward function, our learned policy enables each robot to navigate toward its long-term objective without colliding with other robots in complex environments. Furthermore, our reward function, grounded in social norms, allows the robots to naturally avoid each other in congested situations. Specifically, we train three critics to encourage each robot to achieve its long-term navigation goal, maintain its moving direction, and prevent collisions with other robots. Our model can learn an end-to-end navigation policy without relying on an accurate map or any localization information, rendering it highly adaptable to various environments. 
Simulation results reveal that our proposed approach surpasses baselines in several environments with different levels of complexity and robot populations.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1233-1247"},"PeriodicalIF":5.0,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139954488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}