IEEE Transactions on Cognitive and Developmental Systems: Latest Articles, Page 9

Decoding Joint-Level Hand Movements With Intracortical Neural Signals in a Human Brain–Computer Interface

IF 5, Zone 3 (Computer Science)

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-06-04 DOI: 10.1109/TCDS.2024.3409555

Huaqin Sun;Yu Qi;Xiaodi Wu;Junming Zhu;Jianmin Zhang;Yueming Wang

{"title":"Decoding Joint-Level Hand Movements With Intracortical Neural Signals in a Human Brain–Computer Interface","authors":"Huaqin Sun;Yu Qi;Xiaodi Wu;Junming Zhu;Jianmin Zhang;Yueming Wang","doi":"10.1109/TCDS.2024.3409555","DOIUrl":"10.1109/TCDS.2024.3409555","url":null,"abstract":"Fine movements of hands play an important role in everyday life. While existing studies have successfully decoded hand gestures or finger movements from brain signals, direct decoding of single-joint kinematics remains challenging. This study aims to investigate the decoding of fine hand movements at the single-joint level. Neural activities were recorded from the motor cortex (MC) of a human participant while imagining eleven different hand movements. We comprehensively evaluated the decoding efficiency of various brain signal features, neural decoding algorithms, and single-joint kinematic variables for decoding. Results showed that using the spiking band power (SBP) signals, we could faithfully decode the single-joint angles with an average correlation coefficient of 0.77, outperforming other brain signal features. Nonlinear approaches that incorporate temporal context information, particularly recurrent neural networks, significantly outperformed traditional methods. Decoding joint angles yielded superior results compared to joint angular velocities. 
Our approach facilitates the construction of high-performance brain–computer interfaces for dexterous hand control.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"2100-2109"},"PeriodicalIF":5.0,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141940258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Optimal Strategies and Cooperative Teaming for 3-D Multiplayer Reach-Avoid Games

IF 5, Zone 3 (Computer Science)

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-06-03 DOI: 10.1109/TCDS.2024.3406889

Peng Gao;Xiuxian Li;Jinwen Hu

{"title":"Optimal Strategies and Cooperative Teaming for 3-D Multiplayer Reach-Avoid Games","authors":"Peng Gao;Xiuxian Li;Jinwen Hu","doi":"10.1109/TCDS.2024.3406889","DOIUrl":"10.1109/TCDS.2024.3406889","url":null,"abstract":"This article studies multiplayer reach-avoid games with a plane being the goal in 3-D space. Due to the difficulty that directly analyzing multipursuer multievader scenarios brings the curse of dimensionality, the whole problem is decomposed to distinct subgames. In the subgames, a single pursuer or multiple pursuers, which have different speeds, form a team to capture one evader cooperatively while the evader struggles to reach the plane. With the players’ dominance region based on the definition of isochronous surfaces, the target points and value functions are obtained for the game of degree by using Apollonius spheres. Additionally, the corresponding closed-loop saddle-point strategies are shown to be Nash equilibrium. The degeneration between scenarios of different scales is also discussed. To minimize the sum of subgames’ costs, the tasks of intercepting multiple evaders are assigned to individuals or teams in the form of bipartite graph matching. A hierarchical matching algorithm and a state-feedback rematching method are proposed which can be updated in real-time to improve the solution. 
Finally, diverse empirical experiments and comparisons with state-of-the-art methods are illustrated to demonstrate the optimality of proposed strategies and algorithms in this article.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"2085-2099"},"PeriodicalIF":5.0,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141940306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
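
The task-assignment step in the reach-avoid abstract above (assigning interception tasks to individuals or teams so that the sum of subgame costs is minimized) can be illustrated with a minimal sketch. This is not the authors' hierarchical matching algorithm: it is a brute-force search over one-to-one assignments, and the cost matrix, the function name `best_assignment`, and the assumption of equally many teams and evaders are hypothetical choices made for illustration.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (total_cost, assignment) minimizing sum of cost[p][assignment[p]].

    cost[p][e] is a hypothetical cost for pursuer team p intercepting evader e.
    Assumes a square matrix (as many teams as evaders); brute force, so only
    suitable for small instances.
    """
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # perm[p] is the evader assigned to pursuer team p
        total = sum(cost[p][perm[p]] for p in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Illustrative (made-up) subgame costs for 3 teams and 3 evaders.
costs = [
    [4.0, 1.0, 3.0],
    [2.0, 0.5, 5.0],
    [3.0, 2.0, 2.0],
]
total, assignment = best_assignment(costs)
print(total, assignment)
```

For larger instances a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be the usual choice; brute force is used here only to keep the sketch free of dependencies.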

HR-SNN: An End-to-End Spiking Neural Network for Four-Class Classification Motor Imagery Brain–Computer Interface

IF 5, Zone 3 (Computer Science)

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-04-30 DOI: 10.1109/TCDS.2024.3395443

Yulin Li;Liangwei Fan;Hui Shen;Dewen Hu

{"title":"HR-SNN: An End-to-End Spiking Neural Network for Four-Class Classification Motor Imagery Brain–Computer Interface","authors":"Yulin Li;Liangwei Fan;Hui Shen;Dewen Hu","doi":"10.1109/TCDS.2024.3395443","DOIUrl":"10.1109/TCDS.2024.3395443","url":null,"abstract":"Spiking neural network (SNN) excels in processing temporal information and conserving energy, particularly when deployed on neuromorphic hardware. These strengths position SNN as an ideal choice for developing wearable brain–computer interface (BCI) devices. However, the application of SNN in complex BCI tasks, like four-class motor imagery classification, is limited. In light of this, this study introduces a powerful SNN architecture hybrid response SNN (HR-SNN). We employ parameterwise gradient descent methods to optimize spike encoding efficiency. The SNN's frequency perception is improved by integrating a hybrid response spiking module. In addition, a diff-potential spiking decoder is designed to optimize SNN output potential utilization. Validation experiments are performed on PhysioNet and BCI competition IV 2a datasets. On PhysioNet, our model achieves accuracies of 67.24% and 74.95% using global training and subject-specific transfer learning, respectively. On BCI competition IV 2a, our approach attains an average accuracy of 77.58%, surpassing all the compared SNN models and demonstrating competitiveness against state-of-the-art (SOTA) convolution neural network (CNN) approaches. We validate the robustness of HR-SNN under noise and channel loss scenarios. Additionally, energy analysis reveals HR-SNN's superior energy efficiency compared to existing CNN models. 
Notably, HR-SNN exhibits a 2–16 times energy consumption advantage over existing SNN methods.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"1955-1968"},"PeriodicalIF":5.0,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Adaptive Framework for Long-Term Sensory Home Training: A Feasibility Study

IF 5, Zone 3 (Computer Science)

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-04-25 DOI: 10.1109/TCDS.2024.3393635

Stefano Silvoni;Simon Desch;Florian Beier;Robin Bekrater-Bodmann;Annette Löffler;Dieter Kleinböhl;Stefano Tamascelli;Herta Flor

{"title":"Adaptive Framework for Long-Term Sensory Home Training: A Feasibility Study","authors":"Stefano Silvoni;Simon Desch;Florian Beier;Robin Bekrater-Bodmann;Annette Löffler;Dieter Kleinböhl;Stefano Tamascelli;Herta Flor","doi":"10.1109/TCDS.2024.3393635","DOIUrl":"10.1109/TCDS.2024.3393635","url":null,"abstract":"Training programs, based on principles of brain-plasticity and skill learning, are useful in counteracting functional decline in pathological conditions. Training effects of such procedures are well described but their adaptive features are usually not reported. A software framework designed for a long-term home training program is presented. It gradually trains users, provides a multidimensional range of stimulus differentiation, encompasses a strategy to increase the task demand and includes motivational reinforcement components. The structured framework was tested in a feasibility study involving two perceptual discrimination tasks (visual and auditory) in four persons in middle-to-older adulthood who were trained for 30 days. Practicability of the training was shown in a home setting by high adherence to the procedure, adaptive increase in task demand over time and positive learning effects on an individual level. Participants learned to distinguish progressively smaller target objects in the visual task (with diminished contrast) and sweeps progressively varying less in frequency in the auditory task (with overlapping noise). This adaptive procedure can provide the basis for the design of extended training programs engaging sensory function in individuals with impaired sensorimotor and cognitive functions. 
Further investigations are necessary to assess the generalization of learning effects and clinical validity.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"1929-1942"},"PeriodicalIF":5.0,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10508624","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140797954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Leveraging Spatiotemporal Estimation for Online Adaptive Steady-State Visual Evoked Potential Recognition

IF 5, Zone 3 (Computer Science)

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-04-23 DOI: 10.1109/TCDS.2024.3392745

Jing Jin;Xinjie He;Brendan Z. Allison;Ke Qin;Xingyu Wang;Andrzej Cichocki

{"title":"Leveraging Spatiotemporal Estimation for Online Adaptive Steady-State Visual Evoked Potential Recognition","authors":"Jing Jin;Xinjie He;Brendan Z. Allison;Ke Qin;Xingyu Wang;Andrzej Cichocki","doi":"10.1109/TCDS.2024.3392745","DOIUrl":"10.1109/TCDS.2024.3392745","url":null,"abstract":"Online adaptive canonical correlation analysis (OACCA) has been applied successfully in the recently popular steady-state visual evoked potential (SSVEP) target recognition methods. However, due to the significant amount of spatiotemporal relevant background noise in the online historical sample label data of OACCA, there is a redundant noise component in the learned common spatial filter that can reduce online classification accuracy. Aiming at solving this defect in OACCA, we designed an online spatial–temporal equalization filter (STE) to suppress the background noise component in the electroencephalography (EEG). Meanwhile, an adaptive decoding method for SSVEP based on online spatial–temporal estimation (STE-OACCA) is proposed by combining the online STE filter and the OACCA algorithm. A pseudoonline test on the Tsinghua University FBCCA-DW dataset shows that the proposed STE-OACCA method significantly outperforms the CCA, MSI, OACCA approaches as well as STE-CCA. More importantly, the proposed method can be directly used in online SSVEP recognition without calibration. 
The proposed algorithm is robust, which is promising for the development of practical brain computer interface (BCI).","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"1943-1954"},"PeriodicalIF":5.0,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140797952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Minimizing EEG Human Interference: A Study of an Adaptive EEG Spatial Feature Extraction With Deep Convolutional Neural Networks

IF 5, Zone 3 (Computer Science)

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-04-18 DOI: 10.1109/TCDS.2024.3391131

Haojin Deng;Shiqi Wang;Yimin Yang;W. G. Will Zhao;Hui Zhang;Ruizhong Wei;Q. M. Jonathan Wu;Bao-Liang Lu

{"title":"Minimizing EEG Human Interference: A Study of an Adaptive EEG Spatial Feature Extraction With Deep Convolutional Neural Networks","authors":"Haojin Deng;Shiqi Wang;Yimin Yang;W. G. Will Zhao;Hui Zhang;Ruizhong Wei;Q. M. Jonathan Wu;Bao-Liang Lu","doi":"10.1109/TCDS.2024.3391131","DOIUrl":"10.1109/TCDS.2024.3391131","url":null,"abstract":"Emotion is one of the main psychological factors that affects human behavior. Using a neural network model trained with electroencephalography (EEG)-based frequency features has been widely used to accurately recognize human emotions. However, utilizing EEG-based spatial information with popular 2-D kernels of convolutional neural networks (CNNs) has rarely been explored in the extant literature. This article addresses these challenges by proposing an EEG-based spatial-frequency-based framework for recognizing human emotion, resulting in fewer human interference parameters with better generalization performance. Specifically, we propose a two-stream hierarchical network framework that learns features from two networks, one trained from the frequency domain while another trained from the spatial domain. Our approach is extensively validated on the SEED, SEED-V, and DREAMER datasets. Our proposed method achieved an accuracy of 94.84% on the SEED dataset and 68.61% on the SEED-V dataset with EEG data only. The average accuracy of the Dreamer dataset is 93.01%, 92.04%, and 91.74% in valence, arousal, and dominance dimensions, respectively. The experiments directly support that our motivation of utilizing the two-stream domain features significantly improves the final recognition performance. The experimental results show that the proposed framework obtains improvements over state-of-the-art methods over these three varied scaled datasets. 
Furthermore, it also indicates the potential of the proposed framework in conjunction with current ImageNet pretrained models for improving performance on 1-D psychological signals.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"1915-1928"},"PeriodicalIF":5.0,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140627221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

MAVIDSQL: A Model-Agnostic Visualization for Interpretation and Diagnosis of Text-to-SQL Tasks

IF 5, Zone 3 (Computer Science)

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-04-18 DOI: 10.1109/TCDS.2024.3391278

Jingwei Tang;Guodao Sun;Jiahui Chen;Gefei Zhang;Baofeng Chang;Haixia Wang;Ronghua Liang

{"title":"MAVIDSQL: A Model-Agnostic Visualization for Interpretation and Diagnosis of Text-to-SQL Tasks","authors":"Jingwei Tang;Guodao Sun;Jiahui Chen;Gefei Zhang;Baofeng Chang;Haixia Wang;Ronghua Liang","doi":"10.1109/TCDS.2024.3391278","DOIUrl":"10.1109/TCDS.2024.3391278","url":null,"abstract":"Significant advancements in semantic parsing for text-to-SQL (T2S) tasks have been achieved through the employment of neural network models, such as LSTM, BERT, and T5. The exceptional performance of large language models, such as ChatGPT, has been demonstrated in recent research, even in zero-shot scenarios. However, the inherent lack of transparency of T2S models makes them black boxes, concealing their inner workings from both developers and users, which complicates the diagnosis of potential error patterns. Despite the fact that numerous visual analysis studies have been conducted in natural language processing communities, scant attention has been paid to addressing the challenges of semantic parsing, specifically in T2S tasks. This limitation hinders the development of effective tools for model optimization and evaluation. This article presents an interactive visual analysis tool, MAVIDSQL, to assist model developers and users in understanding and diagnosing T2S tasks. The system comprises three modules: the model manager, the feature extractor, and the visualization interface, which adopt a model-agnostic approach to diagnose potential errors and infer model decisions by analyzing input–output data, facilitating interactive visual analysis to identify error patterns and assess model performance. 
Two case studies and interviews with domain experts demonstrate the effectiveness of MAVIDSQL in facilitating the understanding of T2S tasks and identifying potential errors.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 5","pages":"1887-1903"},"PeriodicalIF":5.0,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140627741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Toward Two-Stream Foveation-Based Active Vision Learning

IF 5, Zone 3 (Computer Science)

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-04-17 DOI: 10.1109/TCDS.2024.3390597

Timur Ibrayev;Amitangshu Mukherjee;Sai Aparna Aketi;Kaushik Roy

{"title":"Toward Two-Stream Foveation-Based Active Vision Learning","authors":"Timur Ibrayev;Amitangshu Mukherjee;Sai Aparna Aketi;Kaushik Roy","doi":"10.1109/TCDS.2024.3390597","DOIUrl":"10.1109/TCDS.2024.3390597","url":null,"abstract":"Deep neural network (DNN) based machine perception frameworks process the entire input in a one-shot manner to provide answers to both “what object is being observed” and “where it is located.” In contrast, the “two-stream hypothesis” from neuroscience explains the neural processing in the human visual cortex as an active vision system that utilizes two separate regions of the brain to answer the what and the where questions. In this work, we propose a machine learning framework inspired by the “two-stream hypothesis” and explore the potential benefits that it offers. Specifically, the proposed framework models the following mechanisms: 1) ventral (what) stream focusing on the input regions perceived by the fovea part of an eye (foveation); 2) dorsal (where) stream providing visual guidance; and 3) iterative processing of the two streams to calibrate visual focus and process the sequence of focused image patches. The training of the proposed framework is accomplished by label-based DNN training for the ventral stream model and reinforcement learning (RL) for the dorsal stream model. We show that the two-stream foveation-based learning is applicable to the challenging task of weakly-supervised object localization (WSOL), where the training data is limited to the object class or its attributes. The framework is capable of both predicting the properties of an object and successfully localizing it by predicting its bounding box. 
We also show that, due to the independent nature of the two streams, the dorsal model can be applied on its own to unseen images to localize objects from different datasets.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 5","pages":"1843-1860"},"PeriodicalIF":5.0,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140615331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Cognitive Assessment of Scientific Creative Skill by Brain-Connectivity Analysis Using Graph Convolutional Interval Type-2 Fuzzy Network

IF 5, Zone 3 (Computer Science)

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-04-16 DOI: 10.1109/TCDS.2024.3390005

Sayantani Ghosh;Amit Konar;Atulya K. Nagar

{"title":"Cognitive Assessment of Scientific Creative Skill by Brain-Connectivity Analysis Using Graph Convolutional Interval Type-2 Fuzzy Network","authors":"Sayantani Ghosh;Amit Konar;Atulya K. Nagar","doi":"10.1109/TCDS.2024.3390005","DOIUrl":"10.1109/TCDS.2024.3390005","url":null,"abstract":"Scientific creativity refers to natural/automated genesis of innovations in science, propelling scientific, technological, industrial, and/or societal progress. Mental paper folding (MPF) requires spatial reasoning, which is an important attribute to determine creative potential of people. The article proposes a novel approach to determine creative potential of people from their brain-connectivity network (BCN) during their participation in MPF tasks using functional near-infrared spectroscopy (fNIRS). The work involves three phases. The first phase includes construction of BCN using Pearson's correlation method. The centrality features of the nodes in the network are assessed in the second phase and transferred to a proposed graph convolutional-interval type-2 fuzzy network (GC-IT2FN) in the third phase to classify the creative potential of individuals in four grades. The novelty of the work includes: 1) a novel self-attention mechanism in the network to guide graph convolution layers to focus on the most relevant nodes; 2) selection of a new activation function, Logish, after graph convolution to enhance classifier accuracy; and 3) utilizing the promising region in the footprint of uncertainty (FOU) of the used fuzzy sets of IT2FN-based classifier to reduce the effect of uncertainty in brain data on classifier performance. 
Experiments conducted demonstrate the efficacy of the proposed framework in contrast to traditional approaches.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 5","pages":"1872-1886"},"PeriodicalIF":5.0,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140615380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
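
The first two phases named in the abstract above (building a brain-connectivity network with Pearson's correlation, then computing centrality features of its nodes) can be sketched roughly as follows. The abstract does not say which centrality measures or thresholds were used, so degree centrality, the 0.7 threshold, the toy channel signals, and all function names here are assumptions for illustration, not the authors' pipeline.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity(signals, threshold):
    """Binary adjacency matrix: edge where |correlation| >= threshold (assumed rule)."""
    n = len(signals)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(signals[i], signals[j])) >= threshold:
                adj[i][j] = adj[j][i] = 1
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes each node is connected to."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]


# Toy stand-ins for per-channel time series (made-up values).
signals = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],   # scaled copy of channel 0
    [5.0, 3.0, 4.0, 1.0, 2.0],    # anticorrelated with channels 0 and 1
    [1.0, -1.0, 1.0, -1.0, 1.0],  # weakly related to the others
]
adj = connectivity(signals, threshold=0.7)
centrality = degree_centrality(adj)
print(adj, centrality)
```

In this toy network channels 0 to 2 are mutually connected and channel 3 is isolated; a real pipeline would feed such per-node features to the classifier stage.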

Spatiotemporal Feature Enhancement Network for Blur Robust Underwater Object Detection

IF 5, Zone 3 (Computer Science)

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-04-12 DOI: 10.1109/TCDS.2024.3386664

Hao Zhou;Lu Qi;Hai Huang;Xu Yang;Jing Yang

{"title":"Spatiotemporal Feature Enhancement Network for Blur Robust Underwater Object Detection","authors":"Hao Zhou;Lu Qi;Hai Huang;Xu Yang;Jing Yang","doi":"10.1109/TCDS.2024.3386664","DOIUrl":"10.1109/TCDS.2024.3386664","url":null,"abstract":"Underwater object detection is challenged by the presence of image blur induced by light absorption and scattering, resulting in substantial performance degradation. It is hypothesized that the attenuation of light is directly correlated with the camera-to-object distance, manifesting as variable degrees of image blur across different regions within underwater images. Specifically, regions in close proximity to the camera exhibit less pronounced blur compared to distant regions. Within the same object category, objects situated in clear regions share similar feature embeddings with their counterparts in blurred regions. This observation underscores the potential for leveraging objects in clear regions to aid in the detection of objects within blurred areas, a critical requirement for autonomous agents, such as autonomous underwater vehicles, engaged in continuous underwater object detection. Motivated by this insight, we introduce the spatiotemporal feature enhancement network (STFEN), a novel framework engineered to autonomously extract discriminative features from objects in clear regions. These features are then harnessed to enhance the representations of objects in blurred regions, operating across both spatial and temporal dimensions. Notably, the proposed STFEN seamlessly integrates into two-stage detectors, such as the faster region-based convolutional neural networks (Faster R-CNN) and feature pyramid networks (FPN). Extensive experimentation conducted on two benchmark underwater datasets, URPC 2018 and URPC 2019, conclusively demonstrates the efficacy of the STFEN framework. 
It delivers substantial enhancements in performance relative to baseline methods, yielding improvements in the mAP evaluation metric ranging from 3.7% to 5.0%.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 5","pages":"1814-1828"},"PeriodicalIF":5.0,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140593967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0