{"title":"Fine-grained image classification method based on hybrid attention module","authors":"Weixiang Lu, Ying Yang, Lei Yang","doi":"10.3389/fnbot.2024.1391791","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1391791","url":null,"abstract":"To efficiently capture feature information in tasks of fine-grained image classification, this study introduces a new network model for fine-grained image classification, which utilizes a hybrid attention approach. The model is built upon a hybrid attention module (MA), and with the assistance of the attention erasure module (EA), it can adaptively enhance the prominent areas in the image and capture more detailed image information. Specifically, for tasks involving fine-grained image classification, this study designs an attention module capable of applying the attention mechanism to both the channel and spatial dimensions. This highlights the important regions and key feature channels in the image, allowing for the extraction of distinct local features. Furthermore, this study presents an attention erasure module (EA) that can remove significant areas in the image based on the features identified; thus, shifting focus to additional feature details within the image and improving the diversity and completeness of the features. Moreover, this study enhances the pooling layer of ResNet50 to augment the perceptual region and the capability to extract features from the network’s less deep layers. For the objective of fine-grained image classification, this study extracts a variety of features and merges them effectively to create the final feature representation. To assess the effectiveness of the proposed model, experiments were conducted on three publicly available fine-grained image classification datasets: Stanford Cars, FGVC-Aircraft, and CUB-200–2011. The method achieved classification accuracies of 92.8, 94.0, and 88.2% on these datasets, respectively. 
In comparison with existing approaches, the efficiency of this method has significantly improved, demonstrating higher accuracy and robustness.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"89 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fusion inception and transformer network for continuous estimation of finger kinematics from surface electromyography","authors":"Chuang Lin, Xiaobing Zhang","doi":"10.3389/fnbot.2024.1305605","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1305605","url":null,"abstract":"Decoding surface electromyography (sEMG) to recognize human movement intentions enables us to achieve stable, natural and consistent control in the field of human computer interaction (HCI). In this paper, we present a novel deep learning (DL) model, named fusion inception and transformer network (FIT), which effectively models both local and global information on sequence data by fully leveraging the capabilities of Inception and Transformer networks. In the publicly available Ninapro dataset, we selected surface EMG signals from six typical hand grasping maneuvers in 10 subjects for predicting the values of the 10 most important joint angles in the hand. Our model’s performance, assessed through Pearson’s correlation coefficient (PCC), root mean square error (RMSE), and R-squared (R<jats:sup>2</jats:sup>) metrics, was compared with temporal convolutional network (TCN), long short-term memory network (LSTM), and bidirectional encoder representation from transformers model (BERT). Additionally, we also calculate the training time and the inference time of the models. The results show that FIT is the most performant, with excellent estimation accuracy and low computational cost. 
Our model contributes to the development of HCI technology and has significant practical value.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"1 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining local and global spatiotemporal features for tactile object recognition","authors":"Xiaoliang Qian, Wei Deng, Wei Wang, Yucui Liu, Liying Jiang","doi":"10.3389/fnbot.2024.1387428","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1387428","url":null,"abstract":"The tactile object recognition (TOR) is highly important for environmental perception of robots. The previous works usually utilize single scale convolution which cannot simultaneously extract local and global spatiotemporal features of tactile data, which leads to low accuracy in TOR task. To address above problem, this article proposes a local and global residual (LGR-18) network which is mainly consisted of multiple local and global convolution (LGC) blocks. An LGC block contains two pairs of local convolution (LC) and global convolution (GC) modules. The LC module mainly utilizes a temporal shift operation and a 2D convolution layer to extract local spatiotemporal features. The GC module extracts global spatiotemporal features by fusing multiple 1D and 2D convolutions which can expand the receptive field in temporal and spatial dimensions. Consequently, our LGR-18 network can extract local-global spatiotemporal features without using 3D convolutions which usually require a large number of parameters. The effectiveness of LC module, GC module and LGC block is verified by ablation studies. 
Quantitative comparisons with state-of-the-art methods reveal the excellent capability of our method.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"27 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D hand pose and mesh estimation via a generic Topology-aware Transformer model","authors":"Shaoqi Yu, Yintong Wang, Lili Chen, Xiaolin Zhang, Jiamao Li","doi":"10.3389/fnbot.2024.1395652","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1395652","url":null,"abstract":"In Human-Robot Interaction (HRI), accurate 3D hand pose and mesh estimation hold critical importance. However, inferring reasonable and accurate poses in severe self-occlusion and high self-similarity remains an inherent challenge. In order to alleviate the ambiguity caused by invisible and similar joints during HRI, we propose a new Topology-aware Transformer network named HandGCNFormer with depth image as input, incorporating prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we propose a novel Graphformer decoder with an additional Node-offset Graph Convolutional layer (NoffGConv). The Graphformer decoder optimizes the synergy between the Transformer and GCN, capturing long-range dependencies and local topological connections between joints. On top of that, we replace the standard MLP prediction head with a novel Topology-aware head to better exploit local topological constraints for more reasonable and accurate poses. Our method achieves state-of-the-art 3D hand pose estimation performance on four challenging datasets, including Hands2017, NYU, ICVL, and MSRA. To further demonstrate the effectiveness and scalability of our proposed Graphformer Decoder and Topology aware head, we extend our framework to HandGCNFormer-Mesh for the 3D hand mesh estimation task. The extended framework efficiently integrates a shape regressor with the original Graphformer Decoder and Topology aware head, producing Mano parameters. 
The results on the HO-3D dataset, which contains various and challenging occlusions, show that our HandGCNFormer-Mesh achieves competitive results compared to previous state-of-the-art 3D hand mesh estimation methods.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"45 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Curiosity model policy optimization for robotic manipulator tracking control with input saturation in uncertain environment","authors":"Tu Wang, Fujie Wang, Zhongye Xie, Feiyan Qin","doi":"10.3389/fnbot.2024.1376215","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1376215","url":null,"abstract":"In uncertain environments with robot input saturation, both model-based reinforcement learning (MBRL) and traditional controllers struggle to perform control tasks optimally. In this study, an algorithmic framework of Curiosity Model Policy Optimization (CMPO) is proposed by combining curiosity and model-based approach, where tracking errors are reduced via training agents on control gains for traditional model-free controllers. To begin with, a metric for judging positive and negative curiosity is proposed. Constrained optimization is employed to update the curiosity ratio, which improves the efficiency of agent training. Next, the novelty distance buffer ratio is defined to reduce bias between the environment and the model. Finally, CMPO is simulated with traditional controllers and baseline MBRL algorithms in the robotic environment designed with non-linear rewards. The experimental results illustrate that the algorithm achieves superior tracking performance and generalization capabilities.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"12 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved object detection method for unmanned driving based on Transformers","authors":"Huaqi Zhao, Xiang Peng, Su Wang, Jun-Bao Li, Jeng-Shyang Pan, Xiaoguang Su, Xiaomin Liu","doi":"10.3389/fnbot.2024.1342126","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1342126","url":null,"abstract":"The object detection method serves as the core technology within the unmanned driving perception module, extensively employed for detecting vehicles, pedestrians, traffic signs, and various objects. However, existing object detection methods still encounter three challenges in intricate unmanned driving scenarios: unsatisfactory performance in multi-scale object detection, inadequate accuracy in detecting small objects, and occurrences of false positives and missed detections in densely occluded environments. Therefore, this study proposes an improved object detection method for unmanned driving, leveraging Transformer architecture to address these challenges. First, a multi-scale Transformer feature extraction method integrated with channel attention is used to enhance the network's capability in extracting features across different scales. Second, a training method incorporating Query Denoising with Gaussian decay was employed to enhance the network's proficiency in learning representations of small objects. Third, a hybrid matching method combining Optimal Transport and Hungarian algorithms was used to facilitate the matching process between predicted and actual values, thereby enriching the network with more informative positive sample features. 
Experimental evaluations conducted on datasets including KITTI demonstrate that the proposed method achieves 3% higher mean Average Precision (mAP) than that of the existing methodologies.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"107 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The meta-learning method for the ensemble model based on situational meta-task","authors":"Zhengchao Zhang, Lianke Zhou, Yuyang Wu, Nianbin Wang","doi":"10.3389/fnbot.2024.1391247","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1391247","url":null,"abstract":"IntroductionThe meta-learning methods have been widely used to solve the problem of few-shot learning. Generally, meta-learners are trained on a variety of tasks and then generalized to novel tasks.MethodsHowever, existing meta-learning methods do not consider the relationship between meta-tasks and novel tasks during the meta-training period, so that initial models of the meta-learner provide less useful meta-knowledge for the novel tasks. This leads to a weak generalization ability on novel tasks. Meanwhile, different initial models contain different meta-knowledge, which leads to certain differences in the learning effect of novel tasks during the meta-testing period. Therefore, this article puts forward a meta-optimization method based on situational meta-task construction and cooperation of multiple initial models. First, during the meta-training period, a method of constructing situational meta-task is proposed, and the selected candidate task sets provide more effective meta-knowledge for novel tasks. Then, during the meta-testing period, an ensemble model method based on meta-optimization is proposed to minimize the loss of inter-model cooperation in prediction, so that multiple models cooperation can realize the learning of novel tasks.ResultsThe above-mentioned methods are applied to popular few-shot character datasets and image recognition datasets. 
Furthermore, the experiment results indicate that the proposed method achieves good effects in few-shot classification tasks.DiscussionIn future work, we will extend our methods to provide more generalized and useful meta-knowledge to the model during the meta-training period when the novel few-shot tasks are completely invisible.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"35 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140806225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ontology based autonomous robot task processing framework","authors":"Yueguang Ge, Shaolin Zhang, Yinghao Cai, Tao Lu, Haitao Wang, Xiaolong Hui, Shuo Wang","doi":"10.3389/fnbot.2024.1401075","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1401075","url":null,"abstract":"IntroductionIn recent years, the perceptual capabilities of robots have been significantly enhanced. However, the task execution of the robots still lacks adaptive capabilities in unstructured and dynamic environments.MethodsIn this paper, we propose an ontology based autonomous robot task processing framework (ARTProF), to improve the robot's adaptability within unstructured and dynamic environments. ARTProF unifies ontological knowledge representation, reasoning, and autonomous task planning and execution into a single framework. The interface between the knowledge base and neural network-based object detection is first introduced in ARTProF to improve the robot's perception capabilities. A knowledge-driven manipulation operator based on Robot Operating System (ROS) is then designed to facilitate the interaction between the knowledge base and the robot's primitive actions. Additionally, an operation similarity model is proposed to endow the robot with the ability to generalize to novel objects. 
Finally, a dynamic task planning algorithm, leveraging ontological knowledge, equips the robot with adaptability to execute tasks in unstructured and dynamic environments.ResultsExperimental results on real-world scenarios and simulations demonstrate the effectiveness and efficiency of the proposed ARTProF framework.DiscussionIn future work, we will focus on refining the ARTProF framework by integrating neurosymbolic inference.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"115 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140885414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Residual learning-based robotic image analysis model for low-voltage distributed photovoltaic fault identification and positioning","authors":"Xudong Zhang, Yunlong Ge, Yifeng Wang, Jun Wang, Wenhao Wang, Lijun Lu","doi":"10.3389/fnbot.2024.1396979","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1396979","url":null,"abstract":"With the fast development of large-scale Photovoltaic (PV) plants, the automatic PV fault identification and positioning have become an important task for the PV intelligent systems, aiming to guarantee the safety, reliability, and productivity of large-scale PV plants. In this paper, we propose a residual learning-based robotic (UAV) image analysis model for low-voltage distributed PV fault identification and positioning. In our target scenario, the unmanned aerial vehicles (UAVs) are deployed to acquire moving images of low-voltage distributed PV power plants. To get desired robustness and accuracy of PV image detection, we integrate residual learning with attention mechanism into the UAV image analysis model based on you only look once v4 (YOLOv4) network. Then, we design the sophisticated multi-scale spatial pyramid fusion and use it to optimize the YOLOv4 network for the nuanced task of fault localization within PV arrays, where the Complete-IOU loss is incorporated in the predictive modeling phase, significantly enhancing the accuracy and efficiency of fault detection. 
A series of experimental comparisons in terms of the accuracy of fault positioning are conducted, and the experimental results verify the feasibility and effectiveness of the proposed model in dealing with the safety and reliability maintenance of low-voltage distributed PV systems.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140635325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid knowledge transfer for MARL based on action advising and experience sharing","authors":"Feng Liu, Dongqi Li, Jian Gao","doi":"10.3389/fnbot.2024.1364587","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1364587","url":null,"abstract":"Multiagent Reinforcement Learning (MARL) has been well adopted due to its exceptional ability to solve multiagent decision-making problems. To further enhance learning efficiency, knowledge transfer algorithms have been developed, among which experience-sharing-based and action-advising-based transfer strategies share the mainstream. However, it is notable that, although there exist many successful applications of both strategies, they are not flawless. For the long-developed action-advising-based methods (namely KT-AA, short for knowledge transfer based on action advising), their data efficiency and scalability are not satisfactory. As for the newly proposed experience-sharing-based knowledge transfer methods (KT-ES), although the shortcomings of KT-AA have been partially overcome, they are incompetent to correct specific bad decisions in the later learning stage. To leverage the superiority of both KT-AA and KT-ES, this study proposes KT-Hybrid, a hybrid knowledge transfer approach. In the early learning phase, KT-ES methods are employed, expecting better data efficiency from KT-ES to enhance the policy to a basic level as soon as possible. Later, we focus on correcting specific errors made by the basic policy, trying to use KT-AA methods to further improve the performance. 
Simulations demonstrate that the proposed KT-Hybrid outperforms well-received action-advising- and experience-sharing-based methods.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"14 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140885486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}