Frontiers in Neurorobotics: Latest Articles

A scalable multi-modal learning fruit detection algorithm for dynamic environments.
IF 2.6 · Zone 4 · Computer Science
Frontiers in Neurorobotics Pub Date : 2025-02-07 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1518878
Liang Mao, Zihao Guo, Mingzhe Liu, Yue Li, Linlin Wang, Jie Li
Introduction: To enhance the detection of litchi fruits in natural scenes and to address challenges such as dense occlusion and small-target identification, this paper proposes a novel multimodal target detection method, denoted YOLOv5-Litchi.

Methods: Initially, the Neck network of YOLOv5s is simplified by changing its FPN+PAN structure to an FPN structure, and the number of detection heads is increased from 3 to 5. Additionally, the detection heads with resolutions of 80 × 80 and 160 × 160 pixels are replaced by TSCD detection heads to enhance the model's ability to detect small targets. Subsequently, the localization loss is replaced with the EIoU loss and the confidence loss with VFLoss, further improving the accuracy of the detection bounding box and reducing the missed-detection rate for occluded targets. A sliding-slice method is then employed to predict image targets, reducing the miss rate for small targets.

Results: Experimental results demonstrate that the proposed model improves accuracy, recall, and mean average precision (mAP) by 9.5, 0.9, and 12.3 percentage points, respectively, compared with the original YOLOv5s model. When benchmarked against other models such as YOLOX, YOLOv6, and YOLOv8, the proposed model's AP value increases by 4.0, 6.3, and 3.7 percentage points, respectively.

Discussion: The improvements focus on raising the recall rate and AP value, thereby lowering the missed-detection rate: the model misses fewer targets and produces more accurate prediction boxes, indicating its suitability for litchi fruit detection. The method therefore significantly enhances the detection accuracy of mature litchi fruits, effectively addresses dense occlusion and small-target detection, and provides key technical support for subsequent litchi yield estimation.

Frontiers in Neurorobotics, vol. 18, article 1518878. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11841473/pdf/
Citations: 0
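The EIoU loss mentioned in the Methods is a published bounding-box regression loss that augments IoU with separate penalties for center distance and for width/height mismatch, each normalized by the smallest enclosing box. A minimal pure-Python sketch of the standard formulation (not the paper's code; boxes are assumed to be (x1, y1, x2, y2) tuples):

```python
def eiou_loss(box_p, box_g):
    """EIoU loss for axis-aligned boxes (x1, y1, x2, y2):
    L = 1 - IoU + d_center^2/c^2 + dw^2/cw^2 + dh^2/ch^2,
    where (cw, ch, c) are the width, height, and diagonal of the
    smallest box enclosing both inputs."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union
    # smallest enclosing box
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw * cw + ch * ch
    # center-distance and aspect penalties
    dx = (px1 + px2) / 2 - (gx1 + gx2) / 2
    dy = (py1 + py2) / 2 - (gy1 + gy2) / 2
    dw = (px2 - px1) - (gx2 - gx1)
    dh = (py2 - py1) - (gy2 - gy1)
    return 1 - iou + (dx * dx + dy * dy) / c2 + dw * dw / (cw * cw) + dh * dh / (ch * ch)
```

Unlike plain IoU, the extra terms keep gradients informative even for poorly overlapping boxes, which is why it helps with dense, occluded fruit clusters.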
Construction of multi-robot platform based on dobot robots.
IF 2.6 · Zone 4 · Computer Science
Frontiers in Neurorobotics Pub Date : 2025-02-05 eCollection Date: 2025-01-01 DOI: 10.3389/fnbot.2025.1550787
Jinchi Han, Duojicairang Ma
To support research on cooperative control schemes for multirobot systems, this paper builds an experimental platform based on Dobot robots that can be used to verify related schemes in physical experiments. A distributed scheme is proposed to achieve cooperative control of multirobot systems, and simulation results confirm its effectiveness. The experimental platform is then used to verify the proposed scheme: a computer sends data to the microcontroller inside the host over WiFi, and the host distributes the data to the slaves. Finally, the physical experiment is performed on the platform. The task is completed successfully, and the physical experiments agree with the simulations, demonstrating the effectiveness of the scheme and the feasibility of the platform. The platform can validate a variety of schemes and exhibits strong expandability and practicality.

Frontiers in Neurorobotics, vol. 19, article 1550787. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11835969/pdf/
Citations: 0
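The paper's distributed scheme is not detailed in the abstract; purely as a generic illustration of distributed cooperative control, here is a minimal first-order consensus sketch in which each robot updates its state using only its neighbors' states:

```python
def consensus_step(states, adjacency, eps=0.2):
    """One synchronous update of discrete-time consensus:
    x_i <- x_i + eps * sum_j a_ij * (x_j - x_i).
    Each robot reads only the rows of the adjacency matrix that
    correspond to its neighbors (a distributed update rule)."""
    n = len(states)
    return [states[i] + eps * sum(adjacency[i][j] * (states[j] - states[i])
                                  for j in range(n))
            for i in range(n)]

# three robots on a line graph (host in the middle) converge to a common value
states = [0.0, 1.0, 4.0]
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
for _ in range(200):
    states = consensus_step(states, adj)
```

For a symmetric adjacency and a small enough step size `eps`, the states converge to the average of the initial values; the step size, graph, and scalar states here are all invented for illustration.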
Noise-immune zeroing neural dynamics for dynamic signal source localization system and robotic applications in the presence of noise.
IF 2.6 · Zone 4 · Computer Science
Frontiers in Neurorobotics Pub Date : 2025-02-05 eCollection Date: 2025-01-01 DOI: 10.3389/fnbot.2025.1546731
Yuxin Zhao, Jiahao Wu, Mianjie Zheng
Angle of arrival (AoA) and time difference of arrival (TDOA) are two widely used methods for solving dynamic signal source localization (DSSL) problems, in which the position of a moving target is determined by measuring the angle and the time difference of the signal's arrival, respectively. In robotic-manipulator applications, accurate, real-time joint information is crucial for tasks such as trajectory tracking and visual servoing. However, signal propagation and acquisition are susceptible to noise interference, which poses challenges for real-time systems. To address this issue, a noise-immune zeroing neural dynamics (NIZND) model is proposed. NIZND is a brain-inspired algorithm that adds an integral term and an activation function to the traditional zeroing neural dynamics (ZND) model, designed to mitigate noise interference during localization tasks. Theoretical analysis confirms that the proposed NIZND model exhibits global convergence and high precision under noisy conditions. Simulation experiments demonstrate its robustness and effectiveness compared with traditional DSSL-solving schemes and in a trajectory-tracking scheme for robotic manipulators. The NIZND model thus offers a promising solution to accurate localization in noisy environments, ensuring both high precision and effective noise suppression. The experimental results highlight its superiority in real-time applications where noise interference is prevalent.

Frontiers in Neurorobotics, vol. 19, article 1546731. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11835927/pdf/
Citations: 0
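The abstract describes adding an integral term to classical zeroing neural dynamics (whose design formula drives an error function e(t) to zero via de/dt = -gamma * phi(e)). A scalar toy simulation of this idea, using a common noise-tolerant ZND form rather than necessarily the paper's exact design, shows the integral term cancelling a constant disturbance:

```python
import math

def simulate_znd(gamma=20.0, lam=100.0, noise=0.5, dt=1e-3, T=10.0):
    """Track theta(t) = sin(t) under constant additive noise with
    de/dt = -gamma*e - lam*integral(e) + noise  (lam=0 recovers plain ZND).
    The integral term rejects the constant disturbance, much like the
    integral action in a PI controller. Parameters are illustrative."""
    x, integ, t = 0.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        theta = math.sin(t)
        e = x - theta
        integ += e * dt
        # target rate feedforward + zeroing correction + constant noise
        x += (math.cos(t) - gamma * e - lam * integ + noise) * dt
        t += dt
    return abs(x - math.sin(t))  # residual tracking error at t = T

err_noise_immune = simulate_znd(lam=100.0)   # with integral term
err_plain_znd = simulate_znd(lam=0.0)        # plain ZND: steady bias ~ noise/gamma
```

With `lam = 0` the error settles at roughly `noise / gamma`; with the integral term it decays toward zero, which is the qualitative behavior the NIZND model formalizes.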
LoCS-Net: Localizing convolutional spiking neural network for fast visual place recognition.
IF 2.6 · Zone 4 · Computer Science
Frontiers in Neurorobotics Pub Date : 2025-01-29 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1490267
Ugur Akcal, Ivan Georgiev Raikov, Ekaterina Dmitrievna Gribkova, Anwesa Choudhuri, Seung Hyun Kim, Mattia Gazzola, Rhanor Gillette, Ivan Soltesz, Girish Chowdhary
Visual place recognition (VPR) is the ability to recognize locations in a physical environment based only on visual inputs. It is a challenging task due to perceptual aliasing, viewpoint and appearance variations, and the complexity of dynamic scenes. Despite promising demonstrations, many state-of-the-art (SOTA) VPR approaches based on artificial neural networks (ANNs) suffer from computational inefficiency. Spiking neural networks (SNNs) implemented on neuromorphic hardware, by contrast, are reported to have remarkable potential for computationally more efficient solutions. Still, training SOTA SNNs for VPR is often intractable on large and diverse datasets, and they typically show poor real-time performance. To address these shortcomings, we developed an end-to-end convolutional SNN model for VPR that leverages backpropagation for tractable training. Rate-based approximations of leaky integrate-and-fire (LIF) neurons are employed during training and are replaced with spiking LIF neurons during inference. The proposed method significantly outperforms existing SOTA SNNs on challenging datasets such as Nordland and Oxford RobotCar, achieving 78.6% precision at 100% recall on Nordland (versus 73.0% for the current SOTA) and 45.7% on Oxford RobotCar (versus 20.2% for the current SOTA). Our approach offers a simpler training pipeline while yielding significant improvements in both training and inference times compared with SOTA SNNs for VPR. Hardware-in-the-loop tests using Intel's neuromorphic USB form factor, Kapoho Bay, show that our on-chip spiking models for VPR, trained via the ANN-to-SNN conversion strategy, continue to outperform their SNN counterparts despite a slight but noticeable performance drop when moving from off-chip to on-chip, while offering significant energy efficiency. The results highlight the rapid prototyping and real-world deployment capabilities of this approach, a substantial step toward more prevalent SNN-based real-world robotics solutions.

Frontiers in Neurorobotics, vol. 18, article 1490267. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11813887/pdf/
Citations: 0
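The training trick described, rate-based approximation of leaky integrate-and-fire (LIF) neurons, rests on the fact that a LIF unit's firing rate varies smoothly with its input even though individual spikes are discrete. A minimal discrete-time LIF sketch (illustrative threshold and leak, not the paper's parameters):

```python
def lif_spikes(inputs, v_th=1.0, leak=0.9):
    """Discrete leaky integrate-and-fire: the membrane potential decays by
    `leak` each step and integrates the input; when it crosses v_th the
    neuron emits a spike and hard-resets to zero."""
    v, spikes = 0.0, []
    for i in inputs:
        v = leak * v + i
        if v >= v_th:
            spikes.append(1)
            v = 0.0  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes

def firing_rate(i_const, v_th=1.0, leak=0.9, steps=1000):
    """Empirical firing rate under constant input: the smooth, monotone
    input-rate curve that rate-based training surrogates approximate."""
    return sum(lif_spikes([i_const] * steps, v_th, leak)) / steps
```

During training, the differentiable rate curve stands in for the spike train so backpropagation is tractable; at inference, the spiking units above are restored.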
Privacy-preserving ADP for secure tracking control of AVRs against unreliable communication.
IF 2.6 · Zone 4 · Computer Science
Frontiers in Neurorobotics Pub Date : 2025-01-29 eCollection Date: 2025-01-01 DOI: 10.3389/fnbot.2025.1549414
Kun Zhang, Kezhen Han, Zhijian Hu, Guoqiang Tan
In this study, we developed an encrypted guaranteed-cost tracking control scheme for autonomous vehicles or robots (AVRs) using the adaptive dynamic programming (ADP) technique. The AVR's motion is analyzed to construct the tracking dynamics under unreliable communication. To mitigate information leakage and unauthorized access in vehicular network systems, an encrypted guaranteed-cost policy iteration algorithm is developed, incorporating encryption and decryption schemes between the vehicle and the cloud based on the tracking dynamics. Building on a simplified single-network framework, the Hamilton-Jacobi-Bellman equation is solved approximately, avoiding the complexity of dual-network structures and reducing computational cost. The input-constraint issue is handled using a non-quadratic value function. Furthermore, the approximate optimal control is verified to stabilize the tracking system. A case study involving an AVR system validates the effectiveness and practicality of the proposed algorithm.

Frontiers in Neurorobotics, vol. 19, article 1549414. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11813875/pdf/
Citations: 0
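In the constrained-ADP literature, the "non-quadratic value function" for bounded inputs commonly uses an integral-of-inverse-tanh input penalty; the paper's exact choice is not given in the abstract, so the sketch below shows the standard construction:

```python
import math

def constrained_cost(u, u_max=1.0):
    """Standard non-quadratic input penalty for constrained ADP:
    U(u) = 2 * integral_0^u u_max * atanh(v / u_max) dv
         = 2*u_max*u*atanh(u/u_max) + u_max^2 * ln(1 - (u/u_max)^2).
    It behaves like u^2 near zero, and its derivative diverges as
    |u| -> u_max, so the resulting optimal control has the bounded
    form u* = -u_max * tanh(.) and never violates the constraint."""
    r = u / u_max
    return 2 * u_max * u * math.atanh(r) + u_max ** 2 * math.log(1 - r * r)
```

The steeply growing marginal cost near the bound is what lets the HJB solution respect input saturation without any explicit projection step.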
NavBLIP: a visual-language model for enhancing unmanned aerial vehicles navigation and object detection.
IF 2.6 · Zone 4 · Computer Science
Frontiers in Neurorobotics Pub Date : 2025-01-24 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1513354
Ye Li, Li Yang, Meifang Yang, Fei Yan, Tonghua Liu, Chensi Guo, Rufeng Chen
Introduction: In recent years, Unmanned Aerial Vehicles (UAVs) have increasingly been deployed in applications such as autonomous navigation, surveillance, and object detection. Traditional methods for UAV navigation and object detection have often relied on either handcrafted features or unimodal deep learning approaches. While these methods have seen some success, they frequently encounter limitations in dynamic environments, where robustness and computational efficiency are critical for real-time performance. They also often fail to integrate multimodal inputs effectively, which restricts their adaptability and generalization in complex and diverse scenarios.

Methods: To address these challenges, we introduce NavBLIP, a novel visual-language model specifically designed to enhance UAV navigation and object detection by utilizing multimodal data. NavBLIP incorporates transfer learning techniques along with a Nuisance-Invariant Multimodal Feature Extraction (NIMFE) module. The NIMFE module disentangles relevant features from intricate visual and environmental inputs, allowing UAVs to adapt swiftly to new environments and improve object detection accuracy. Furthermore, NavBLIP employs a multimodal control strategy that dynamically selects context-specific features to optimize real-time performance, ensuring efficiency in high-stakes operations.

Results and discussion: Extensive experiments on benchmark datasets such as RefCOCO, CC12M, and OpenImages show that NavBLIP outperforms existing state-of-the-art models in accuracy, recall, and computational efficiency. An ablation study further underscores the significance of the NIMFE and transfer-learning components, highlighting NavBLIP's potential for real-time UAV applications where adaptability and computational efficiency are paramount.

Frontiers in Neurorobotics, vol. 18, article 1513354. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11802496/pdf/
Citations: 0
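The internals of the NIMFE module and the multimodal control strategy are not specified in the abstract. Purely as a hypothetical illustration of "dynamically selecting context-specific features," here is a softmax gating sketch that weights modality feature vectors by their agreement with a context vector; every name and shape here is invented:

```python
import math

def context_gate(features, context):
    """Hypothetical context-dependent feature selection (NOT the paper's
    NIMFE module): score each modality's feature vector against a context
    vector and fuse them with softmax weights, so the modality most
    relevant to the current context dominates the fused representation."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = [dot(f, context) for f in features]
    m = max(scores)                      # stabilized softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    fused = [sum(w * f[k] for w, f in zip(weights, features))
             for k in range(len(features[0]))]
    return fused, weights

# two toy "modalities"; the context aligns with the first one
fused, weights = context_gate([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In a learned system, the scoring function would be trained rather than a raw dot product, but the gating pattern is the same.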
Brain-inspired multimodal motion and fine-grained action recognition.
IF 2.6 · Zone 4 · Computer Science
Frontiers in Neurorobotics Pub Date : 2025-01-24 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1502071
Yuening Li, Xiuhua Yang, Changkui Chen
Introduction: Traditional action recognition methods predominantly rely on a single modality, such as vision or motion, which presents significant limitations for fine-grained action recognition. These methods struggle particularly with video data containing complex combinations of actions and subtle motion variations.

Methods: Traditional pipelines typically depend on handcrafted feature extractors or simple convolutional neural network (CNN) architectures, which makes effective multimodal fusion challenging. This study introduces a novel architecture, FGM-CLIP (Fine-Grained Motion CLIP), to enhance fine-grained action recognition. FGM-CLIP leverages the capabilities of Contrastive Language-Image Pretraining (CLIP), integrating a fine-grained motion encoder and a multimodal fusion layer to achieve precise end-to-end action recognition. By jointly optimizing visual and motion features, the model captures subtle action variations, yielding higher classification accuracy on complex video data.

Results and discussion: Experimental results show that FGM-CLIP significantly outperforms existing methods on multiple fine-grained action recognition datasets. Its multimodal fusion strategy notably improves the model's robustness and accuracy, particularly for videos with intricate action patterns.

Frontiers in Neurorobotics, vol. 18, article 1502071. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11802800/pdf/
Citations: 0
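FGM-CLIP builds on CLIP's contrastive pretraining, whose core is a symmetric cross-entropy over an image-text similarity matrix: matched pairs sit on the diagonal, and the loss is averaged over the image-to-text and text-to-image directions. A minimal sketch of that standard loss (not the paper's fine-grained variant):

```python
import math

def clip_contrastive_loss(sim):
    """Symmetric InfoNCE loss used in CLIP-style training. `sim[i][j]` is
    the similarity between image i and text j; matched pairs are on the
    diagonal. Cross-entropy is applied over rows (image -> text) and over
    columns (text -> image), then averaged."""
    n = len(sim)

    def avg_ce(m):
        # mean cross-entropy of each row against its diagonal target
        total = 0.0
        for i in range(n):
            mx = max(m[i])                         # stabilized log-softmax
            z = sum(math.exp(v - mx) for v in m[i])
            total += -(m[i][i] - mx - math.log(z))
        return total / n

    cols = [[sim[j][i] for j in range(n)] for i in range(n)]  # transpose
    return 0.5 * (avg_ce(sim) + avg_ce(cols))
```

A strongly diagonal similarity matrix (matched pairs scored far above mismatches) drives the loss toward zero, which is what contrastive training optimizes for.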
Transformer-based short-term traffic forecasting model considering traffic spatiotemporal correlation.
IF 2.6 · Zone 4 · Computer Science
Frontiers in Neurorobotics Pub Date : 2025-01-23 eCollection Date: 2025-01-01 DOI: 10.3389/fnbot.2025.1527908
Ande Chang, Yuting Ji, Yiming Bie
Traffic forecasting is crucial for a variety of applications, including route optimization, signal management, and travel time estimation. However, many existing prediction models struggle to capture the spatiotemporal patterns in traffic data accurately because of its inherent nonlinearity, high dimensionality, and complex dependencies. To address these challenges, a short-term traffic forecasting model, Trafficformer, is proposed based on the Transformer framework. The model first uses a multilayer perceptron to extract features from historical traffic data, then enhances spatial interactions through Transformer-based encoding. By incorporating road-network topology, a spatial mask filters out noise and irrelevant interactions, improving prediction accuracy. Finally, traffic speed is predicted with another multilayer perceptron. Trafficformer is evaluated on the Seattle Loop Detector dataset against six baseline methods, using Mean Absolute Error, Mean Absolute Percentage Error, and Root Mean Square Error as metrics. The results show that Trafficformer not only achieves higher prediction accuracy but also effectively identifies key road sections, showing strong potential for intelligent traffic control optimization and refined traffic resource allocation.

Frontiers in Neurorobotics, vol. 19, article 1527908. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11799296/pdf/
Citations: 0
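A spatial mask derived from road-network topology is typically applied by excluding non-neighbor entries from the attention softmax, so each sensor attends only to topologically related sensors. A minimal sketch under that assumption (the score matrix and mask below are invented for illustration):

```python
import math

def masked_attention(scores, mask):
    """Softmax attention with a road-topology mask. Where mask[i][j] == 0
    (no physical connection between sensors i and j), the entry is set to
    -inf before the softmax, so it receives exactly zero attention weight."""
    out = []
    for row, mrow in zip(scores, mask):
        masked = [s if m else float("-inf") for s, m in zip(row, mrow)]
        mx = max(masked)                                   # finite if any entry unmasked
        exps = [math.exp(s - mx) if s != float("-inf") else 0.0 for s in masked]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

# one sensor attending over three others; the third is not road-connected
attn = masked_attention([[1.0, 2.0, 3.0]], [[1, 1, 0]])
```

This is the same mechanism as padding or causal masks in standard Transformers, repurposed to encode graph structure.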
Graph Convolutional Networks for multi-modal robotic martial arts leg pose recognition.
IF 2.6 · Zone 4 · Computer Science
Frontiers in Neurorobotics Pub Date : 2025-01-20 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1520983
Shun Yao, Yihan Ping, Xiaoyu Yue, He Chen
Introduction: Accurate recognition of martial arts leg poses is essential for applications in sports analytics, rehabilitation, and human-computer interaction. Traditional pose recognition models, relying on sequential or convolutional approaches, often fail to capture the complex spatial-temporal dependencies inherent in martial arts movements. They cannot effectively model the nuanced dynamics of joint interactions and temporal progression, which limits generalization on complex actions.

Methods: To address these challenges, we propose PoseGCN, a Graph Convolutional Network (GCN)-based model that integrates spatial, temporal, and contextual features in a novel framework. PoseGCN leverages spatial-temporal graph encoding to capture joint motion dynamics, an action-specific attention mechanism that assigns importance to relevant joints depending on the action context, and a self-supervised pretext task that enhances temporal robustness and continuity. Experimental results on four benchmark datasets (Kinetics-700, Human3.6M, NTU RGB+D, and UTD-MHAD) demonstrate that PoseGCN outperforms existing models, achieving state-of-the-art accuracy and F1 scores.

Results and discussion: These findings highlight the model's capacity to generalize across diverse datasets and capture fine-grained pose details, showcasing its potential for complex pose recognition tasks. The proposed framework offers a robust solution for precise action recognition and paves the way for future developments in multi-modal pose analysis.

Frontiers in Neurorobotics, vol. 18, article 1520983. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11792168/pdf/
Citations: 0
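Spatial aggregation in skeleton-based GCNs follows the standard graph-convolution pattern: each joint mixes features from adjacent joints through a degree-normalized adjacency matrix. A minimal sketch of one such layer in the common Kipf-Welling form (the paper's exact architecture is not given in the abstract):

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer over a skeleton graph:
    H' = ReLU( D^{-1/2} (A + I) D^{-1/2} · H · W ).
    Adding the identity lets each joint keep its own features while
    aggregating from connected joints."""
    n = len(adj)
    # A + I, then symmetric degree normalization
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # aggregate neighbor features: norm @ feats
    d = len(feats[0])
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(d)]
           for i in range(n)]
    # linear transform and ReLU: ReLU(agg @ weight)
    d_out = len(weight[0])
    return [[max(0.0, sum(agg[i][k] * weight[k][o] for k in range(d)))
             for o in range(d_out)] for i in range(n)]

# two connected joints with scalar features and an identity weight
out = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])
```

Stacking such layers over the skeleton graph, plus a temporal dimension, gives the spatial-temporal encoding the abstract refers to.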
Improved object detection method for autonomous driving based on DETR.
IF 2.6 · Zone 4 · Computer Science
Frontiers in Neurorobotics Pub Date : 2025-01-20 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1484276
Huaqi Zhao, Songnan Zhang, Xiang Peng, Zhengguang Lu, Guojing Li
Object detection is a critical component of autonomous driving technology and has demonstrated significant growth potential. To address the limitations of current techniques, this paper presents an improved object detection method for autonomous driving based on the detection transformer (DETR). First, we introduce a multi-scale feature and location information extraction method, which remedies the model's inadequate multi-scale object localization and detection. In addition, we developed a transformer encoder based on a group axial attention mechanism, which allows efficient attention-range control in the horizontal and vertical directions while reducing computation, ultimately enhancing inference speed. Furthermore, we propose a novel dynamic hyperparameter-tuning training method based on Pareto efficiency, which coordinates the training state of the loss functions through dynamic weights, overcoming the issues of manually set fixed weights and enhancing model convergence speed and accuracy. Experimental results demonstrate that the proposed method surpasses existing methods, with improvements of 3.3%, 4.5%, and 3% in average precision on the COCO, PASCAL VOC, and KITTI datasets, respectively, and an 84% increase in FPS.

Frontiers in Neurorobotics, vol. 18, article 1484276. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11788285/pdf/
Citations: 0
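Axial attention, which the encoder above builds on, attends along each row of the feature grid and then along each column, rather than over all H×W positions at once, dropping cost from O((HW)^2) to O(HW·(H+W)). A minimal scalar-feature sketch of the axial idea (the paper's grouped variant adds channel groups, omitted here; query, key, and value are all the raw scalar for simplicity):

```python
import math

def attend_1d(seq):
    """Plain 1D self-attention over scalar features (q = k = v = value)."""
    out = []
    for q in seq:
        scores = [q * k for k in seq]
        mx = max(scores)                       # stabilized softmax
        exps = [math.exp(s - mx) for s in scores]
        z = sum(exps)
        out.append(sum(w / z * v for w, v in zip(exps, seq)))
    return out

def axial_attention(grid):
    """Axial attention: 1D attention along every row, then along every
    column of the result. Information still propagates across the whole
    grid (via the two passes) at a fraction of full 2D attention's cost."""
    rows = [attend_1d(r) for r in grid]
    h, w = len(rows), len(rows[0])
    cols = [[rows[i][j] for i in range(h)] for j in range(w)]   # transpose
    cols = [attend_1d(c) for c in cols]
    return [[cols[j][i] for j in range(w)] for i in range(h)]   # transpose back
```

Restricting each pass to one axis is exactly the "attention range control in the horizontal and vertical directions" the abstract describes.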