Research on target detection method based on attention mechanism and reinforcement learning

International Conference on Artificial Intelligence, Virtual Reality, and Visualization. Pub Date: 2023-03-01. DOI: 10.1117/12.2668537

Q. Wang, Chenxi Xu, Hongwei Du, Yuxuan Liu, Yang Liu, Yujia Fu, Kai Li, Haobin Shi



Abstract

The development of intelligent manufacturing is driving traditional navigation technology toward greater intelligence. Because the actor-critic (AC) algorithm is difficult to converge in practical applications, this paper uses an optimized variant of it, the deep deterministic policy gradient (DDPG) algorithm. Through experience replay and a dual-network design, DDPG learns substantially faster than the original algorithm. Because a curiosity strategy is well suited to alleviating the sparse-reward problem, this paper also uses a curiosity mechanism as an intrinsic-reward exploration strategy and proposes a DDPG method based on an improved curiosity mechanism. The method addresses the case in which a robot receives too little external reward in a complex environment and therefore cannot complete its task. Simulation and real-world experiments show that the proposed method completes the navigation task more stably and performs well in long-distance autonomous navigation.
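The abstract gives no implementation details, so the following is only a minimal sketch of the three ingredients it names: experience replay, a slowly updated target network (one common reading of "dual network"), and a curiosity bonus added to the external reward. The class and function names, the parameters `tau` and `eta`, and the squared-prediction-error form of the bonus are assumptions chosen for illustration. They follow the generic curiosity formulation and are not the authors' improved mechanism.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def soft_update(target_params, online_params, tau=0.005):
    """Move each target weight a fraction tau toward the online weight."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


def intrinsic_reward(predicted_next_state, next_state, eta=0.5):
    """Curiosity bonus: scaled squared error of a forward-model prediction.

    Assumed form (not taken from the paper): r_i = (eta / 2) * ||pred - actual||^2.
    """
    error = sum((p - s) ** 2 for p, s in zip(predicted_next_state, next_state))
    return 0.5 * eta * error


def total_reward(extrinsic, predicted_next_state, next_state, eta=0.5):
    """Reward passed to the learner: external reward plus curiosity bonus."""
    return extrinsic + intrinsic_reward(predicted_next_state, next_state, eta)
```

In a sparse-reward navigation task the external term is zero on most steps, so under this assumed form the learner is driven mainly by the bonus, which is large in states that the forward model predicts poorly and shrinks as those states become familiar.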
