基于协作任务强化学习的网络智能增强型无人机目标搜索模型

IF 2.5 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

International Journal of Web Information Systems Pub Date : 2024-03-19 DOI:10.1108/ijwis-10-2023-0184

Mingke Gao, Zhenyu Zhang, Jinyuan Zhang, Shihao Tang, Han Zhang, Tao Pang

{"title":"基于协作任务强化学习的网络智能增强型无人机目标搜索模型","authors":"Mingke Gao, Zhenyu Zhang, Jinyuan Zhang, Shihao Tang, Han Zhang, Tao Pang","doi":"10.1108/ijwis-10-2023-0184","DOIUrl":null,"url":null,"abstract":"Purpose\nBecause of the various advantages of reinforcement learning (RL) mentioned above, this study uses RL to train unmanned aerial vehicles to perform two tasks: target search and cooperative obstacle avoidance.\n\nDesign/methodology/approach\nThis study draws inspiration from the recurrent state-space model and recurrent models (RPM) to propose a simpler yet highly effective model called the unmanned aerial vehicles prediction model (UAVPM). The main objective is to assist in training the UAV representation model with a recurrent neural network, using the soft actor-critic algorithm.\n\nFindings\nThis study proposes a generalized actor-critic framework consisting of three modules: representation, policy and value. This architecture serves as the foundation for training UAVPM. This study proposes the UAVPM, which is designed to aid in training the recurrent representation using the transition model, reward recovery model and observation recovery model. Unlike traditional approaches reliant solely on reward signals, RPM incorporates temporal information. In addition, it allows the inclusion of extra knowledge or information from virtual training environments. This study designs UAV target search and UAV cooperative obstacle avoidance tasks. The algorithm outperforms baselines in these two environments.\n\nOriginality/value\nIt is important to note that UAVPM does not play a role in the inference phase. This means that the representation model and policy remain independent of UAVPM. Consequently, this study can introduce additional “cheating” information from virtual training environments to guide the UAV representation without concerns about its real-world existence. By leveraging historical information more effectively, this study enhances UAVs’ decision-making abilities, thus improving the performance of both tasks at hand.\n","PeriodicalId":44153,"journal":{"name":"International Journal of Web Information Systems","volume":null,"pages":null},"PeriodicalIF":2.5000,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Web intelligence-enhanced unmanned aerial vehicle target search model based on reinforcement learning for cooperative tasks\",\"authors\":\"Mingke Gao, Zhenyu Zhang, Jinyuan Zhang, Shihao Tang, Han Zhang, Tao Pang\",\"doi\":\"10.1108/ijwis-10-2023-0184\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Purpose\\nBecause of the various advantages of reinforcement learning (RL) mentioned above, this study uses RL to train unmanned aerial vehicles to perform two tasks: target search and cooperative obstacle avoidance.\\n\\nDesign/methodology/approach\\nThis study draws inspiration from the recurrent state-space model and recurrent models (RPM) to propose a simpler yet highly effective model called the unmanned aerial vehicles prediction model (UAVPM). The main objective is to assist in training the UAV representation model with a recurrent neural network, using the soft actor-critic algorithm.\\n\\nFindings\\nThis study proposes a generalized actor-critic framework consisting of three modules: representation, policy and value. This architecture serves as the foundation for training UAVPM. This study proposes the UAVPM, which is designed to aid in training the recurrent representation using the transition model, reward recovery model and observation recovery model. Unlike traditional approaches reliant solely on reward signals, RPM incorporates temporal information. In addition, it allows the inclusion of extra knowledge or information from virtual training environments. This study designs UAV target search and UAV cooperative obstacle avoidance tasks. The algorithm outperforms baselines in these two environments.\\n\\nOriginality/value\\nIt is important to note that UAVPM does not play a role in the inference phase. This means that the representation model and policy remain independent of UAVPM. Consequently, this study can introduce additional “cheating” information from virtual training environments to guide the UAV representation without concerns about its real-world existence. By leveraging historical information more effectively, this study enhances UAVs’ decision-making abilities, thus improving the performance of both tasks at hand.\\n\",\"PeriodicalId\":44153,\"journal\":{\"name\":\"International Journal of Web Information Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2024-03-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Web Information Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1108/ijwis-10-2023-0184\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Web Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1108/ijwis-10-2023-0184","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

设计/方法/方法本研究从递归状态空间模型和递归模型（RPM）中汲取灵感，提出了一种更简单但非常有效的模型，称为无人飞行器预测模型（UAVPM）。主要目的是利用软演员批判算法，通过递归神经网络协助训练无人机表示模型。研究结果本研究提出了一个广义演员批判框架，由表示、策略和价值三个模块组成。该架构是训练 UAVPM 的基础。本研究提出了 UAVPM，旨在利用过渡模型、奖励恢复模型和观察恢复模型来帮助训练循环表示。与传统的仅依赖奖励信号的方法不同，RPM 纳入了时间信息。此外，它还允许加入额外的知识或来自虚拟训练环境的信息。本研究设计了无人机目标搜索和无人机协同避障任务。该算法在这两种环境中的表现优于基线算法。原创性/价值值得注意的是，UAVPM 在推理阶段并不发挥作用。这意味着表示模型和策略与 UAVPM 无关。因此，这项研究可以从虚拟训练环境中引入额外的 "作弊 "信息来指导无人机表示，而无需担心其在现实世界中的存在。通过更有效地利用历史信息，本研究增强了无人机的决策能力，从而提高了手头两项任务的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Web intelligence-enhanced unmanned aerial vehicle target search model based on reinforcement learning for cooperative tasks

Purpose Because of the various advantages of reinforcement learning (RL) mentioned above, this study uses RL to train unmanned aerial vehicles to perform two tasks: target search and cooperative obstacle avoidance. Design/methodology/approach This study draws inspiration from the recurrent state-space model and recurrent models (RPM) to propose a simpler yet highly effective model called the unmanned aerial vehicles prediction model (UAVPM). The main objective is to assist in training the UAV representation model with a recurrent neural network, using the soft actor-critic algorithm. Findings This study proposes a generalized actor-critic framework consisting of three modules: representation, policy and value. This architecture serves as the foundation for training UAVPM. This study proposes the UAVPM, which is designed to aid in training the recurrent representation using the transition model, reward recovery model and observation recovery model. Unlike traditional approaches reliant solely on reward signals, RPM incorporates temporal information. In addition, it allows the inclusion of extra knowledge or information from virtual training environments. This study designs UAV target search and UAV cooperative obstacle avoidance tasks. The algorithm outperforms baselines in these two environments. Originality/value It is important to note that UAVPM does not play a role in the inference phase. This means that the representation model and policy remain independent of UAVPM. Consequently, this study can introduce additional “cheating” information from virtual training environments to guide the UAV representation without concerns about its real-world existence. By leveraging historical information more effectively, this study enhances UAVs’ decision-making abilities, thus improving the performance of both tasks at hand.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Web Information Systems COMPUTER SCIENCE, INFORMATION SYSTEMS-

CiteScore

4.60

自引率

0.00%

发文量

期刊介绍： The Global Information Infrastructure is a daily reality. In spite of the many applications in all domains of our societies: e-business, e-commerce, e-learning, e-science, and e-government, for instance, and in spite of the tremendous advances by engineers and scientists, the seamless development of Web information systems and services remains a major challenge. The journal examines how current shared vision for the future is one of semantically-rich information and service oriented architecture for global information systems. This vision is at the convergence of progress in technologies such as XML, Web services, RDF, OWL, of multimedia, multimodal, and multilingual information retrieval, and of distributed, mobile and ubiquitous computing. Topicality While the International Journal of Web Information Systems covers a broad range of topics, the journal welcomes papers that provide a perspective on all aspects of Web information systems: Web semantics and Web dynamics, Web mining and searching, Web databases and Web data integration, Web-based commerce and e-business, Web collaboration and distributed computing, Internet computing and networks, performance of Web applications, and Web multimedia services and Web-based education.