UBG: An Unreal BattleGround Benchmark With Object-Aware Hierarchical Proximal Policy Optimization.

IF 10.2 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Neural Networks and Learning Systems. Pub Date: 2025-05-20. DOI: 10.1109/tnnls.2025.3567001

Longyu Niu, Baihui Li, Xingjian Fan, Hao Fang, Jun Li, Junliang Xing, Jun Wan, Zhen Lei


Citations: 0

Abstract

Deep reinforcement learning (DRL) has made significant progress in various simulation environments. However, applying DRL methods to real-world scenarios remains challenging because existing environments are limited in visual fidelity, scene complexity, and task diversity. To address these limitations and explore the potential of DRL, we developed a 3-D open-world first-person shooter (FPS) game called Unreal BattleGround (UBG) using Unreal Engine (UE). UBG provides a realistic 3-D environment with variable complexity, random scenes, diverse tasks, and multiple scene interaction methods. This benchmark involves far more complex state-action spaces than classic pseudo-3-D FPS games (e.g., ViZDoom), making it challenging for DRL to learn human-level decision sequences. We then propose the object-aware hierarchical proximal policy optimization (OaH-PPO) method for UBG. It uses a two-level hierarchy in which the high-level controller learns option control and the low-level workers focus on mastering subtasks. To boost the learning of subtasks, we propose three modules: an object-aware module for extracting depth detection information from the environment, potential-based intrinsic reward shaping for efficient exploration, and annealing imitation learning (IL) to guide the initialization. Experimental results demonstrate the broad applicability of UBG and the effectiveness of OaH-PPO. We will release the code of UBG and OaH-PPO after publication.
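The abstract names potential-based reward shaping and annealed imitation learning but does not give their exact form. The sketch below illustrates the general idea only, assuming the textbook shaping term gamma * Phi(s') - Phi(s) and a linear annealing schedule; the function names, the zero terminal potential, and the schedule are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def shaped_reward(r_ext: float, phi_s: float, phi_next: float,
                  gamma: float = 0.99, done: bool = False) -> float:
    """Extrinsic reward plus the shaping term gamma * Phi(s') - Phi(s).

    Assumption: the potential of a terminal state is treated as 0.
    """
    next_potential = 0.0 if done else phi_next
    return r_ext + gamma * next_potential - phi_s


def discounted_shaping_return(potentials: Sequence[float],
                              gamma: float = 0.99) -> float:
    """Discounted sum of the shaping terms alone along one episode.

    The terms telescope, so the result is -Phi(s_0) regardless of the path,
    which is why this kind of shaping does not change the optimal policy.
    """
    total = 0.0
    for t, phi_s in enumerate(potentials):
        last = t == len(potentials) - 1
        phi_next = 0.0 if last else potentials[t + 1]
        total += gamma ** t * shaped_reward(0.0, phi_s, phi_next, gamma, done=last)
    return total


def il_weight(step: int, anneal_steps: int, w0: float = 1.0) -> float:
    """Imitation-loss weight, linearly annealed from w0 to 0 (assumed schedule)."""
    frac = min(max(step / anneal_steps, 0.0), 1.0)
    return w0 * (1.0 - frac)


def total_loss(ppo_loss: float, il_loss: float,
               step: int, anneal_steps: int) -> float:
    """Hypothetical combined objective: PPO loss plus annealed imitation loss."""
    return ppo_loss + il_weight(step, anneal_steps) * il_loss
```

Under these assumptions, the imitation term dominates early in training, when the policy is close to its initialization, and fades out so that later updates are driven by the PPO objective alone.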


Source journal

IEEE Transactions on Neural Networks and Learning Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore

23.80

Self-citation rate

9.60%

Articles published

2102

Review time

3-8 weeks

About the journal: IEEE Transactions on Neural Networks and Learning Systems publishes scholarly articles on the theory, design, and applications of neural networks and other learning systems. The journal primarily highlights technical and scientific research in this domain.