Comparing Physics Effects through Reinforcement Learning in the ARORA Simulator

Troyle Thomas, Armando Fandango, D. Reed, C. Hoayun, J. Hurter, Alexander Gutierrez, K. Brawner
Proceedings of the 33rd European Modeling & Simulation Symposium
DOI: 10.46354/i3m.2021.emss.015 (https://doi.org/10.46354/i3m.2021.emss.015)
Citations: 1

Abstract

The present study tests several physics levels for training autonomous-vehicle navigation with a deep deterministic policy gradient algorithm, addressing a gap in research on how physics levels affect vehicle behaviour, specifically for reinforcement-learning algorithms. Three measures from a PointGoal Navigation task were investigated: simulator run-time, training steps, and agent effectiveness, measured by Success weighted by (normalised inverse) Path Length (SPL). Training and testing took place in the novel simulator ARORA (A Realistic Open environment for Rapid Agent training). ARORA aims to provide a high-fidelity, open-source simulation platform with physics-based movement, vehicle modelling, and a continuous action space within a large-scale geospecific city environment. Four physics levels, or models, were used to create four curriculum conditions for training. SPL was highest for the condition that used all physics levels defined for the experiment, and two conditions returned zero values. Future researchers should consider providing adequate support when training complex-physics vehicle models. The run-time results revealed a benefit for experimental machines with a better CPU, at least for the vector-only observations we employed.
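For readers unfamiliar with the SPL measure named in the abstract, the sketch below computes it under its commonly used definition for PointGoal Navigation: each episode contributes its success flag multiplied by the ratio of the shortest-path length to the larger of the shortest-path length and the path actually travelled, and the result is averaged over episodes. The function name `spl` and its parameter names are illustrative assumptions; this is not code from ARORA or from the paper.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by (normalised inverse) Path Length.

    successes[i]        -- whether episode i reached the goal
    shortest_lengths[i] -- shortest-path distance from start to goal
    path_lengths[i]     -- distance the agent actually travelled
    (All names are hypothetical, chosen for this illustration.)
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if len(shortest_lengths) != n or len(path_lengths) != n:
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, travelled in zip(
        successes, shortest_lengths, path_lengths
    ):
        denominator = max(travelled, shortest)
        # Failed episodes contribute zero; the guard avoids dividing by zero
        # in the degenerate case where start and goal coincide.
        if success and denominator > 0:
            total += shortest / denominator
    return total / n


if __name__ == "__main__":
    # Three episodes: an optimal success, a success with a path twice the
    # shortest length, and a failure.
    print(spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 5.0]))
```

Under this definition, and assuming every shortest-path length is positive, SPL is zero exactly when no evaluated episode succeeds, which is one way to read the zero values the abstract reports for two conditions.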