Online Virtual Training in Soft Actor-Critic for Autonomous Driving

Maryam Savari, Y. Choe
{"title":"Online Virtual Training in Soft Actor-Critic for Autonomous Driving","authors":"Maryam Savari, Y. Choe","doi":"10.1109/IJCNN52387.2021.9533791","DOIUrl":null,"url":null,"abstract":"Deep Reinforcement Learning (RL) algorithms are widely being used in autonomous driving due to their ability to cope with unseen environments. However, in a complex domain like autonomous driving, these algorithms need to explore the environment enough to be able to converge. Therefore, these algorithms are faced with the problem of long training times and large amounts of data. In addition, using deep RL algorithms in areas that safety is an important factor such as autonomous driving can lead to a safety issue since we cannot leave the car driving in the street unattended. In this research, we tested two methods for the purpose of reducing the training time. First, we pre-trained Soft Actor-Critic (SAC) with Learning from Demonstrations (LfD) to find out if pre-training can reduce the training time of the SAC algorithm. Then, an online end-to-end combination method of SAC, LfD, and Learning from Interventions (LfI) is proposed to train an agent (dubbed Online Virtual Training). Both scenarios were implemented and tested in an inverted-pendulum task in OpenAI gym and autonomous driving in the Carla simulator. The results showed a dramatic reduction in the training time and a significant increase in gaining rewards for Online LfD (33%) and Online Virtual training (36 %) as compare to the baseline SAC. The proposed approach is expected to be effective in daily commute scenarios for autonomous driving.","PeriodicalId":396583,"journal":{"name":"2021 International Joint Conference on Neural Networks (IJCNN)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN52387.2021.9533791","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Deep Reinforcement Learning (RL) algorithms are widely used in autonomous driving due to their ability to cope with unseen environments. However, in a complex domain like autonomous driving, these algorithms must explore the environment extensively before they can converge, so they face long training times and require large amounts of data. In addition, using deep RL algorithms in areas where safety is critical, such as autonomous driving, raises a safety issue, since the car cannot be left driving in the street unattended. In this research, we tested two methods for reducing the training time. First, we pre-trained Soft Actor-Critic (SAC) with Learning from Demonstrations (LfD) to find out whether pre-training can reduce the training time of the SAC algorithm. Then, an online end-to-end method combining SAC, LfD, and Learning from Interventions (LfI) is proposed to train an agent (dubbed Online Virtual Training). Both scenarios were implemented and tested on an inverted-pendulum task in OpenAI Gym and on autonomous driving in the Carla simulator. The results showed a dramatic reduction in training time and a significant increase in reward for Online LfD (33%) and Online Virtual Training (36%) compared to the baseline SAC. The proposed approach is expected to be effective in daily commute scenarios for autonomous driving.
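The abstract does not spell out how the demonstrations enter SAC, so the sketch below assumes one common reading: demonstration transitions are used to seed SAC's off-policy replay buffer before any gradient step, so early updates learn from near-expert data rather than random exploration. The environment choice (`Pendulum-v1`), the scripted `demo_policy`, and the `sac_update` stub are illustrative assumptions, not the authors' implementation; the snippet also assumes the classic (pre-0.26) Gym step/reset API that was current in 2021.

```python
# Minimal sketch of LfD-style pre-training for SAC: fill the replay
# buffer with demonstration transitions before training. Hypothetical
# names and constants are assumptions, not the paper's method.
import random
from collections import deque

import gym
import numpy as np

BUFFER_SIZE = 100_000
DEMO_EPISODES = 20

replay_buffer = deque(maxlen=BUFFER_SIZE)


def demo_policy(obs):
    # Hypothetical stand-in for a human demonstrator: a crude
    # hand-tuned controller for Pendulum-v1 (obs = [cos th, sin th, th_dot]).
    cos_th, sin_th, th_dot = obs
    return np.array([-2.0 * sin_th - 0.5 * th_dot], dtype=np.float32)


env = gym.make("Pendulum-v1")

# Phase 1 (LfD): collect demonstration transitions into the buffer.
for _ in range(DEMO_EPISODES):
    obs = env.reset()  # classic gym API: reset() returns obs only
    done = False
    while not done:
        action = demo_policy(obs)
        next_obs, reward, done, _ = env.step(action)
        replay_buffer.append((obs, action, reward, next_obs, done))
        obs = next_obs


def sac_update(batch):
    # Placeholder for SAC's actor, critic, and temperature updates;
    # any standard SAC implementation would slot in here.
    pass


# Phase 2: ordinary off-policy SAC training. Because the buffer was
# seeded, the first minibatches already contain demonstration data.
for step in range(1000):
    batch = random.sample(replay_buffer, k=min(256, len(replay_buffer)))
    sac_update(batch)
```

In the proposed Online Virtual Training, LfI would presumably add a further mechanism on top of this loop, where a supervisor's corrective actions during online rollouts are recorded into the same buffer; the abstract gives no further detail, so that step is not sketched here.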