Online Virtual Training in Soft Actor-Critic for Autonomous Driving

Maryam Savari, Y. Choe
{"title":"Online Virtual Training in Soft Actor-Critic for Autonomous Driving","authors":"Maryam Savari, Y. Choe","doi":"10.1109/IJCNN52387.2021.9533791","DOIUrl":null,"url":null,"abstract":"Deep Reinforcement Learning (RL) algorithms are widely being used in autonomous driving due to their ability to cope with unseen environments. However, in a complex domain like autonomous driving, these algorithms need to explore the environment enough to be able to converge. Therefore, these algorithms are faced with the problem of long training times and large amounts of data. In addition, using deep RL algorithms in areas that safety is an important factor such as autonomous driving can lead to a safety issue since we cannot leave the car driving in the street unattended. In this research, we tested two methods for the purpose of reducing the training time. First, we pre-trained Soft Actor-Critic (SAC) with Learning from Demonstrations (LfD) to find out if pre-training can reduce the training time of the SAC algorithm. Then, an online end-to-end combination method of SAC, LfD, and Learning from Interventions (LfI) is proposed to train an agent (dubbed Online Virtual Training). Both scenarios were implemented and tested in an inverted-pendulum task in OpenAI gym and autonomous driving in the Carla simulator. The results showed a dramatic reduction in the training time and a significant increase in gaining rewards for Online LfD (33%) and Online Virtual training (36 %) as compare to the baseline SAC. The proposed approach is expected to be effective in daily commute scenarios for autonomous driving.","PeriodicalId":396583,"journal":{"name":"2021 International Joint Conference on Neural Networks (IJCNN)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN52387.2021.9533791","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Deep Reinforcement Learning (RL) algorithms are widely used in autonomous driving due to their ability to cope with unseen environments. However, in a complex domain like autonomous driving, these algorithms must explore the environment extensively before they can converge, so they face long training times and require large amounts of data. In addition, using deep RL algorithms in areas where safety is critical, such as autonomous driving, raises a safety issue, since the car cannot be left driving in the street unattended. In this research, we tested two methods for reducing the training time. First, we pre-trained Soft Actor-Critic (SAC) with Learning from Demonstrations (LfD) to find out whether pre-training can reduce the training time of the SAC algorithm. Then, an online end-to-end method combining SAC, LfD, and Learning from Interventions (LfI) is proposed to train an agent (dubbed Online Virtual Training). Both scenarios were implemented and tested on an inverted-pendulum task in OpenAI Gym and on autonomous driving in the Carla simulator. The results showed a dramatic reduction in training time and a significant increase in reward for Online LfD (33%) and Online Virtual Training (36%) compared to the baseline SAC. The proposed approach is expected to be effective in daily commute scenarios for autonomous driving.
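The abstract does not spell out how the demonstrations enter SAC, so the sketch below assumes one common reading: demonstration transitions are used to seed SAC's off-policy replay buffer before any gradient step, so early updates learn from near-expert data rather than random exploration. The environment choice (`Pendulum-v1`), the scripted `demo_policy`, and the `sac_update` stub are illustrative assumptions, not the authors' implementation; the snippet also assumes the classic (pre-0.26) Gym step/reset API that was current in 2021.

```python
# Minimal sketch of LfD-style pre-training for SAC: fill the replay
# buffer with demonstration transitions before training. Hypothetical
# names and constants are assumptions, not the paper's method.
import random
from collections import deque

import gym
import numpy as np

BUFFER_SIZE = 100_000
DEMO_EPISODES = 20

replay_buffer = deque(maxlen=BUFFER_SIZE)


def demo_policy(obs):
    # Hypothetical stand-in for a human demonstrator: a crude
    # hand-tuned controller for Pendulum-v1 (obs = [cos th, sin th, th_dot]).
    cos_th, sin_th, th_dot = obs
    return np.array([-2.0 * sin_th - 0.5 * th_dot], dtype=np.float32)


env = gym.make("Pendulum-v1")

# Phase 1 (LfD): collect demonstration transitions into the buffer.
for _ in range(DEMO_EPISODES):
    obs = env.reset()  # classic gym API: reset() returns obs only
    done = False
    while not done:
        action = demo_policy(obs)
        next_obs, reward, done, _ = env.step(action)
        replay_buffer.append((obs, action, reward, next_obs, done))
        obs = next_obs


def sac_update(batch):
    # Placeholder for SAC's actor, critic, and temperature updates;
    # any standard SAC implementation would slot in here.
    pass


# Phase 2: ordinary off-policy SAC training. Because the buffer was
# seeded, the first minibatches already contain demonstration data.
for step in range(1000):
    batch = random.sample(replay_buffer, k=min(256, len(replay_buffer)))
    sac_update(batch)
```

In the proposed Online Virtual Training, LfI would presumably add a further mechanism on top of this loop, where a supervisor's corrective actions during online rollouts are recorded into the same buffer; the abstract gives no further detail, so that step is not sketched here.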