Improving GNSS Positioning Correction Using Deep Reinforcement Learning with an Adaptive Reward Augmentation Method

Jianhao Tang, Zhenni Li, Rui Guo, Haoli Zhao, Qianming Wang, Ming Liu, Shengli Xie, Marios Polycarpou
DOI: 10.33012/2023.19181
Journal: Proceedings of the Satellite Division's International Technical Meeting
Published: 2023-10-05

Abstract

High-precision global navigation satellite system (GNSS) positioning for autonomous driving in urban environments remains an unsolved problem because of multipath effects. Recently, methods based on data-driven deep reinforcement learning (DRL) have been used to learn continual positioning-correction strategies without strict assumptions about model parameters, and they adapt to nonstationary urban environments. However, these methods face two remaining challenges: 1) real-time training data collected in nonstationary urban environments are inadequate because of issues such as response delay and signal interruption, which degrades DRL performance; and 2) existing methods use vehicle positions as the environment observations, ignoring the complex errors caused by multipath effects in urban environments. In this paper, we propose a novel DRL-based positioning-correction method with an adaptive reward augmentation method (ARAM), using GNSS measurements instead of vehicle positions as the environment observations, to improve GNSS positioning accuracy in nonstationary urban environments. To specify the accurate current state of the vehicle agent, we use GNSS measurement observations, including the line-of-sight (LOS) vector and the pseudorange residual, to model complex environmental errors, and we employ a long short-term memory (LSTM) module to learn the temporal aspects of the observations, which include the interference of multipath effects with GNSS positioning in urban environments. To address the performance degradation caused by inadequate real-time training data, we employ ARAM to adaptively modify the matching of data between the source and target domains of the nonstationary urban environments, leveraging sufficient data from the source domain for training and thus improving DRL performance. Hence, based on ARAM and the GNSS measurement observations, we construct an LSTM-based proximal policy optimization algorithm with ARAM (LSTMPPO-ARAM) to achieve an adaptive dynamic positioning-correction policy for nonstationary urban environments. The proposed method was evaluated on the Google Smartphone Decimeter Challenge (GSDC) dataset and the Guangzhou GNSS measurement dataset; the results demonstrate that our method achieves about a 10% improvement in positioning performance over existing model-based methods and an 8% improvement over learning-based approaches.
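To make the observation design concrete, the following is a minimal sketch of how a per-satellite observation matrix of LOS unit vectors and pseudorange residuals could be assembled before being fed to an LSTM policy. This is an illustration only, not the authors' implementation: the function name, the simplified residual model (geometric range only, with receiver/satellite clock and atmospheric terms omitted), and the fixed feature layout are all assumptions.

```python
import numpy as np

def build_observation(sat_positions, receiver_pos, pseudoranges):
    """Assemble one observation per visible satellite.

    Each row is [LOS_x, LOS_y, LOS_z, pseudorange_residual], where the
    LOS vector is the unit vector from the receiver to the satellite and
    the residual is pseudorange minus geometric range (clock and
    atmospheric terms are deliberately omitted in this sketch).
    """
    rows = []
    for sat_pos, rho in zip(sat_positions, pseudoranges):
        vec = sat_pos - receiver_pos
        geom_range = np.linalg.norm(vec)
        los = vec / geom_range            # line-of-sight unit vector
        residual = rho - geom_range       # simplified pseudorange residual
        rows.append(np.concatenate([los, [residual]]))
    return np.stack(rows)                 # shape: (num_satellites, 4)

# Usage: one satellite 20,000 km along the x-axis, receiver at the origin,
# with a 5 m pseudorange error.
obs = build_observation(
    np.array([[20000e3, 0.0, 0.0]]),
    np.zeros(3),
    np.array([20000e3 + 5.0]),
)
```

A sequence of such matrices over consecutive epochs is the kind of temporally structured input the abstract's LSTM module would consume; the per-satellite residuals carry the multipath-induced error signal the correction policy learns from.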