{"title":"基于自适应奖励增强法的深度强化学习改进GNSS定位校正","authors":"Jianhao Tang, Zhenni Li, Rui Guo, Haoli Zhao, Qianming Wang, Ming Liu, Shengli Xie, Marios Polycarpou","doi":"10.33012/2023.19181","DOIUrl":null,"url":null,"abstract":"High-precision global navigation satellite system (GNSS) positioning for automatic driving in urban environments is an unsolved problem because of the influence of multipath effects. Recently, methods based on data-driven deep reinforcement learning (DRL) have been used to learn continual positioning-correction strategies without strict assumptions about model parameters, and are adaptable to nonstationary urban environments. However, these methods face two remaining challenges: 1) real-time data for training collected in nonstationary urban environments is inadequate because of issues such as response delay and signal interruption, which causes the performance degradation in DRL, 2) the existing methods use vehicle positions as the environment observations, ignoring the complex errors caused by multipath effects in urban environments. In this paper, we propose a novel DRL-based positioning-correction method with an adaptive reward augmentation method (ARAM), and use GNSS measurements instead of vehicle positions as the environment observations, for improving the GNSS positioning accuracy in nonstationary urban environments. To specify the accurate current state of the vehicle agent, we use the GNSS measurement observations, including the line-of-sight (LOS) vector and the pseudorange residual, to model complex environmental errors, and employ a long and short term memory (LSTM) module to learn the temporal aspects of the observations, which include the interference by multipath effects on the GNSS positioning in urban environments. To address the performance degradation caused by inadequate real-time training data, we employ ARAM to adaptively modify the matching of data between the source domain and target domain of the nonstationary urban environments, to leverage sufficient data from the source domain for training, and thus to improve the performance of DRL. Hence, based on ARAM and using the GNSS measurement observations, we construct an LSTM-based proximal policy optimization algorithm with ARAM (LSTMPPO-ARAM) to achieve an adaptive dynamic positioning-correction policy for nonstationary urban environments. The proposed method was evaluated using the Google smartphone decimeter challenge (GSDC) dataset and the Guangzhou GNSS measurement dataset, with the results demonstrating that our method can obtain about a 10% improvement in positioning performance over existing model-based methods and an 8% improvement over learning-based approaches.","PeriodicalId":498211,"journal":{"name":"Proceedings of the Satellite Division's International Technical Meeting","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving GNSS positioning correction using Deep Reinforcement Learning with Adaptive Reward Augmentation Method\",\"authors\":\"Jianhao Tang, Zhenni Li, Rui Guo, Haoli Zhao, Qianming Wang, Ming Liu, Shengli Xie, Marios Polycarpou\",\"doi\":\"10.33012/2023.19181\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"High-precision global navigation satellite system (GNSS) positioning for automatic driving in urban environments is an unsolved problem because of the influence of multipath effects. 
Recently, methods based on data-driven deep reinforcement learning (DRL) have been used to learn continual positioning-correction strategies without strict assumptions about model parameters, and are adaptable to nonstationary urban environments. However, these methods face two remaining challenges: 1) real-time data for training collected in nonstationary urban environments is inadequate because of issues such as response delay and signal interruption, which causes the performance degradation in DRL, 2) the existing methods use vehicle positions as the environment observations, ignoring the complex errors caused by multipath effects in urban environments. In this paper, we propose a novel DRL-based positioning-correction method with an adaptive reward augmentation method (ARAM), and use GNSS measurements instead of vehicle positions as the environment observations, for improving the GNSS positioning accuracy in nonstationary urban environments. To specify the accurate current state of the vehicle agent, we use the GNSS measurement observations, including the line-of-sight (LOS) vector and the pseudorange residual, to model complex environmental errors, and employ a long and short term memory (LSTM) module to learn the temporal aspects of the observations, which include the interference by multipath effects on the GNSS positioning in urban environments. To address the performance degradation caused by inadequate real-time training data, we employ ARAM to adaptively modify the matching of data between the source domain and target domain of the nonstationary urban environments, to leverage sufficient data from the source domain for training, and thus to improve the performance of DRL. Hence, based on ARAM and using the GNSS measurement observations, we construct an LSTM-based proximal policy optimization algorithm with ARAM (LSTMPPO-ARAM) to achieve an adaptive dynamic positioning-correction policy for nonstationary urban environments. The proposed method was evaluated using the Google smartphone decimeter challenge (GSDC) dataset and the Guangzhou GNSS measurement dataset, with the results demonstrating that our method can obtain about a 10% improvement in positioning performance over existing model-based methods and an 8% improvement over learning-based approaches.\",\"PeriodicalId\":498211,\"journal\":{\"name\":\"Proceedings of the Satellite Division's International Technical Meeting\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-10-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Satellite Division's International Technical Meeting\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.33012/2023.19181\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Satellite Division's International Technical Meeting","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.33012/2023.19181","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving GNSS positioning correction using Deep Reinforcement Learning with Adaptive Reward Augmentation Method
High-precision global navigation satellite system (GNSS) positioning for automated driving in urban environments remains an unsolved problem because of multipath effects. Recently, methods based on data-driven deep reinforcement learning (DRL) have been used to learn continual positioning-correction strategies without strict assumptions about model parameters, and they adapt well to nonstationary urban environments. However, these methods face two remaining challenges: 1) the real-time training data collected in nonstationary urban environments are inadequate because of response delays and signal interruptions, which degrades DRL performance; and 2) existing methods use vehicle positions as the environment observations, ignoring the complex errors caused by multipath effects in urban environments. In this paper, we propose a novel DRL-based positioning-correction method built on an adaptive reward augmentation method (ARAM), and we use GNSS measurements rather than vehicle positions as the environment observations to improve GNSS positioning accuracy in nonstationary urban environments. To accurately specify the current state of the vehicle agent, we use GNSS measurement observations, including the line-of-sight (LOS) vector and the pseudorange residual, to model complex environmental errors, and we employ a long short-term memory (LSTM) module to learn the temporal structure of the observations, including the interference of multipath effects with GNSS positioning in urban environments. To address the performance degradation caused by inadequate real-time training data, we employ ARAM to adaptively match data between the source and target domains of the nonstationary urban environments, leveraging sufficient source-domain data for training and thus improving DRL performance. Based on ARAM and the GNSS measurement observations, we construct an LSTM-based proximal policy optimization algorithm with ARAM (LSTMPPO-ARAM) to learn an adaptive dynamic positioning-correction policy for nonstationary urban environments. The proposed method was evaluated on the Google Smartphone Decimeter Challenge (GSDC) dataset and the Guangzhou GNSS measurement dataset; the results demonstrate that our method achieves approximately a 10% improvement in positioning performance over existing model-based methods and an 8% improvement over learning-based approaches.
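The abstract alone does not give implementation details, but the pipeline it describes can be sketched. The following is a minimal illustrative sketch, not the authors' code: it assumes PyTorch, a hypothetical per-epoch observation that stacks each satellite's 3-D LOS unit vector with its pseudorange residual, an LSTM encoder over a window of epochs feeding a Gaussian PPO-style actor head that outputs a 3-D position correction, and a hypothetical ARAM-style stub for reweighting source-domain rewards toward the target domain. All names, dimensions, and the weighting form are assumptions, not details from the paper.

```python
# Illustrative sketch only -- not the authors' implementation.
# Observation layout, dimensions, and the ARAM-style weighting are
# hypothetical reconstructions from the abstract.
import torch
import torch.nn as nn

N_SAT = 12            # assumed number of tracked satellites per epoch
OBS_DIM = N_SAT * 4   # per satellite: 3-D LOS unit vector + pseudorange residual


class LSTMPolicy(nn.Module):
    """LSTM encoder over a window of GNSS measurement epochs feeding a
    Gaussian actor head (PPO-style) that outputs a 3-D position correction."""

    def __init__(self, obs_dim: int = OBS_DIM, hidden: int = 128, act_dim: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, act_dim)                # mean correction (meters)
        self.log_std = nn.Parameter(torch.zeros(act_dim))   # learned exploration scale

    def forward(self, obs_seq: torch.Tensor) -> torch.distributions.Normal:
        # obs_seq: (batch, T, OBS_DIM) -- T consecutive epochs of stacked
        # LOS vectors and pseudorange residuals.
        out, _ = self.lstm(obs_seq)
        mu = self.mu(out[:, -1])                            # state at the latest epoch
        return torch.distributions.Normal(mu, self.log_std.exp())


def aram_weight(r_source: torch.Tensor, r_target: torch.Tensor,
                tau: float = 1.0) -> torch.Tensor:
    # Hypothetical ARAM-style weight: rewards from the (abundant) source
    # domain are down-weighted when they disagree with the (scarce) target
    # domain, so source data can augment training without dominating it.
    return torch.exp(-torch.abs(r_source - r_target) / tau)


# Toy usage: a batch of 8 observation windows, 10 epochs each.
policy = LSTMPolicy()
obs = torch.randn(8, 10, OBS_DIM)
dist = policy(obs)
correction = dist.sample()   # (8, 3) position correction, e.g. in an ENU frame
print(correction.shape)
```

In a full PPO training loop, the sampled correction would be applied to the model-based position fix and the clipped-surrogate objective optimized over collected trajectories; those pieces are omitted here for brevity.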