Optimizing Synchronization Times for Position Tracking of a Mobile Asset in GPS-denied Environments

C. M. Bowyer, J. Shea, T. Wong, W. Dixon
{"title":"优化gps拒绝环境下移动资产位置跟踪的同步时间","authors":"C. M. Bowyer, J. Shea, T. Wong, W. Dixon","doi":"10.1109/GLOBECOM48099.2022.10001565","DOIUrl":null,"url":null,"abstract":"A network of distributed agents operating in a GPS-denied environment is tasked with tracking the position of a mobile asset. The agents use time-of-flight (ToF) measurements obtained from transmissions of a beacon signal by the asset to estimate the asset's position, but the estimates are noisy due to clock drift at the agents' local clocks. The clock drift can be reduced by having the asset and network of agents perform a distributed synchronization process; however, synchronization and localization cannot be performed simultaneously because the clock states are not well defined during synchronization and due to constraints on the radios' communication resources. The problem of optimizing when the agents should perform sensing or synchronization is formulated as a partially observed Markov decision process (POMDP) with continuous observations. Table-based Q-learning is used to search for an optimal policy on the belief space of the POMDP, but some form of approximation must be used because the belief space is a high -dimensional continuous space. We compare the performance of two approaches: 1) in replicated Q-learning, we learn a $Q$ function that is a linear function of the continuous beliefs; 2) in triple-Q learning, the beliefs are replaced by a model with fewer parameters, and quantization is used on a continuous parameter. Simulations are used to compare the performance of these methods with the approach that is most commonly used, which is using periodic synchronization at an optimized, fixed rate.","PeriodicalId":313199,"journal":{"name":"GLOBECOM 2022 - 2022 IEEE Global Communications Conference","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Optimizing Synchronization Times for Position Tracking of a Mobile Asset in GPS-denied Environments\",\"authors\":\"C. M. Bowyer, J. Shea, T. Wong, W. Dixon\",\"doi\":\"10.1109/GLOBECOM48099.2022.10001565\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A network of distributed agents operating in a GPS-denied environment is tasked with tracking the position of a mobile asset. The agents use time-of-flight (ToF) measurements obtained from transmissions of a beacon signal by the asset to estimate the asset's position, but the estimates are noisy due to clock drift at the agents' local clocks. The clock drift can be reduced by having the asset and network of agents perform a distributed synchronization process; however, synchronization and localization cannot be performed simultaneously because the clock states are not well defined during synchronization and due to constraints on the radios' communication resources. The problem of optimizing when the agents should perform sensing or synchronization is formulated as a partially observed Markov decision process (POMDP) with continuous observations. Table-based Q-learning is used to search for an optimal policy on the belief space of the POMDP, but some form of approximation must be used because the belief space is a high -dimensional continuous space. 
We compare the performance of two approaches: 1) in replicated Q-learning, we learn a $Q$ function that is a linear function of the continuous beliefs; 2) in triple-Q learning, the beliefs are replaced by a model with fewer parameters, and quantization is used on a continuous parameter. Simulations are used to compare the performance of these methods with the approach that is most commonly used, which is using periodic synchronization at an optimized, fixed rate.\",\"PeriodicalId\":313199,\"journal\":{\"name\":\"GLOBECOM 2022 - 2022 IEEE Global Communications Conference\",\"volume\":\"45 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"GLOBECOM 2022 - 2022 IEEE Global Communications Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/GLOBECOM48099.2022.10001565\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"GLOBECOM 2022 - 2022 IEEE Global Communications Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GLOBECOM48099.2022.10001565","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

A network of distributed agents operating in a GPS-denied environment is tasked with tracking the position of a mobile asset. The agents use time-of-flight (ToF) measurements obtained from transmissions of a beacon signal by the asset to estimate the asset's position, but the estimates are noisy due to clock drift at the agents' local clocks. The clock drift can be reduced by having the asset and the network of agents perform a distributed synchronization process; however, synchronization and localization cannot be performed simultaneously, because the clock states are not well defined during synchronization and because of constraints on the radios' communication resources. The problem of optimizing when the agents should perform sensing or synchronization is formulated as a partially observable Markov decision process (POMDP) with continuous observations. Table-based Q-learning is used to search for an optimal policy on the belief space of the POMDP, but some form of approximation must be used because the belief space is a high-dimensional continuous space. We compare the performance of two approaches: 1) in replicated Q-learning, we learn a $Q$ function that is a linear function of the continuous beliefs; 2) in triple-Q learning, the beliefs are replaced by a model with fewer parameters, and quantization is used on a continuous parameter. Simulations compare the performance of these methods with the most commonly used approach, periodic synchronization at an optimized, fixed rate.
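
For readers unfamiliar with the first approach mentioned above, the minimal Python sketch below illustrates only the general idea behind a Q function that is linear in a continuous belief vector, updated by temporal-difference learning, as in replicated Q-learning. It is not the authors' implementation: the belief dimension, the two-action set (sense vs. synchronize), the learning rate, and the discount factor are assumptions made purely for illustration.

# Illustrative sketch: Q(b, a) = w[a] . b, one weight vector per action.
# All dimensions and hyperparameters below are assumed, not taken from the paper.
import numpy as np

N_BELIEF = 8      # assumed dimension of the belief vector
N_ACTIONS = 2     # assumed action set: 0 = sense (localize), 1 = synchronize
ALPHA = 0.05      # assumed learning rate
GAMMA = 0.95      # assumed discount factor

w = np.zeros((N_ACTIONS, N_BELIEF))  # linear Q weights

def q_values(belief):
    """Q(b, a) for all actions; linear in the belief vector."""
    return w @ belief

def update(belief, action, reward, next_belief):
    """One temporal-difference update of the linear Q weights."""
    td_target = reward + GAMMA * np.max(q_values(next_belief))
    td_error = td_target - q_values(belief)[action]
    w[action] += ALPHA * td_error * belief

def choose_action(belief, epsilon=0.1):
    """Epsilon-greedy selection between sensing and synchronizing."""
    if np.random.rand() < epsilon:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(q_values(belief)))

In this form, the continuous belief never has to be discretized; only the weight vectors are learned, which is what makes the approach tractable on a high-dimensional continuous belief space.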