ViFi-Loc: Multi-modal Pedestrian Localization using GAN with Camera-Phone Correspondences

Hansi Liu, Hongsheng Lu, Kristin Dana, Marco Gruteser
{"title":"ViFi-Loc: Multi-modal Pedestrian Localization using GAN with Camera-Phone Correspondences","authors":"Hansi Liu, Hongsheng Lu, Kristin Data, Marco Gruteser","doi":"10.1145/3577190.3614119","DOIUrl":null,"url":null,"abstract":"In Smart City and Vehicle-to-Everything (V2X) systems, acquiring pedestrians’ accurate locations is crucial to traffic and pedestrian safety. Current systems adopt cameras and wireless sensors to estimate people’s locations via sensor fusion. Standard fusion algorithms, however, become inapplicable when multi-modal data is not associated. For example, pedestrians are out of the camera field of view, or data from the camera modality is missing. To address this challenge and produce more accurate location estimations for pedestrians, we propose a localization solution based on a Generative Adversarial Network (GAN) architecture. During training, it learns the underlying linkage between pedestrians’ camera-phone data correspondences. During inference, it generates refined position estimations based only on pedestrians’ phone data that consists of GPS, IMU, and FTM. Results show that our GAN produces 3D coordinates at 1 to 2 meters localization error across 5 different outdoor scenes. We further show that the proposed model supports self-learning. The generated coordinates can be associated with pedestrians’ bounding box coordinates to obtain additional camera-phone data correspondences. This allows automatic data collection during inference. Results show that after fine-tuning the GAN model on the expanded dataset, localization accuracy is further improved by up to 26%.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"94 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Publication of the 2020 International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3577190.3614119","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In Smart City and Vehicle-to-Everything (V2X) systems, acquiring pedestrians’ accurate locations is crucial to traffic and pedestrian safety. Current systems adopt cameras and wireless sensors to estimate people’s locations via sensor fusion. Standard fusion algorithms, however, become inapplicable when the multi-modal data is not associated, for example, when pedestrians are out of the camera’s field of view or data from the camera modality is missing. To address this challenge and produce more accurate location estimations for pedestrians, we propose a localization solution based on a Generative Adversarial Network (GAN) architecture. During training, it learns the underlying linkage between pedestrians’ camera and phone data from their correspondences. During inference, it generates refined position estimations based only on pedestrians’ phone data, which consists of GPS, IMU, and FTM (Fine Time Measurement) measurements. Results show that our GAN produces 3D coordinates with 1 to 2 meters of localization error across 5 different outdoor scenes. We further show that the proposed model supports self-learning: the generated coordinates can be associated with pedestrians’ bounding box coordinates to obtain additional camera-phone data correspondences, which allows automatic data collection during inference. Results show that after fine-tuning the GAN model on the expanded dataset, localization accuracy is further improved by up to 26%.
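To make the training setup described above concrete, the sketch below shows a minimal conditional GAN in PyTorch whose generator maps phone-side features (GPS, IMU, FTM) plus noise to a 3D coordinate, and whose discriminator judges (phone features, coordinate) pairs against camera-derived ground truth from the correspondences. The feature dimensions, network sizes, and training loop are illustrative assumptions for exposition, not the architecture or hyperparameters reported in the paper.

```python
# Minimal sketch of a conditional GAN mapping phone-side features to 3D
# pedestrian coordinates. Dimensions and layer sizes are assumptions.
import torch
import torch.nn as nn

PHONE_DIM = 16   # assumed size of concatenated GPS + IMU + FTM features
NOISE_DIM = 8    # assumed latent noise size
COORD_DIM = 3    # 3D position output

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PHONE_DIM + NOISE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, COORD_DIM),
        )

    def forward(self, phone, z):
        # Condition on phone features, add noise, output a 3D estimate.
        return self.net(torch.cat([phone, z], dim=-1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PHONE_DIM + COORD_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, phone, coord):
        # Score whether (phone features, coordinate) is a real correspondence.
        return self.net(torch.cat([phone, coord], dim=-1))

def train_step(G, D, opt_g, opt_d, phone, true_coord, bce=nn.BCEWithLogitsLoss()):
    """One adversarial update on a batch of camera-phone correspondences,
    where true_coord comes from the camera modality during training."""
    batch = phone.size(0)
    z = torch.randn(batch, NOISE_DIM)
    fake_coord = G(phone, z)

    # Discriminator: real correspondence pairs vs. generated pairs.
    opt_d.zero_grad()
    d_loss = bce(D(phone, true_coord), torch.ones(batch, 1)) + \
             bce(D(phone, fake_coord.detach()), torch.zeros(batch, 1))
    d_loss.backward()
    opt_d.step()

    # Generator: produce coordinates the discriminator accepts as real.
    opt_g.zero_grad()
    g_loss = bce(D(phone, fake_coord), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
    return g_loss.item(), d_loss.item()
```

At inference, only the phone-side features are needed: a refined estimate would be obtained as `G(phone, torch.randn(batch, NOISE_DIM))`, and in the self-learning setting such estimates could be matched to detected bounding boxes to grow the correspondence dataset for fine-tuning.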