VideoLoc:基于视频的室内文本信息定位

IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Pub Date : 2021-05-10 DOI:10.1109/INFOCOM42981.2021.9488739

Shusheng Li, Wenbo He

{"title":"VideoLoc:基于视频的室内文本信息定位","authors":"Shusheng Li, Wenbo He","doi":"10.1109/INFOCOM42981.2021.9488739","DOIUrl":null,"url":null,"abstract":"Indoor localization serves an important role in various scenarios such as navigation in shopping malls or hospitals. However, the existing technology is usually based on additional deployment and the signals suffer from strong environmental interference in the complex indoor environment. In this paper, we propose video-based indoor localization with text information (i.e. \"VideoLoc\") without the deployment of additional equipment. Videos taken by the phone carriers cover more critical information (e.g. logos in malls), while a single photo may fail to capture it. To reduce redundant information in the video, we propose key-frame selection based on deep learning model and clustering algorithm. Video frames are characterized with deep visual descriptors and the clustering algorithm efficiently clusters these descriptors into a set of non-overlapping snippets. We select keyframes from these non-overlapping snippets in terms of the cluster centroid that represents each snippet. Then, we propose text detection and recognition with the perspective transformation to make full use of stable and discriminative text information (e.g. logos or room numbers) in keyframes for localization. Finally, we obtain the location of the phone carrier via the triangulation algorithm. The experimental results show that VideoLoc achieves high precision of localization and is robust to dynamic environments.","PeriodicalId":293079,"journal":{"name":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications","volume":"63 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"VideoLoc: Video-based Indoor Localization with Text Information\",\"authors\":\"Shusheng Li, Wenbo He\",\"doi\":\"10.1109/INFOCOM42981.2021.9488739\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Indoor localization serves an important role in various scenarios such as navigation in shopping malls or hospitals. However, the existing technology is usually based on additional deployment and the signals suffer from strong environmental interference in the complex indoor environment. In this paper, we propose video-based indoor localization with text information (i.e. \\\"VideoLoc\\\") without the deployment of additional equipment. Videos taken by the phone carriers cover more critical information (e.g. logos in malls), while a single photo may fail to capture it. To reduce redundant information in the video, we propose key-frame selection based on deep learning model and clustering algorithm. Video frames are characterized with deep visual descriptors and the clustering algorithm efficiently clusters these descriptors into a set of non-overlapping snippets. We select keyframes from these non-overlapping snippets in terms of the cluster centroid that represents each snippet. Then, we propose text detection and recognition with the perspective transformation to make full use of stable and discriminative text information (e.g. logos or room numbers) in keyframes for localization. Finally, we obtain the location of the phone carrier via the triangulation algorithm. The experimental results show that VideoLoc achieves high precision of localization and is robust to dynamic environments.\",\"PeriodicalId\":293079,\"journal\":{\"name\":\"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications\",\"volume\":\"63 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/INFOCOM42981.2021.9488739\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INFOCOM42981.2021.9488739","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

室内定位在购物中心或医院导航等各种场景中发挥着重要作用。然而，现有技术通常是基于附加部署的，在复杂的室内环境中，信号受到强烈的环境干扰。在本文中，我们提出了一种基于视频的室内定位方法。(“videloc”)，无需部署额外设备。手机运营商拍摄的视频涵盖了更多关键信息(例如商场的标志)，而单张照片可能无法捕捉到这些信息。为了减少视频中的冗余信息，我们提出了基于深度学习模型和聚类算法的关键帧选择。视频帧具有深度视觉描述符，聚类算法有效地将这些描述符聚类成一组不重叠的片段。我们根据代表每个片段的聚类质心从这些非重叠片段中选择关键帧。然后，我们提出了基于视角变换的文本检测和识别，充分利用关键帧中稳定的、有区别的文本信息(如logo或房间号)进行定位。最后，通过三角测量算法得到手机运营商的位置。实验结果表明，该方法具有较高的定位精度和对动态环境的鲁棒性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

VideoLoc: Video-based Indoor Localization with Text Information

Indoor localization serves an important role in various scenarios such as navigation in shopping malls or hospitals. However, the existing technology is usually based on additional deployment and the signals suffer from strong environmental interference in the complex indoor environment. In this paper, we propose video-based indoor localization with text information (i.e. "VideoLoc") without the deployment of additional equipment. Videos taken by the phone carriers cover more critical information (e.g. logos in malls), while a single photo may fail to capture it. To reduce redundant information in the video, we propose key-frame selection based on deep learning model and clustering algorithm. Video frames are characterized with deep visual descriptors and the clustering algorithm efficiently clusters these descriptors into a set of non-overlapping snippets. We select keyframes from these non-overlapping snippets in terms of the cluster centroid that represents each snippet. Then, we propose text detection and recognition with the perspective transformation to make full use of stable and discriminative text information (e.g. logos or room numbers) in keyframes for localization. Finally, we obtain the location of the phone carrier via the triangulation algorithm. The experimental results show that VideoLoc achieves high precision of localization and is robust to dynamic environments.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE INFOCOM 2021 - IEEE Conference on Computer Communications

自引率

0.00%

发文量