VideoLoc: Video-based Indoor Localization with Text Information

Shusheng Li, Wenbo He
Published in: IEEE INFOCOM 2021 - IEEE Conference on Computer Communications
Publication date: 2021-05-10
DOI: 10.1109/INFOCOM42981.2021.9488739 (https://doi.org/10.1109/INFOCOM42981.2021.9488739)
Citations: 2

Abstract

Indoor localization plays an important role in scenarios such as navigation in shopping malls and hospitals. However, existing techniques usually rely on additionally deployed infrastructure, and their signals suffer from strong environmental interference in complex indoor environments. In this paper, we propose VideoLoc, a video-based indoor localization method that uses text information and requires no additional equipment. A video taken by the person carrying the phone covers more critical information (e.g., logos in malls) than a single photo, which may fail to capture it. To reduce redundant information in the video, we propose key-frame selection based on a deep learning model and a clustering algorithm. Video frames are characterized by deep visual descriptors, and the clustering algorithm efficiently groups these descriptors into a set of non-overlapping snippets. From each snippet we select a keyframe according to the cluster centroid that represents the snippet. We then propose text detection and recognition with perspective transformation to make full use of the stable and discriminative text information (e.g., logos or room numbers) in the keyframes for localization. Finally, we obtain the location of the person carrying the phone via a triangulation algorithm. The experimental results show that VideoLoc achieves high localization precision and is robust to dynamic environments.
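
The key-frame selection step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the descriptor network nor the clustering algorithm, so the sketch assumes that each frame already has a descriptor vector (for example from a pretrained CNN) and substitutes a simple greedy temporal clustering with a cosine-distance threshold. The function names and the threshold value of 0.3 are hypothetical choices made only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity (0 means the vectors point the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def split_into_snippets(
    descriptors: Sequence[Vector], threshold: float = 0.3
) -> List[Tuple[int, int]]:
    """Greedy temporal clustering (an assumed stand-in, not the paper's algorithm).

    Returns half-open frame ranges [start, end). The ranges are contiguous,
    so the snippets cannot overlap. A new snippet starts when a frame's
    descriptor drifts too far from the centroid of the current snippet.
    """
    snippets: List[Tuple[int, int]] = []
    if not descriptors:
        return snippets
    start = 0
    for i in range(1, len(descriptors)):
        centroid = mean_vector(descriptors[start:i])
        if cosine_distance(descriptors[i], centroid) > threshold:
            snippets.append((start, i))
            start = i
    snippets.append((start, len(descriptors)))
    return snippets


def select_keyframes(
    descriptors: Sequence[Vector], threshold: float = 0.3
) -> List[int]:
    """Pick, for each snippet, the index of the frame closest to its centroid."""
    keyframes: List[int] = []
    for start, end in split_into_snippets(descriptors, threshold):
        centroid = mean_vector(descriptors[start:end])
        best = min(
            range(start, end),
            key=lambda i: cosine_distance(descriptors[i], centroid),
        )
        keyframes.append(best)
    return keyframes


if __name__ == "__main__":
    # Toy 2-D descriptors: three frames of one scene, then three of another.
    frames = [
        [1.0, 0.0], [1.0, 0.1], [1.0, 0.2],
        [0.0, 1.0], [0.1, 1.0], [0.2, 1.0],
    ]
    print(split_into_snippets(frames))
    print(select_keyframes(frames))
```

Because the snippets are built as contiguous frame ranges, they are non-overlapping by construction, which matches the property the abstract describes. A general-purpose method such as k-means would need an extra step to enforce temporal contiguity.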