{"title":"VideoLoc: Video-based Indoor Localization with Text Information","authors":"Shusheng Li, Wenbo He","doi":"10.1109/INFOCOM42981.2021.9488739","DOIUrl":null,"url":null,"abstract":"Indoor localization serves an important role in various scenarios such as navigation in shopping malls or hospitals. However, the existing technology is usually based on additional deployment and the signals suffer from strong environmental interference in the complex indoor environment. In this paper, we propose video-based indoor localization with text information (i.e. \"VideoLoc\") without the deployment of additional equipment. Videos taken by the phone carriers cover more critical information (e.g. logos in malls), while a single photo may fail to capture it. To reduce redundant information in the video, we propose key-frame selection based on deep learning model and clustering algorithm. Video frames are characterized with deep visual descriptors and the clustering algorithm efficiently clusters these descriptors into a set of non-overlapping snippets. We select keyframes from these non-overlapping snippets in terms of the cluster centroid that represents each snippet. Then, we propose text detection and recognition with the perspective transformation to make full use of stable and discriminative text information (e.g. logos or room numbers) in keyframes for localization. Finally, we obtain the location of the phone carrier via the triangulation algorithm. The experimental results show that VideoLoc achieves high precision of localization and is robust to dynamic environments.","PeriodicalId":293079,"journal":{"name":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications","volume":"63 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INFOCOM42981.2021.9488739","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Indoor localization plays an important role in scenarios such as navigation in shopping malls or hospitals. However, existing technologies usually rely on additional infrastructure, and their signals suffer from strong environmental interference in complex indoor environments. In this paper, we propose video-based indoor localization with text information (i.e., "VideoLoc") that requires no deployment of additional equipment. A video taken by the phone carrier captures more of the critical information in a scene (e.g., store logos in malls) that a single photo may miss. To reduce redundant information in the video, we propose keyframe selection based on a deep learning model and a clustering algorithm: video frames are characterized by deep visual descriptors, and the clustering algorithm efficiently groups these descriptors into a set of non-overlapping snippets. We then select one keyframe from each snippet according to the cluster centroid that represents it. Next, we propose text detection and recognition with perspective transformation to make full use of the stable and discriminative text information (e.g., logos or room numbers) in the keyframes for localization. Finally, we obtain the location of the phone carrier via a triangulation algorithm. Experimental results show that VideoLoc achieves high localization precision and is robust to dynamic environments.
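To make the keyframe-selection step concrete, below is a minimal sketch, not the authors' implementation: it assumes a pretrained ResNet-18 as the deep visual descriptor and plain k-means as the clustering algorithm, and it picks the sampled frame nearest each cluster centroid as the keyframe. The file name walkthrough.mp4, the sampling stride, and the number of clusters are illustrative placeholders, and plain k-means does not enforce the temporal non-overlap of snippets that the paper describes; the subsequent text detection/recognition and triangulation steps are not sketched here.

```python
# Hypothetical sketch of keyframe selection by clustering deep frame descriptors.
# Assumptions (not from the paper): ResNet-18 features, plain k-means,
# and the sampled frame nearest each cluster centroid chosen as the keyframe.
import cv2
import numpy as np
import torch
import torchvision
from sklearn.cluster import KMeans


def extract_descriptors(video_path: str, sample_every: int = 10):
    """Sample frames from the video and embed each with a pretrained CNN."""
    backbone = torchvision.models.resnet18(weights="DEFAULT")  # torchvision >= 0.13
    # Drop the classification head; keep the 512-d global-average-pooled feature.
    encoder = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()
    preprocess = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Resize((224, 224)),
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                         std=[0.229, 0.224, 0.225]),
    ])

    descriptors, frame_ids = [], []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
            with torch.no_grad():
                feat = encoder(preprocess(rgb).unsqueeze(0)).flatten()
            descriptors.append(feat.numpy())
            frame_ids.append(idx)
        idx += 1
    cap.release()
    return np.stack(descriptors), frame_ids


def select_keyframes(descriptors, frame_ids, n_clusters: int = 8):
    """Cluster descriptors and return the frame index closest to each centroid."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(descriptors)
    keyframes = []
    for c in range(n_clusters):
        members = np.where(kmeans.labels_ == c)[0]
        dists = np.linalg.norm(descriptors[members] - kmeans.cluster_centers_[c], axis=1)
        keyframes.append(frame_ids[members[np.argmin(dists)]])
    return sorted(keyframes)


if __name__ == "__main__":
    X, ids = extract_descriptors("walkthrough.mp4")  # placeholder video path
    print("Selected keyframe indices:", select_keyframes(X, ids))
```

The choice of descriptor network, cluster count, and sampling stride would in practice be tuned to the indoor scene and video length; the sketch only illustrates the general descriptor-then-cluster pipeline the abstract outlines.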