{"title":"通过在空间和时间上分布图像处理来加速基于视觉的3D室内定位","authors":"D. Yun, Hyunseok Chang, T. V. Lakshman","doi":"10.1145/2671015.2671018","DOIUrl":null,"url":null,"abstract":"In a vision-based 3D indoor localization system, conducting localization of user's device at a high frame rate is important to support real-time augment reality applications. However, vision-based 3D localization typically involves 2D keypoint detection and 2D-3D matching processes, which are in general too computationally intensive to be carried out at a high frame rate (e.g., 30 fps) on commodity hardware such as laptops or smartphones. In order to reduce per-frame computation time for 3D localization, we present a new method that distributes required computation over space and time, by splitting a video frame region into multiple sub-blocks, and processing only a sub-block in a rotating sequence at each video frame. The proposed method is general enough that it can be applied to any keypoint detection and 2D-3D matching schemes. We apply the method in a prototype 3D indoor localization system, and evaluate its performance in a 120m long indoor hallway environment using 5,200 video frames of 640x480 (VGA) resolution and a commodity laptop. When SIFT-based keypoint detection is used, our method reduces average and maximum computation time per frame by a factor of 10 and 7 respectively, with a marginal increase of positioning error (e.g., 0.17 m). This improvement enables the frame processing rate to increase from 3.2 fps to 23.3 fps.","PeriodicalId":93673,"journal":{"name":"Proceedings of the ACM Symposium on Virtual Reality Software and Technology. ACM Symposium on Virtual Reality Software and Technology","volume":"88 1","pages":"77-86"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Accelerating vision-based 3D indoor localization by distributing image processing over space and time\",\"authors\":\"D. Yun, Hyunseok Chang, T. V. Lakshman\",\"doi\":\"10.1145/2671015.2671018\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In a vision-based 3D indoor localization system, conducting localization of user's device at a high frame rate is important to support real-time augment reality applications. However, vision-based 3D localization typically involves 2D keypoint detection and 2D-3D matching processes, which are in general too computationally intensive to be carried out at a high frame rate (e.g., 30 fps) on commodity hardware such as laptops or smartphones. In order to reduce per-frame computation time for 3D localization, we present a new method that distributes required computation over space and time, by splitting a video frame region into multiple sub-blocks, and processing only a sub-block in a rotating sequence at each video frame. The proposed method is general enough that it can be applied to any keypoint detection and 2D-3D matching schemes. We apply the method in a prototype 3D indoor localization system, and evaluate its performance in a 120m long indoor hallway environment using 5,200 video frames of 640x480 (VGA) resolution and a commodity laptop. When SIFT-based keypoint detection is used, our method reduces average and maximum computation time per frame by a factor of 10 and 7 respectively, with a marginal increase of positioning error (e.g., 0.17 m). 
This improvement enables the frame processing rate to increase from 3.2 fps to 23.3 fps.\",\"PeriodicalId\":93673,\"journal\":{\"name\":\"Proceedings of the ACM Symposium on Virtual Reality Software and Technology. ACM Symposium on Virtual Reality Software and Technology\",\"volume\":\"88 1\",\"pages\":\"77-86\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-11-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM Symposium on Virtual Reality Software and Technology. ACM Symposium on Virtual Reality Software and Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2671015.2671018\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM Symposium on Virtual Reality Software and Technology. ACM Symposium on Virtual Reality Software and Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2671015.2671018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Accelerating vision-based 3D indoor localization by distributing image processing over space and time
In a vision-based 3D indoor localization system, localizing the user's device at a high frame rate is important for supporting real-time augmented reality applications. However, vision-based 3D localization typically involves 2D keypoint detection and 2D-3D matching, which are in general too computationally intensive to run at a high frame rate (e.g., 30 fps) on commodity hardware such as laptops or smartphones. To reduce the per-frame computation time of 3D localization, we present a new method that distributes the required computation over space and time: it splits the video frame region into multiple sub-blocks and processes only one sub-block, chosen in a rotating sequence, at each video frame. The proposed method is general enough to be applied to any keypoint detection and 2D-3D matching scheme. We apply the method in a prototype 3D indoor localization system and evaluate its performance in a 120 m-long indoor hallway environment, using 5,200 video frames of 640x480 (VGA) resolution and a commodity laptop. With SIFT-based keypoint detection, our method reduces the average and maximum computation time per frame by factors of 10 and 7, respectively, at the cost of only a marginal increase in positioning error (0.17 m). This improvement raises the frame processing rate from 3.2 fps to 23.3 fps.
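To make the rotating sub-block scheme concrete, the sketch below shows one way it could be implemented in Python with OpenCV's SIFT detector. This is an illustrative reconstruction, not the authors' code: the 2x2 grid size, the function name, and the coordinate-shift step are assumptions made for the example.

    # Minimal sketch of space-time distributed keypoint detection:
    # split each frame into sub-blocks and run SIFT on only one
    # sub-block per frame, rotating through the blocks round-robin.
    import cv2  # SIFT is included in opencv-python 4.4+

    GRID_ROWS, GRID_COLS = 2, 2   # assumed split; the paper's block count may differ
    sift = cv2.SIFT_create()

    def detect_in_subblock(gray_frame, frame_idx):
        """Detect keypoints in the sub-block assigned to this frame index."""
        h, w = gray_frame.shape[:2]
        bh, bw = h // GRID_ROWS, w // GRID_COLS
        block = frame_idx % (GRID_ROWS * GRID_COLS)   # round-robin block index
        r, c = divmod(block, GRID_COLS)
        y0, x0 = r * bh, c * bw
        roi = gray_frame[y0:y0 + bh, x0:x0 + bw]
        kps, descs = sift.detectAndCompute(roi, None)
        # Shift keypoint coordinates back into full-frame coordinates so
        # downstream 2D-3D matching and pose estimation can use them directly.
        for kp in kps:
            kp.pt = (kp.pt[0] + x0, kp.pt[1] + y0)
        return kps, descs

A full pipeline would presumably feed the shifted keypoints and descriptors into 2D-3D matching against the indoor model and then into pose estimation (e.g., cv2.solvePnPRansac), accumulating matches over successive frames as the rotation covers the whole image.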