SMLoc: spatial multilayer perception-guided camera localization
Authors: Jingyuan Feng, Shengsheng Wang, Haonan Sun
Journal: Journal of Electronic Imaging (JCR category: Engineering, Electrical & Electronic; Q4; impact factor 1.0)
Published: 2024-09-01
DOI: https://doi.org/10.1117/1.jei.33.5.053013
Camera localization estimates a camera’s six-degree-of-freedom pose (position and orientation) using the camera itself as the sensor input. It is widely used in augmented reality, virtual reality, and autonomous driving. In recent years, with the development of deep learning, absolute pose regression has gained wide attention as an end-to-end, learning-based localization method. The typical architecture consists of a convolutional backbone and a multilayer perceptron (MLP) regression head composed of multiple fully connected layers. The two-dimensional feature maps extracted by the backbone must be flattened before they are passed to the fully connected layers for pose regression. However, flattening discards crucial pixel position information carried by the two-dimensional feature maps, which degrades the accuracy of pose estimation. We propose a parallel structure, termed SMLoc, that uses a spatial MLP to aggregate position and orientation information from the feature maps separately, reducing the loss of pixel position information. Our approach achieves superior performance on common indoor and outdoor datasets.
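The abstract does not spell out how the spatial MLP is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an MLP-Mixer-style layer that mixes values along the spatial axis (the H*W pixel positions) of each channel, two parallel branches for position and orientation, and a quaternion output for orientation. All function names, layer sizes, and the random weights (which stand in for trained parameters) are hypothetical.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = W x + b to a vector x (weight has one row per output unit)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_linear(rng, n_in, n_out):
    """Randomly initialised (weight, bias) pair; stands in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def spatial_mlp_branch(feature_map, params):
    """Aggregate a C x H x W feature map into one pose component.

    The MLP acts along the spatial axis of each channel, so every pixel
    position has its own weight, instead of flattening all channels and
    positions into one long vector first.
    """
    (w1, b1), (w2, b2), (w_out, b_out) = params
    pooled = []
    for channel in feature_map:
        tokens = [v for row in channel for v in row]            # H*W values of one channel
        hidden = [max(0.0, h) for h in linear(tokens, w1, b1)]  # ReLU
        pooled.append(linear(hidden, w2, b2)[0])                # one scalar per channel
    return linear(pooled, w_out, b_out)


def make_branch(rng, channels, height, width, hidden, out_dim):
    """Build the three linear layers of one (hypothetical) branch."""
    return (
        init_linear(rng, height * width, hidden),
        init_linear(rng, hidden, 1),
        init_linear(rng, channels, out_dim),
    )


def regress_pose(feature_map, position_branch, orientation_branch):
    """Run both branches on the same feature map and normalise the quaternion."""
    position = spatial_mlp_branch(feature_map, position_branch)
    quat = spatial_mlp_branch(feature_map, orientation_branch)
    norm = math.sqrt(sum(q * q for q in quat)) or 1.0
    return position, [q / norm for q in quat]


rng = random.Random(0)
C, H, W = 8, 4, 4  # toy backbone output size (assumed)
features = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
pos_branch = make_branch(rng, C, H, W, hidden=16, out_dim=3)
ori_branch = make_branch(rng, C, H, W, hidden=16, out_dim=4)
position, orientation = regress_pose(features, pos_branch, ori_branch)
print(len(position), len(orientation))
```

In this sketch each branch keeps a separate weight for every pixel position, which is one plausible way to let the regression head use where a feature occurred; the actual SMLoc layers may differ.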
About the journal:
The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.