Simultaneous Scene-independent Camera Localization and Category-level Object Pose Estimation via Multi-level Feature Fusion

Wang Junyi, Yue Qi
{"title":"Simultaneous Scene-independent Camera Localization and Category-level Object Pose Estimation via Multi-level Feature Fusion","authors":"Wang Junyi, Yue Qi","doi":"10.1109/VR55154.2023.00041","DOIUrl":null,"url":null,"abstract":"In AR/MR applications, camera localization and object pose estimation both play crucial roles. The universality of learning techniques, often referred to as scene-independent localization and category-level pose estimation, presents challenges for both tasks. The two missions maintain close relationships due to the spatial geometry constraint, but differing task requirements result in distinct feature extraction. In this paper, we focus on simultaneous scene-independent camera localization and category-level object pose estimation with a unified learning framework. The system consists of a localization branch called SLO-LocNet, a pose estimation branch called SLO-ObjNet, a feature fusion module for feature sharing between two tasks, and two decoders for creating coordinate maps. In SLO-LocNet, localization features are produced for anticipating the relative pose between two adjusted frames using inputs of color and depth images. Furthermore, we establish an image fusion module in order to promote feature sharing in depth and color branches. With SLO-ObjNet, we take the detected depth image and its corresponding point cloud as inputs, and produce object pose features for pose estimation. A geometry fusion module is created to combine depth and point cloud information simultaneously. Between the two tasks, the image fusion module is also exploited to accomplish feature sharing. In terms of the loss function, we present a mixed optimization function that is composed of the relative camera pose, geometry constraint, absolute and relative object pose terms. To verify how well our algorithm could perform, we conduct experiments on both localization and pose estimation datasets, covering 7 Scenes, ScanNet, REAL275 and YCB-Video. All experiments demonstrate superior performance to other existing methods. We specifically train the network on ScanNet and test it on 7 Scenes to demonstrate the universality performance. Additionally, the positive effects of fusion modules and loss function are also demonstrated.","PeriodicalId":346767,"journal":{"name":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","volume":" 32","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VR55154.2023.00041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In AR/MR applications, camera localization and object pose estimation both play crucial roles. Generalizing learned models, in the form of scene-independent localization and category-level pose estimation, is challenging for both tasks. The two tasks are closely related through spatial geometry constraints, yet their differing requirements lead to distinct feature extraction. In this paper, we address simultaneous scene-independent camera localization and category-level object pose estimation within a unified learning framework. The system consists of a localization branch called SLO-LocNet, a pose estimation branch called SLO-ObjNet, a feature fusion module that shares features between the two tasks, and two decoders that produce coordinate maps. SLO-LocNet takes color and depth images as input and produces localization features for predicting the relative pose between two adjacent frames. We further introduce an image fusion module to promote feature sharing between the depth and color branches. SLO-ObjNet takes a detected object's depth image and its corresponding point cloud as inputs and produces object pose features for pose estimation. A geometry fusion module combines the depth and point cloud information. The image fusion module is also exploited to share features between the two tasks. For the loss function, we present a mixed optimization objective composed of relative camera pose, geometry constraint, and absolute and relative object pose terms. To evaluate the algorithm, we conduct experiments on both localization and pose estimation datasets, covering 7 Scenes, ScanNet, REAL275, and YCB-Video. All experiments demonstrate performance superior to existing methods. To demonstrate generalization, we specifically train the network on ScanNet and test it on 7 Scenes. Additionally, the positive effects of the fusion modules and the loss function are also demonstrated.
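The abstract describes the architecture only at a high level. As a rough illustration, the following is a minimal PyTorch-style sketch of how a dual-branch system with fusion modules could be wired. All class names (ImageFusion, GeometryFusion, SLOLocNet, SLOObjNet), layer choices, and output parameterizations are assumptions made for illustration, not the authors' implementation; the coordinate-map decoders and cross-task feature sharing are omitted for brevity.

```python
# Hypothetical sketch of the dual-branch framework described in the abstract.
# Module names, layers, and tensor shapes are illustrative assumptions.
import torch
import torch.nn as nn


class ImageFusion(nn.Module):
    """1x1-conv fusion of color and depth feature maps."""
    def __init__(self, c):
        super().__init__()
        self.mix = nn.Sequential(nn.Conv2d(2 * c, c, kernel_size=1), nn.ReLU())

    def forward(self, color_feat, depth_feat):
        return self.mix(torch.cat([color_feat, depth_feat], dim=1))


class GeometryFusion(nn.Module):
    """Fuses a pooled depth descriptor with a pooled point-cloud descriptor."""
    def __init__(self, c):
        super().__init__()
        self.mix = nn.Sequential(nn.Linear(2 * c, c), nn.ReLU())

    def forward(self, depth_desc, cloud_desc):
        return self.mix(torch.cat([depth_desc, cloud_desc], dim=1))


class SLOLocNet(nn.Module):
    """Localization branch: relative 6-DoF camera pose from two adjacent RGB-D frames."""
    def __init__(self, c=64):
        super().__init__()
        self.color_enc = nn.Sequential(nn.Conv2d(3, c, 3, 2, 1), nn.ReLU())
        self.depth_enc = nn.Sequential(nn.Conv2d(1, c, 3, 2, 1), nn.ReLU())
        self.fusion = ImageFusion(c)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.pose_head = nn.Linear(2 * c, 6)  # e.g. axis-angle rotation + translation

    def encode(self, rgb, depth):
        return self.fusion(self.color_enc(rgb), self.depth_enc(depth))

    def forward(self, rgb0, d0, rgb1, d1):
        f0, f1 = self.encode(rgb0, d0), self.encode(rgb1, d1)
        desc = torch.cat([self.pool(f0).flatten(1), self.pool(f1).flatten(1)], dim=1)
        # Returns the relative pose and a feature map that could be shared cross-task.
        return self.pose_head(desc), f0


class SLOObjNet(nn.Module):
    """Object branch: category-level object pose from a detected depth crop + point cloud."""
    def __init__(self, c=64):
        super().__init__()
        self.depth_enc = nn.Sequential(nn.Conv2d(1, c, 3, 2, 1), nn.ReLU())
        self.cloud_enc = nn.Sequential(nn.Conv1d(3, c, 1), nn.ReLU())  # PointNet-style
        self.pool2d = nn.AdaptiveAvgPool2d(1)
        self.geo_fusion = GeometryFusion(c)
        self.pose_head = nn.Linear(c, 7)  # e.g. quaternion rotation + translation

    def forward(self, depth_crop, cloud):
        depth_desc = self.pool2d(self.depth_enc(depth_crop)).flatten(1)
        cloud_desc = self.cloud_enc(cloud).max(dim=2).values  # global max-pool over points
        return self.pose_head(self.geo_fusion(depth_desc, cloud_desc))


if __name__ == "__main__":
    loc, obj = SLOLocNet(), SLOObjNet()
    rel_pose, shared = loc(torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64),
                           torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64))
    obj_pose = obj(torch.rand(2, 1, 64, 64), torch.rand(2, 3, 512))
    print(rel_pose.shape, obj_pose.shape)  # torch.Size([2, 6]) torch.Size([2, 7])
```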
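The abstract names the four loss terms but not their functional form. A conventional reading, under the assumption of a weighted sum with hypothetical balancing coefficients $\lambda_i$, would be:

```latex
\mathcal{L} = \lambda_{1}\,\mathcal{L}_{\mathrm{rel\text{-}cam}}
            + \lambda_{2}\,\mathcal{L}_{\mathrm{geo}}
            + \lambda_{3}\,\mathcal{L}_{\mathrm{abs\text{-}obj}}
            + \lambda_{4}\,\mathcal{L}_{\mathrm{rel\text{-}obj}}
```

Here $\mathcal{L}_{\mathrm{rel\text{-}cam}}$ would penalize error in the predicted relative camera pose, $\mathcal{L}_{\mathrm{geo}}$ would enforce the spatial geometry constraint linking the two tasks, and the remaining terms would penalize absolute and relative object pose error; the exact formulation is specified in the paper, not here.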