CDHN: Cross-domain hallucination network for 3D keypoints estimation

IF 7.5 · CAS Tier 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Mohammad Zohaib, Milind Gajanan Padalkar, Pietro Morerio, Matteo Taiana, Alessio Del Bue
DOI: 10.1016/j.patcog.2024.111188
Journal: Pattern Recognition, Volume 160, Article 111188
Published: 2024-11-15 (Journal Article)
Full text: https://www.sciencedirect.com/science/article/pii/S0031320324009397
Citations: 0

Abstract

This paper presents a novel method to estimate sparse 3D keypoints from single-view RGB images. Our network is trained in two steps using a knowledge distillation framework. In the first step, the teacher is trained to extract 3D features from point cloud data, which are used in combination with 2D features to estimate the 3D keypoints. In the second step, the teacher teaches the student module to hallucinate, from RGB images, 3D features similar to those extracted from the point clouds. This procedure allows the network, at inference, to extract 2D and 3D features directly from images, without requiring point clouds as input. Moreover, the network also predicts a confidence score for every keypoint, which is used to select the valid ones from a set of N predicted keypoints. This allows the prediction of different numbers of keypoints depending on the object’s geometry. We use the estimated keypoints to compute the relative pose between two views of an object. The results are compared with those of KP-Net and StarMap, which are the state of the art for estimating 3D keypoints from a single-view RGB image. The average angular distance error of our approach (5.94°) is 8.46° and 55.26° lower than that of KP-Net (14.40°) and StarMap (61.20°), respectively.
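The evaluation pipeline the abstract describes — thresholding the N predicted keypoints by their confidence scores, then recovering the relative rotation between two views from the surviving 3D keypoints — can be sketched as below. This is an illustrative reconstruction, not the paper's implementation: the function names, the 0.5 threshold, and the use of a standard Kabsch (SVD) alignment are all assumptions; the angular distance error is the usual geodesic angle between rotation matrices.

```python
import numpy as np

def select_valid_keypoints(keypoints, scores, threshold=0.5):
    """Keep only keypoints whose confidence exceeds the threshold.

    keypoints: (N, 3) array of predicted 3D keypoints.
    scores:    (N,) array of per-keypoint confidence scores.
    The 0.5 threshold is an illustrative assumption.
    """
    return keypoints[scores > threshold]

def relative_rotation(kp_a, kp_b):
    """Least-squares rotation mapping view-A keypoints onto view-B
    keypoints (Kabsch algorithm): center both sets, SVD the cross-
    covariance, and correct a possible reflection via the determinant."""
    a = kp_a - kp_a.mean(axis=0)
    b = kp_b - kp_b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

def angular_distance_deg(r_est, r_gt):
    """Geodesic angle (in degrees) between two rotation matrices."""
    cos = (np.trace(r_est @ r_gt.T) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Toy example: 4 predicted keypoints with confidences; 2 survive.
kp = np.array([[0.0, 0.0, 1.0],
               [1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.5, 0.5, 0.5]])
conf = np.array([0.9, 0.2, 0.8, 0.4])
valid = select_valid_keypoints(kp, conf)
print(valid.shape)  # (2, 3)

# Rotate the keypoints by a known 30° rotation about z, then recover it.
theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
r_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
r_est = relative_rotation(kp, kp @ r_true.T)
err = angular_distance_deg(r_est, r_true)  # close to 0
```

The reported metric then averages such angular distance errors over the test pairs; the abstract's figures are consistent (14.40° − 5.94° = 8.46°, 61.20° − 5.94° = 55.26°).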
Source journal
Pattern Recognition (Engineering & Technology – Electrical & Electronic Engineering)
CiteScore: 14.40
Self-citation rate: 16.20%
Annual articles: 683
Review time: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.