Mohammad Zohaib, Milind Gajanan Padalkar, Pietro Morerio, Matteo Taiana, Alessio Del Bue
{"title":"CDHN:用于三维关键点估算的跨域幻觉网络","authors":"Mohammad Zohaib , Milind Gajanan Padalkar , Pietro Morerio , Matteo Taiana , Alessio Del Bue","doi":"10.1016/j.patcog.2024.111188","DOIUrl":null,"url":null,"abstract":"<div><div>This paper presents a novel method to estimate sparse 3D keypoints from single-view RGB images. Our network is trained in two steps using a knowledge distillation framework. In the first step, the teacher is trained to extract 3D features from point cloud data, which are used in combination with 2D features to estimate the 3D keypoints. In the second step, the teacher teaches the student module to hallucinate the 3D features from RGB images that are similar to those extracted from the point clouds. This procedure helps the network during inference to extract 2D and 3D features directly from images, without requiring point clouds as input. Moreover, the network also predicts a confidence score for every keypoint, which is used to select the valid ones from a set of <em>N</em> predicted keypoints. This allows the prediction of different number of keypoints depending on the object’s geometry. We use the estimated keypoints for computing the relative pose between two views of an object. The results are compared with those of KP-Net and StarMap , which are the state-of-the-art for estimating 3D keypoints from a single-view RGB image. The average angular distance error of our approach (5.94°) is 8.46° and 55.26° lower than that of KP-Net (14.40°) and StarMap (61.20°), respectively.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"160 ","pages":"Article 111188"},"PeriodicalIF":7.5000,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CDHN: Cross-domain hallucination network for 3D keypoints estimation\",\"authors\":\"Mohammad Zohaib , Milind Gajanan Padalkar , Pietro Morerio , Matteo Taiana , Alessio Del Bue\",\"doi\":\"10.1016/j.patcog.2024.111188\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This paper presents a novel method to estimate sparse 3D keypoints from single-view RGB images. Our network is trained in two steps using a knowledge distillation framework. In the first step, the teacher is trained to extract 3D features from point cloud data, which are used in combination with 2D features to estimate the 3D keypoints. In the second step, the teacher teaches the student module to hallucinate the 3D features from RGB images that are similar to those extracted from the point clouds. This procedure helps the network during inference to extract 2D and 3D features directly from images, without requiring point clouds as input. Moreover, the network also predicts a confidence score for every keypoint, which is used to select the valid ones from a set of <em>N</em> predicted keypoints. This allows the prediction of different number of keypoints depending on the object’s geometry. We use the estimated keypoints for computing the relative pose between two views of an object. The results are compared with those of KP-Net and StarMap , which are the state-of-the-art for estimating 3D keypoints from a single-view RGB image. 
The average angular distance error of our approach (5.94°) is 8.46° and 55.26° lower than that of KP-Net (14.40°) and StarMap (61.20°), respectively.</div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"160 \",\"pages\":\"Article 111188\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2024-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0031320324009397\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320324009397","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
CDHN: Cross-domain hallucination network for 3D keypoints estimation
This paper presents a novel method to estimate sparse 3D keypoints from single-view RGB images. Our network is trained in two steps using a knowledge distillation framework. In the first step, the teacher is trained to extract 3D features from point cloud data, which are combined with 2D features to estimate the 3D keypoints. In the second step, the teacher trains the student module to hallucinate, from RGB images, 3D features similar to those extracted from the point clouds. This procedure enables the network, at inference time, to extract 2D and 3D features directly from images, without requiring point clouds as input. Moreover, the network also predicts a confidence score for every keypoint, which is used to select the valid ones from a set of N predicted keypoints. This allows the prediction of a different number of keypoints depending on the object’s geometry. We use the estimated keypoints to compute the relative pose between two views of an object. The results are compared with those of KP-Net and StarMap, which are the state of the art for estimating 3D keypoints from a single-view RGB image. The average angular distance error of our approach (5.94°) is 8.46° and 55.26° lower than that of KP-Net (14.40°) and StarMap (61.20°), respectively.
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.