OLF: RGB-D adaptive late fusion for robust 6D pose estimation
Théo Petitjean, Zongwei Wu, C. Demonceaux, O. Laligant
International Conference on Quality Control by Artificial Vision, vol. 12749, 2023-07-28. DOI: 10.1117/12.2690943
RGB-D 6D pose estimation has recently gained significant research attention due to the complementary information provided by depth data. However, in real-world scenarios, and especially in industrial applications, the depth and color images are often noisy [1, 2]. Existing methods typically employ fusion designs that average RGB and depth features with equal weight, which may not be optimal. In this paper, we propose a novel fusion design that adaptively merges RGB-D cues. Our approach assigns two learnable weights, α1 and α2, to adjust the RGB and depth contributions as a function of network depth. This improves robustness against low-quality depth input in a simple yet effective manner. We conducted extensive experiments on the 6D pose estimation benchmark and demonstrated the effectiveness of our method. We evaluated our network in conjunction with DenseFusion on two datasets (LineMOD [3] and YCB [4]) under similar noise scenarios to verify the usefulness of reinforcing the fusion with the α1 and α2 parameters. Our experiments show that our method outperforms existing methods, particularly in low-quality depth input scenarios. We plan to make our source code publicly available for future research.
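The abstract specifies only that two learnable weights, α1 and α2, rescale the RGB and depth contributions at fusion time. Below is a minimal PyTorch sketch of what such an adaptive late fusion could look like; the module name, the per-stage parameterization, and the softmax normalization are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class AdaptiveLateFusion(nn.Module):
    """Weighted RGB-D feature fusion with learnable per-stage weights.

    A sketch of the idea in the abstract: instead of averaging RGB and
    depth features equally, learn a pair (alpha1, alpha2) per fusion
    stage so the balance can vary with network depth.
    """

    def __init__(self, num_stages: int = 1):
        super().__init__()
        # Initialized to equal weights, recovering the plain average.
        self.alpha1 = nn.Parameter(torch.ones(num_stages))  # RGB weights
        self.alpha2 = nn.Parameter(torch.ones(num_stages))  # depth weights

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor,
                stage: int = 0) -> torch.Tensor:
        # Softmax keeps the two contributions on a comparable scale;
        # this normalization is an assumption, a plain weighted sum
        # would also be a valid reading of the abstract.
        w = torch.softmax(
            torch.stack([self.alpha1[stage], self.alpha2[stage]]), dim=0)
        return w[0] * rgb_feat + w[1] * depth_feat

# Hypothetical usage: fuse per-point features as a drop-in replacement
# for the equal averaging used in fusion baselines such as DenseFusion.
fusion = AdaptiveLateFusion(num_stages=3)
rgb = torch.randn(2, 128, 500)    # (batch, channels, points) - assumed shapes
depth = torch.randn(2, 128, 500)
fused = fusion(rgb, depth, stage=1)
```

Under this reading, a noisy depth stream can be down-weighted by the learned α2 at the stages where it hurts, which matches the robustness claim for low-quality depth input.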