Authors: Youshaa Murhij, Dmitry Yudin
Journal: Optical Memory and Neural Networks, Volume 33, Issue 2, pp. 144-156
Published: 2024-07-04
DOI: 10.3103/S1060992X2470005X
URL: https://link.springer.com/article/10.3103/S1060992X2470005X
DAGM-Mono: Deformable Attention-Guided Modeling for Monocular 3D Reconstruction
Accurate 3D pose estimation and shape reconstruction from monocular images is a challenging task in autonomous driving. Our work introduces a novel approach to this task for vehicles, called Deformable Attention-Guided Modeling for Monocular 3D Reconstruction (DAGM-Mono). Our solution addresses the challenge of detailed shape reconstruction by leveraging deformable attention mechanisms. Specifically, given 2D primitives, DAGM-Mono reconstructs vehicle shapes using deformable attention-guided modeling that considers the relevance between detected objects and vehicle shape priors. Our method introduces two additional loss functions, Chamfer Distance (CD) and Hierarchical Chamfer Distance, which improve shape reconstruction by capturing fine-grained shape details at different scales. Our bi-contextual deformable attention framework estimates 3D object pose, capturing both inter-object relations and scene context. Experiments on the ApolloCar3D dataset demonstrate that DAGM-Mono achieves state-of-the-art performance and significantly improves the performance of mature monocular 3D object detectors. Code and data are publicly available at https://github.com/YoushaaMurhij/DAGM-Mono.
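The abstract names Chamfer Distance as a shape loss but does not give its formula. As a rough illustration only, the sketch below computes the symmetric Chamfer Distance between two point sets under one common convention (mean squared nearest-neighbor distance, summed over both directions). The function names, the squared-distance convention, and the strided subsampling in `multiscale_chamfer` are all assumptions made for this example. In particular, `multiscale_chamfer` is a hypothetical stand-in for evaluating the loss at several scales and should not be read as the paper's Hierarchical Chamfer Distance.

```python
from typing import List, Sequence

Point = Sequence[float]


def _sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _one_sided(src: List[Point], dst: List[Point]) -> float:
    """Mean squared distance from each point in src to its nearest point in dst."""
    return sum(min(_sq_dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer Distance (squared-distance convention, assumed here)."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return _one_sided(a, b) + _one_sided(b, a)


def multiscale_chamfer(a: List[Point], b: List[Point], strides=(1, 2, 4)) -> float:
    """Hypothetical multi-scale variant: average CD over strided subsamples.

    Assumption for illustration only; not the paper's Hierarchical Chamfer Distance.
    """
    return sum(chamfer_distance(a[::s], b[::s]) for s in strides) / len(strides)


if __name__ == "__main__":
    # Two unit squares, one lifted by 1 along z (toy data, not from the paper).
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
    target = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 1.0), (0.0, 1.0, 1.0)]
    print(chamfer_distance(pred, target))
    print(multiscale_chamfer(pred, target))
```

The brute-force nearest-neighbor search is quadratic in the number of points, which is fine for a toy example; a training pipeline would more plausibly use a batched tensor implementation.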
About the journal:
The journal covers a wide range of issues in information optics, such as optical memory, mechanisms for optical data recording and processing, photosensitive materials, optical, optoelectronic, and holographic nanostructures, and many other related topics. Papers on memory systems using holographic and biological structures and on concepts of brain operation are also included. The journal pays particular attention to research on neural network systems that may lead to a new generation of computational technologies by endowing them with intelligence.