Wenyu Chen , Mu Chen , Jian Fang , Huaici Zhao , Guogang Wang
{"title":"MonoCAPE:利用坐标感知位置嵌入进行单目三维物体检测","authors":"Wenyu Chen , Mu Chen , Jian Fang , Huaici Zhao , Guogang Wang","doi":"10.1016/j.compeleceng.2024.109781","DOIUrl":null,"url":null,"abstract":"<div><div>3D monocular detection remains to be a focal point of research, particularly due to its capacity to deliver available precision under conditions of low cost and simplified configurations, making it especially valuable in fields like autonomous driving. Current 3D object detection methods often overlook the spatial information missing from images, which is critical to spatial perception, and optimize bounding box attributes separately, failing to meet the requirements of autonomous driving. We introduce MonoCAPE, a novel 3D detection framework addressing these issues by encoding spatial information and co-optimizing attributes through a Coordinate-Aware Position Encoding (CAPE) Generator and a Task Co-optimization Strategy (TCS). The CAPE Generator produces sparse positional embeddings, enabling spatial awareness with low computational cost, while the TCS utilizes Gaussian modeling to prevent suboptimal outputs. In this way, our framework comprehensively takes into account what existing approaches ignore. 
Extensive experiments on the KITTI dataset demonstrate MonoCAPE significantly improves <span><math><mrow><mi>A</mi><msub><mrow><mi>P</mi></mrow><mrow><mn>3</mn><mi>D</mi></mrow></msub></mrow></math></span> and <span><math><mrow><mi>A</mi><msub><mrow><mi>P</mi></mrow><mrow><mi>B</mi><mi>E</mi><mi>V</mi></mrow></msub></mrow></math></span> metrics compared to existing advanced methods.</div></div>","PeriodicalId":50630,"journal":{"name":"Computers & Electrical Engineering","volume":"120 ","pages":"Article 109781"},"PeriodicalIF":4.0000,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MonoCAPE: Monocular 3D object detection with coordinate-aware position embeddings\",\"authors\":\"Wenyu Chen , Mu Chen , Jian Fang , Huaici Zhao , Guogang Wang\",\"doi\":\"10.1016/j.compeleceng.2024.109781\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>3D monocular detection remains to be a focal point of research, particularly due to its capacity to deliver available precision under conditions of low cost and simplified configurations, making it especially valuable in fields like autonomous driving. Current 3D object detection methods often overlook the spatial information missing from images, which is critical to spatial perception, and optimize bounding box attributes separately, failing to meet the requirements of autonomous driving. We introduce MonoCAPE, a novel 3D detection framework addressing these issues by encoding spatial information and co-optimizing attributes through a Coordinate-Aware Position Encoding (CAPE) Generator and a Task Co-optimization Strategy (TCS). The CAPE Generator produces sparse positional embeddings, enabling spatial awareness with low computational cost, while the TCS utilizes Gaussian modeling to prevent suboptimal outputs. In this way, our framework comprehensively takes into account what existing approaches ignore. 
Extensive experiments on the KITTI dataset demonstrate MonoCAPE significantly improves <span><math><mrow><mi>A</mi><msub><mrow><mi>P</mi></mrow><mrow><mn>3</mn><mi>D</mi></mrow></msub></mrow></math></span> and <span><math><mrow><mi>A</mi><msub><mrow><mi>P</mi></mrow><mrow><mi>B</mi><mi>E</mi><mi>V</mi></mrow></msub></mrow></math></span> metrics compared to existing advanced methods.</div></div>\",\"PeriodicalId\":50630,\"journal\":{\"name\":\"Computers & Electrical Engineering\",\"volume\":\"120 \",\"pages\":\"Article 109781\"},\"PeriodicalIF\":4.0000,\"publicationDate\":\"2024-10-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Electrical Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0045790624007080\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Electrical Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0045790624007080","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
MonoCAPE: Monocular 3D object detection with coordinate-aware position embeddings
Monocular 3D detection remains a focal point of research, particularly for its ability to deliver usable precision at low cost with simple sensor configurations, making it especially valuable in fields such as autonomous driving. Current 3D object detection methods often overlook the spatial information missing from images, which is critical to spatial perception, and optimize bounding-box attributes separately, failing to meet the requirements of autonomous driving. We introduce MonoCAPE, a novel 3D detection framework that addresses these issues by encoding spatial information and co-optimizing attributes through a Coordinate-Aware Position Encoding (CAPE) Generator and a Task Co-optimization Strategy (TCS). The CAPE Generator produces sparse positional embeddings, enabling spatial awareness at low computational cost, while the TCS uses Gaussian modeling to avoid suboptimal outputs. In this way, our framework accounts for what existing approaches ignore. Extensive experiments on the KITTI dataset demonstrate that MonoCAPE significantly improves the AP_3D and AP_BEV metrics compared with existing advanced methods.
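The abstract does not detail how the CAPE Generator builds its embeddings. As a rough illustration only, a coordinate-aware position embedding in the spirit described (sparse, low computational cost) can be sketched with standard sinusoidal encodings of pixel coordinates; the function name, embedding dimension, and anchor coordinates below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sinusoidal_coord_embedding(u, v, dim=64, temperature=10000.0):
    """Encode a pixel coordinate (u, v) as a sinusoidal position embedding.

    Half the channels encode u, half encode v, each with alternating
    sin/cos at geometrically spaced frequencies (Transformer-style PEs).
    This is a generic stand-in, not the paper's CAPE Generator.
    """
    half = dim // 2
    freqs = temperature ** (-np.arange(0, half, 2) / half)
    emb_u = np.concatenate([np.sin(u * freqs), np.cos(u * freqs)])
    emb_v = np.concatenate([np.sin(v * freqs), np.cos(v * freqs)])
    return np.concatenate([emb_u, emb_v])  # shape (dim,)

# Embed a sparse set of anchor coordinates rather than a dense per-pixel
# grid, which is what keeps the encoding cost low.
coords = [(10, 20), (320, 180), (630, 350)]
embeddings = np.stack([sinusoidal_coord_embedding(u, v) for u, v in coords])
print(embeddings.shape)  # (3, 64)
```

Encoding only a sparse set of coordinates, rather than every pixel, is what the abstract's "sparse positional embeddings" plausibly refers to, since it reduces the embedding computation from O(H×W) to O(number of anchors).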
Journal introduction:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.