{"title":"用于自动驾驶和地图学习的定制地标表示的自动映射","authors":"Jan-Hendrik Pauls, Benjamin Schmidt, C. Stiller","doi":"10.1109/ICRA48506.2021.9561432","DOIUrl":null,"url":null,"abstract":"While the automatic creation of maps for localization is a widely tackled problem, the automatic inference of higher layers of HD maps is not. Additionally, approaches that learn from maps require richer and more precise landmarks than currently available.In this work, we fuse semantic detections from a monocular camera with depth and orientation estimation from lidar to automatically detect, track and map parametric, semantic map elements. We propose the use of tailored representations that are minimal in the number of parameters, making the map compact and the estimation robust and precise enough to enable map inference even from single frame detections. As examples, we map traffic signs, traffic lights and poles using upright rectangles and cylinders.After robust multi-view optimization, traffic lights and signs have a mean absolute position error of below 10 cm, extent estimates are below 5 cm and orientation MAE is below 6◦. This proves the suitability as automatically generated, pixel-accurate ground truth, reducing the task of ground truth generation from tedious 3D annotation to a post-processing of misdetections.","PeriodicalId":108312,"journal":{"name":"2021 IEEE International Conference on Robotics and Automation (ICRA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Automatic Mapping of Tailored Landmark Representations for Automated Driving and Map Learning\",\"authors\":\"Jan-Hendrik Pauls, Benjamin Schmidt, C. Stiller\",\"doi\":\"10.1109/ICRA48506.2021.9561432\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"While the automatic creation of maps for localization is a widely tackled problem, the automatic inference of higher layers of HD maps is not. Additionally, approaches that learn from maps require richer and more precise landmarks than currently available.In this work, we fuse semantic detections from a monocular camera with depth and orientation estimation from lidar to automatically detect, track and map parametric, semantic map elements. We propose the use of tailored representations that are minimal in the number of parameters, making the map compact and the estimation robust and precise enough to enable map inference even from single frame detections. As examples, we map traffic signs, traffic lights and poles using upright rectangles and cylinders.After robust multi-view optimization, traffic lights and signs have a mean absolute position error of below 10 cm, extent estimates are below 5 cm and orientation MAE is below 6◦. 
This proves the suitability as automatically generated, pixel-accurate ground truth, reducing the task of ground truth generation from tedious 3D annotation to a post-processing of misdetections.\",\"PeriodicalId\":108312,\"journal\":{\"name\":\"2021 IEEE International Conference on Robotics and Automation (ICRA)\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Conference on Robotics and Automation (ICRA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICRA48506.2021.9561432\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Robotics and Automation (ICRA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRA48506.2021.9561432","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Automatic Mapping of Tailored Landmark Representations for Automated Driving and Map Learning
While the automatic creation of maps for localization is a widely tackled problem, the automatic inference of the higher layers of HD maps is not. Additionally, approaches that learn from maps require richer and more precise landmarks than are currently available. In this work, we fuse semantic detections from a monocular camera with depth and orientation estimates from lidar to automatically detect, track, and map parametric, semantic map elements. We propose tailored representations that are minimal in their number of parameters, making the map compact and the estimation robust and precise enough to enable map inference even from single-frame detections. As examples, we map traffic signs, traffic lights, and poles using upright rectangles and cylinders. After robust multi-view optimization, traffic lights and signs have a mean absolute position error below 10 cm, extent errors below 5 cm, and an orientation MAE below 6°. This proves their suitability as automatically generated, pixel-accurate ground truth, reducing the task of ground-truth generation from tedious 3D annotation to post-processing of misdetections.
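To make the idea of tailored, minimal-parameter representations concrete, the following Python sketch models the two landmark types named in the abstract: an upright rectangle for traffic signs and lights, and a cylinder for poles. This is an illustrative assumption of how such parameterizations could look, not the authors' implementation; all class and field names are hypothetical.

# Illustrative sketch only (not the paper's code): minimal-parameter
# landmark representations as described in the abstract.
from dataclasses import dataclass
import numpy as np

@dataclass
class UprightRectangle:
    """Traffic sign or traffic light face: 6 parameters total.
    Being upright, it needs no roll or pitch, only a yaw angle."""
    center: np.ndarray  # 3D center position (x, y, z): 3 parameters
    yaw: float          # heading of the face normal about the vertical axis
    width: float        # horizontal extent
    height: float       # vertical extent

    def corners(self) -> np.ndarray:
        """Four 3D corner points, e.g. for reprojection residuals
        in a multi-view optimization."""
        # Horizontal in-plane axis (perpendicular to the face normal)
        # and the global up axis.
        u = np.array([-np.sin(self.yaw), np.cos(self.yaw), 0.0])
        up = np.array([0.0, 0.0, 1.0])
        hw, hh = 0.5 * self.width, 0.5 * self.height
        return np.stack([
            self.center - hw * u - hh * up,
            self.center + hw * u - hh * up,
            self.center + hw * u + hh * up,
            self.center - hw * u + hh * up,
        ])

@dataclass
class Cylinder:
    """Pole: 5 parameters; rotationally symmetric, so no yaw is needed."""
    base: np.ndarray  # 3D base point (x, y, z): 3 parameters
    radius: float
    height: float

Keeping the parameter count this small is what makes the estimation well-conditioned: a single-frame detection plus lidar depth already constrains all five or six parameters, and multi-view optimization then refines them.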