Shang-Lin Yu, Thomas Westfechtel, Ryunosuke Hamada, K. Ohno, S. Tadokoro
2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), published 2017-10-01. DOI: 10.1109/SSRR.2017.8088147
Vehicle detection and localization on bird's eye view elevation images using convolutional neural network
For autonomous vehicles, the ability to detect and localize surrounding vehicles is critical. It is fundamental to downstream processing steps such as collision avoidance and path planning. This paper introduces a convolutional neural network-based vehicle detection and localization method that uses point cloud data acquired by a LIDAR sensor. Acquired point clouds are transformed into bird's eye view elevation images, in which each pixel represents a grid cell of the horizontal x-y plane. We intentionally encode each pixel using three channels, namely the maximal, median and minimal height values of all points within the respective grid cell. A major advantage of this three-channel representation is that it allows us to use common RGB image-based detection networks without modification. The bird's eye view elevation images are processed by a two-stage detector. Because each pixel of a bird's eye view image represents ground coordinates, the bounding boxes of detected vehicles correspond directly to the horizontal positions of the vehicles. Therefore, in contrast to RGB-based detectors, we not only detect the vehicles but simultaneously localize them in ground coordinates. To assess the accuracy of our method and its usefulness for high-level applications such as path planning, we evaluate the detection results based on the localization error in ground coordinates. Our proposed method achieves an average precision of 87.9% at an intersection over union (IoU) threshold of 0.5. In addition, 75% of the detected cars are localized with an absolute positioning error below 0.2 m.
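The three-channel encoding and the pixel-to-ground correspondence described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grid extents (`x_range`, `y_range`), the cell size of 0.1 m, the zero fill for empty cells, and the function names `bev_elevation_image` and `box_center_to_ground` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def bev_elevation_image(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.1):
    """Rasterize an (N, 3) point cloud (x, y, z in metres) into an (H, W, 3) image.

    Channels hold the max, median and min height of the points in each grid cell.
    Extents, cell size and the zero fill for empty cells are assumed values.
    """
    width = int(round((x_range[1] - x_range[0]) / cell))
    height = int(round((y_range[1] - y_range[0]) / cell))
    image = np.zeros((height, width, 3), dtype=np.float32)

    # Map each point to the column (x) and row (y) of its grid cell.
    cols = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    rows = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    cols, rows, z = cols[inside], rows[inside], points[inside, 2]

    # Sort by flattened cell index so the points of one cell are contiguous.
    flat = rows * width + cols
    order = np.argsort(flat, kind="stable")
    flat, z = flat[order], z[order]
    cells, starts = np.unique(flat, return_index=True)

    # Split the heights into one group per occupied cell and fill the channels.
    for cell_id, heights in zip(cells, np.split(z, starts[1:])):
        r, c = divmod(int(cell_id), width)
        image[r, c] = (heights.max(), np.median(heights), heights.min())
    return image


def box_center_to_ground(box, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.1):
    """Convert a pixel box (col_min, row_min, col_max, row_max) to its (x, y) centre in metres."""
    col_min, row_min, col_max, row_max = box
    x = x_range[0] + 0.5 * (col_min + col_max) * cell
    y = y_range[0] + 0.5 * (row_min + row_max) * cell
    return x, y


if __name__ == "__main__":
    cloud = np.array([
        [1.02, 0.03, 0.5],
        [1.04, 0.01, 1.5],
        [1.06, 0.04, 1.0],
        [5.05, -3.05, 2.0],
    ])
    img = bev_elevation_image(cloud)
    print(img.shape, img[200, 10])
    print(box_center_to_ground((100, 190, 145, 210)))
```

Because the image has three channels, it could be passed to a detector that expects RGB input, and any bounding box the detector returns maps back to metres through the same grid parameters that produced the image.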