Iterative Calibration of a Vehicle Camera using Traffic Signs Detected by a Convolutional Neural Network

A. Hanel, Uwe Stilla
DOI: 10.5220/0006711201870195

Abstract: Intrinsic camera parameters are typically estimated during calibration using special reference patterns. Mechanical and thermal effects can cause these parameters to change over time, requiring iterative recalibration. For vehicle cameras, the necessary reference information has to be extracted from the scene itself, as reference patterns are not available on public streets. In this contribution, a method for iterative camera calibration using scale references extracted from traffic signs is proposed. Traffic signs are detected in images recorded during driving using a convolutional neural network. Multiple detections of the same sign are reduced by mean shift clustering, before the shape of each sign is fitted robustly with RANSAC. Distinctive image points along the shape contour, together with the metric size of the traffic sign, are included iteratively in the bundle adjustment performed for camera calibration. The neural network is trained and validated with over 50,000 images of traffic signs. The iterative calibration is tested with an image sequence of an urban scenario showing traffic signs. The results show that the estimated parameters vary in the first iterations before converging to stable values after several iterations. The standard deviations are comparable to those of the initial calibration with a reference pattern.

1 CALIBRATION OF CAMERAS FOR ADVANCED DRIVER ASSISTANCE SYSTEMS

In recent years, an increasing number and capability of advanced driver assistance systems per vehicle can be observed (Figure 1) (Shapiro, 2017), which is also reflected by the continuously growing sales of the required electronic control units in cars (AlixPartners, 2015). To capture the scene in and around the car for advanced driver assistance systems, different sensors are used (Dempsey, 2016).
Ultrasonic sensors in the front and rear bumpers can capture the close-range scene in front of and behind the car to avoid collisions during parking maneuvers. Radar sensors can be distinguished by their operating range. Cross-traffic warnings can be realized with a short-range radar system with a range of up to 30 m. A cruise control system that adapts the speed of the ego-car to preceding cars is typically used in highway scenarios, for which long-range radar systems with a range of more than 200 m are suitable. Pedestrian detection systems are typically used in urban scenarios at moderate driving speeds, requiring medium-range sensors such as a LiDAR or a camera (Ors, 2017).

Figure 1: Traffic signs detected in images of a vehicle camera (field of view in blue) can be used to warn the driver about speed limits or other traffic regulations. These detections can also be used to iteratively calibrate the camera (Auto Body Professionals Club, 2017).

During the development of a new car model, costs are an important design factor regarding customer acceptance. As the type of sensors installed in a car influences the total costs of advanced driver assistance systems, cameras, which are cheaper than, for example, LiDAR or radar sensors (e.g. BMW 7 series parts catalogue: (bmwfans.info, 2017)), are interesting to consider. Images of an optical camera installed in a car can be used for detections on the one hand and for measurements on the other. Detecting a preceding car and measuring the relative distance to the ego-car are respective application examples.

Published in: Proceedings of the 4th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2018), pages 187-195. ISBN: 978-989-758-293-6. Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda.
The accuracy and reliability of these measurements depend on the quality of the sensor, in this case the optical camera. For this purpose, a camera calibration can be performed. Several authors have already worked on the calibration of vehicle cameras recording the environment (Broggi et al., 2001; Bellino et al., 2005; Ribeiro et al., 2006). Their works can be distinguished by the estimated parameters: either the intrinsic parameters (e.g. Heng et al., 2013) or the extrinsic parameters (e.g. Friel et al., 2012), the latter typically relative to a vehicle coordinate system (Hanel and Stilla, 2017). Their works can also be distinguished by the calibration method: either a method using a specific calibration pattern in the acquired scene (Bellino et al., 2005; Hanel et al., 2016) or an auto-calibration method that does not require a special calibration pattern (Heng et al., 2013; Bovyrin and Kozlov, 2017).

In many cases, auto-calibration is based on the detection of road markings in images (Ribeiro et al., 2006; Paula et al., 2014), which provide image points for camera calibration. Typically, however, road markings appear in the lower half of an image of a vehicle camera, making it impossible to draw conclusions about the distortion in the upper half of the image. Furthermore, it cannot be assumed that road markings are available in all scenarios, for example on narrow roads or in parking garages.

Another frequent type of object in street scenarios is the traffic sign. In the field of view of a vehicle camera, traffic signs typically appear in the upper half of the image. As with road markings, the shape and size of traffic signs are standardized (e.g. Department of Transport Ireland, 2010), allowing 3D object coordinates corresponding to the extracted image points to be obtained for camera calibration. When the car is being driven, a rough street surface can cause vibrations in the car, or the sun can heat up its interior.
Both mechanical and thermal effects can influence the intrinsic parameters of cameras installed in the car over time (Dang et al., 2009; Smith and Cope, 2010). Therefore, it is recommended to update the calibration parameters iteratively so that valid parameter values are available for images recorded during a longer drive. Scale information in particular has a strong influence on the estimated parameters (Luhmann et al., 2013). Therefore, in this contribution a method is proposed to iteratively estimate the intrinsic parameters using scale references extracted from images of traffic signs detected by a convolutional neural network.

The remainder of this paper is organized as follows: Section 2 describes the processing steps to extract the scale references from images of traffic signs and to perform the iterative camera calibration. Section 3 shows the experimental setup and the data used to test the proposed method. Section 4 describes and discusses the results of the camera calibration. Section 5 concludes the paper.

2 INITIAL AND ITERATIVE CAMERA CALIBRATION

This section is divided into two parts. Subsection 2.1 describes the process of initial camera calibration to obtain initial intrinsic parameter values. This step is intended to be performed before a vehicle equipped with a camera is driven on public streets. Subsection 2.2 details the method for extracting scale references and including them in the iterative camera calibration. This step is intended to be performed repeatedly while the vehicle is driven on public streets.

2.1 Initial Camera Calibration

The objective of the initial camera calibration is to provide estimates of the intrinsic parameters of the vehicle camera, including the distortion parameters. A central perspective camera model is used.
According to Hastedt et al. (2016), a central perspective camera model is also valid for wide-angle action cameras, which are interesting for automotive use due to their low cost and small size, provided the manufacturer-provided distortion correction has been applied to the images. This correction removes the largest part of the distortion in the image, so that only a small residual remains, which can be modelled by a central perspective camera model. Additionally, in the case of distortion-corrected images, it is valid to use a planar calibration pattern. As these authors have further reported difficulties in estimating the decentering distortion parameters, those parameters are not considered here.

The estimated intrinsic parameters of the camera are the principal point x'_0, y'_0, the focal length c', and three radial-symmetric distortion parameters according to the model of Brown (1971) (equations 1, 2):

x_dist,rad = x · (1 + k1·r^2 + k2·r^4 + k3·r^6)    (1)
y_dist,rad = y · (1 + k1·r^2 + k2·r^4 + k3·r^6)    (2)

where x, y are the image coordinates relative to the principal point, r is the radial distance of the image point from the principal point, and k1, k2, k3 are the radial distortion coefficients.
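To make the model concrete, a minimal Python sketch of equations (1) and (2) follows. The coefficient values are hypothetical and serve only to illustrate the computation; in the proposed method they are estimated in the bundle adjustment.

```python
def apply_radial_distortion(x, y, k1, k2, k3):
    """Brown radial-symmetric distortion, cf. equations (1) and (2):
    the distorted coordinates are the undistorted ones scaled by
    (1 + k1*r^2 + k2*r^4 + k3*r^6)."""
    r2 = x * x + y * y                               # r^2, squared radial distance
    factor = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    return x * factor, y * factor

# Hypothetical coefficient values, chosen for illustration only;
# image coordinates are given relative to the principal point.
k1, k2, k3 = -2.5e-7, 1.0e-13, 0.0
xd, yd = apply_radial_distortion(100.0, 50.0, k1, k2, k3)
```

Note that the even powers of r make the distortion rotationally symmetric about the principal point, which is why a single set of coefficients serves both coordinate directions.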
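As a rough illustration of the detection-merging step named in the abstract, the following sketch clusters detection centers with a flat-kernel mean shift. The function, the bandwidth value, and the sample coordinates are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def mean_shift_modes(points, bandwidth, n_iter=50, tol=1e-4):
    """Shift every detection center toward the mean of its neighbors
    within `bandwidth` (flat kernel); centers that converge to the same
    mode are treated as detections of one physical traffic sign."""
    shifted = points.astype(float).copy()
    for _ in range(n_iter):
        moved = 0.0
        for i, p in enumerate(shifted):
            dists = np.linalg.norm(shifted - p, axis=1)
            neighbors = shifted[dists < bandwidth]
            new_p = neighbors.mean(axis=0)
            moved = max(moved, np.linalg.norm(new_p - p))
            shifted[i] = new_p
        if moved < tol:          # all centers have converged to modes
            break
    # merge near-identical modes into cluster labels
    modes, labels = [], []
    for p in shifted:
        for j, m in enumerate(modes):
            if np.linalg.norm(p - m) < bandwidth / 2:
                labels.append(j)
                break
        else:
            modes.append(p)
            labels.append(len(modes) - 1)
    return np.array(modes), labels

# Two overlapping detections of one sign plus one detection of another:
centers = np.array([[100.0, 40.0], [103.0, 42.0], [400.0, 60.0]])
modes, labels = mean_shift_modes(centers, bandwidth=20.0)
```

With these sample values the first two detections collapse into one mode while the third remains separate, so each resulting mode can be passed on to the RANSAC shape fit as a single sign candidate.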