Iterative Calibration of a Vehicle Camera using Traffic Signs Detected by a Convolutional Neural Network
A. Hanel, Uwe Stilla
{"title":"Iterative Calibration of a Vehicle Camera using Traffic Signs Detected by a Convolutional Neural Network","authors":"A. Hanel, Uwe Stilla","doi":"10.5220/0006711201870195","DOIUrl":null,"url":null,"abstract":"Intrinsic camera parameters are estimated during calibration typically using special reference patterns. Mechanical and thermal effects might cause the parameters to change over time, requiring iterative calibration. For vehicle cameras, reference information needed therefore has to be extracted from the scenario, as reference patterns are not available on public streets. In this contribution, a method for iterative camera calibration using scale references extracted from traffic signs is proposed. Traffic signs are detected in images recorded during driving using a convolutional neural network. Multiple detections are reduced by mean shift clustering, before the shape of each sign is fitted robustly with RANSAC. Unique image points along the shape contour together with the metric size of the traffic sign are included iteratively in the bundle adjustment performed for camera calibration. The neural network is trained and validated with over 50,000 images of traffic signs. The iterative calibration is tested with an image sequence of an urban scenario showing traffic signs. The results show that the estimated parameters vary in the first iterations, until they converge to stable values after several iterations. The standard deviations are comparable to the initial calibration with a reference pattern. 1 CALIBRATION OF CAMERAS FOR ADVANCED DRIVER ASSISTANCE SYSTEMS In recent years, an increasing number and capability (figure 1) of advanced driver assistance systems per vehicle can be observed (Shapiro, 2017), what is also reflected by the continuously growing sales of needed electronic control units in cars (AlixPartners, 2015). 
For capturing the scenario in and around the car for advanced driver assistance systems, different sensors are used (Dempsey, 2016). Ultrasonic sensors in the front and rear bumper can capture the close scenario in front and behind the car to avoid collisions during parking maneuvers. Radar sensors can be distinguished by their operating range. Cross traffic warnings can be realized with a short-range radar system with a range up to 30 m. A cruise control system adapting the speed of the ego-car dependent on preceding cars is used typically in highways scenarios, wherefore long-range radar systems with a range of more than 200 m are suitable. Pedestrian detection systems are typically used in urban scenarios with moderate speeds driven, requiring medium-range sensors like a LiDAR or a camera (Ors, 2017). During development of a new car model, costs are Figure 1: Traffic signs detected in images of a vehicle camera (field of view in blue) can be used to warn the driver against speed limits or other traffic regulations. These detections can be also used to iteratively calibrate the camera (Auto Body Professionals Club, 2017). an important design factor regarding customer acceptance. As the type of sensors installed in a car influences the total costs of advanced driver assistance systems, cameras with lower costs than for example LiDAR or radar sensors (e.g. BMW 7 series parts Hanel, A. and Stilla, U. Iterative Calibration of a Vehicle Camera using Traffic Signs Detected by a Convolutional Neural Network. DOI: 10.5220/0006711201870195 In Proceedings of the 4th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2018), pages 187-195 ISBN: 978-989-758-293-6 Copyright c © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved 187 catalogue: (bmwfans.info, 2017)) are interesting to consider. 
Images of an optical camera installed in a car can be used for detections on the one hand and for measurements on the other hand. Detecting a preceding car and measuring the relative distance to the ego-car are application examples, respectively. The accuracy and reliability of the measurements depend on the quality of the sensor, in this case the optical camera. For this purpose, a camera calibration can be performed. Several authors have already worked on the calibration of vehicular cameras recording the environment (Broggi et al., 2001) (Bellino et al., 2005) (Ribeiro et al., 2006). Their works can be distinguished by the estimated parameters: either the estimation of the intrinsic parameters (e.g. (Heng et al., 2013)) or the estimation of the extrinsic parameters (e.g. (Friel et al., 2012)), typically relative to a vehicular coordinate system (Hanel and Stilla, 2017). Their works can also be distinguished based on the calibration method: either a calibration method using a specific calibration pattern in the acquired scene (Bellino et al., 2005) (Hanel et al., 2016) or an auto-calibration method without the need of a special calibration pattern (Heng et al., 2013) (Bovyrin and Kozlov, 2017). Basis for auto-calibration is in many cases the detection of road markings in images (Ribeiro et al., 2006) (Paula et al., 2014) providing image points for camera calibration. Typically, the road markings are shown in the lower half of an image of a vehicle camera, making it impossible to conclude on the distortions in the upper image half. Furthermore, it can’t be assumed that in all scenarios road markings are available, for example on narrow roads or in parking garages. Another frequent type of objects in street scenarios are traffic signs. In the field of view of a vehicular camera, they are typically shown in the upper half of the image. As well as for road markings, the shape and size of traffic signs are standardized (e.g. 
(Department of Transport Ireland, 2010)), allowing to obtain 3d object coordinates corresponding to the extracted image points for camera calibration. When the car is being driven, a rough street surface can cause vibrations in the car or the sun can heat up its interior. Both mechanical and thermal effects can influence the intrinsic parameters of cameras installed in the car over time (Dang et al., 2009) (Smith and Cope, 2010). Therefore, it is recommended to update the calibration parameters iteratively to have valid parameter values for images recorded during a longer drive. Especially scale information has a strong influence on the estimated parameters (Luhmann et al., 2013). Therefore, in this contribution a method to iteratively estimate the intrinsic parameters using scale references extracted from images of traffic signs using a convolutional neural network is proposed. The remainder of this paper is organized as follows: in section 2 the processing steps to extract the scale references from images of traffic signs and to perform the iterative camera calibration are described. Section 3 shows the experimental setup and data used to test the proposed method. In section 4 the results of the camera calibration are described and discussed. Section 5 concludes the paper. 2 INITIAL AND ITERATIVE CAMERA CALIBRATION This section is divided into two parts. In subsection 2.1, the process for initial camera calibration to obtain initial intrinsic parameter values is described. This step is designated to be performed before a vehicle equipped with a camera is driven on public streets. In subsection 2.2, the details of the method for extracting scale references and including them into the iterative camera calibration are shown. This step is designated to be performed iteratively when and during the vehicle is driven on public streets. 
2.1 Initial Camera Calibration Objective of the initial camera calibration is to provide estimates for the intrinsic parameters including distortion parameters of the vehicle camera. A central perspective camera model is used. According to (Hastedt et al., 2016), a central perspective camera model is valid also for wide-angle action cameras, which are due to their low costs and small size interesting for automotive use, if the manufacturerprovided distortion correction has been applied to the images. This correction reduces the largest part of distortions in the image, so that only small parts remain, which can be modelled by a central perspective camera model. Additionally, in the case of distortioncorrected images, it is valid to use a planar calibration pattern. As these authors have further reported difficulties in estimating the decentering distortion parameters, they are not considered. The estimated intrinsic parameters of the camera are x′ 0,y ′ 0 as principal point, c ′ as focal length, and three radial-symmetric distortion parameters according to the model of Brown (Brown, 1971) (equations 1, 2): xdist,rad = x · (1+ k1r + k2r + k3r) (1) ydist,rad = y · (1+ k1r + k2r + k3r) (2) VEHITS 2018 4th International Conference on Vehicle Technology and Intelligent Transport Systems","PeriodicalId":218840,"journal":{"name":"International Conference on Vehicle Technology and Intelligent Transport Systems","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Vehicle Technology and Intelligent Transport 
Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5220/0006711201870195","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Intrinsic camera parameters are typically estimated during calibration using special reference patterns. Mechanical and thermal effects might cause the parameters to change over time, requiring iterative calibration. For vehicle cameras, the required reference information therefore has to be extracted from the scene, as reference patterns are not available on public streets. In this contribution, a method for iterative camera calibration using scale references extracted from traffic signs is proposed. Traffic signs are detected in images recorded during driving using a convolutional neural network. Multiple detections are reduced by mean shift clustering, before the shape of each sign is fitted robustly with RANSAC. Unique image points along the shape contour, together with the metric size of the traffic sign, are included iteratively in the bundle adjustment performed for camera calibration. The neural network is trained and validated with over 50,000 images of traffic signs. The iterative calibration is tested with an image sequence of an urban scenario showing traffic signs. The results show that the estimated parameters vary in the first iterations, until they converge to stable values after several iterations. The standard deviations are comparable to those of the initial calibration with a reference pattern.

1 CALIBRATION OF CAMERAS FOR ADVANCED DRIVER ASSISTANCE SYSTEMS

In recent years, an increasing number and capability (figure 1) of advanced driver assistance systems per vehicle can be observed (Shapiro, 2017), which is also reflected by the continuously growing sales of the required electronic control units in cars (AlixPartners, 2015). To capture the scene in and around the car for advanced driver assistance systems, different sensors are used (Dempsey, 2016). Ultrasonic sensors in the front and rear bumpers can capture the close-range scene in front of and behind the car to avoid collisions during parking maneuvers.
Radar sensors can be distinguished by their operating range. Cross-traffic warnings can be realized with a short-range radar system with a range of up to 30 m. A cruise control system that adapts the speed of the ego-car to preceding cars is typically used in highway scenarios, for which long-range radar systems with a range of more than 200 m are suitable. Pedestrian detection systems are typically used in urban scenarios with moderate driving speeds, requiring medium-range sensors such as a LiDAR or a camera (Ors, 2017).

Figure 1: Traffic signs detected in images of a vehicle camera (field of view in blue) can be used to warn the driver against speed limits or other traffic regulations. These detections can also be used to iteratively calibrate the camera (Auto Body Professionals Club, 2017).

During development of a new car model, costs are an important design factor regarding customer acceptance. As the type of sensors installed in a car influences the total costs of advanced driver assistance systems, cameras, which cost less than, for example, LiDAR or radar sensors (e.g. BMW 7 series parts catalogue: (bmwfans.info, 2017)), are interesting to consider. Images of an optical camera installed in a car can be used for detections on the one hand and for measurements on the other hand. Detecting a preceding car and measuring its relative distance to the ego-car are respective application examples. The accuracy and reliability of the measurements depend on the quality of the sensor, in this case the optical camera.
For this purpose, a camera calibration can be performed. Several authors have already worked on the calibration of vehicle cameras recording the environment (Broggi et al., 2001; Bellino et al., 2005; Ribeiro et al., 2006). Their works can be distinguished by the estimated parameters: either the intrinsic parameters (e.g. Heng et al., 2013) or the extrinsic parameters (e.g. Friel et al., 2012), the latter typically relative to a vehicle coordinate system (Hanel and Stilla, 2017). Their works can also be distinguished by the calibration method: either a method using a specific calibration pattern in the acquired scene (Bellino et al., 2005; Hanel et al., 2016) or an auto-calibration method without the need for a special calibration pattern (Heng et al., 2013; Bovyrin and Kozlov, 2017). In many cases, the basis for auto-calibration is the detection of road markings in images (Ribeiro et al., 2006; Paula et al., 2014), which provide image points for camera calibration. Typically, road markings appear in the lower half of an image of a vehicle camera, making it impossible to draw conclusions about the distortions in the upper image half. Furthermore, it cannot be assumed that road markings are available in all scenarios, for example on narrow roads or in parking garages. Another frequent type of object in street scenes is the traffic sign. In the field of view of a vehicle camera, traffic signs typically appear in the upper half of the image. As for road markings, the shape and size of traffic signs are standardized (e.g. Department of Transport Ireland, 2010), allowing 3d object coordinates corresponding to the extracted image points to be obtained for camera calibration. When the car is being driven, a rough street surface can cause vibrations in the car, or the sun can heat up its interior.
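To illustrate how a standardized sign geometry yields object-space coordinates, the following sketch samples 3d points on the contour of a circular sign of known metric diameter; this is a minimal illustration, and the 0.6 m diameter and the point count are assumed example values, not taken from the paper:

```python
import math

def circular_sign_object_points(diameter_m, n_points=36):
    """Sample 3d object coordinates (in metres) along the contour of a
    circular traffic sign lying in its own plane (Z = 0).

    The known metric diameter provides the scale reference: each contour
    point has a fixed metric position relative to the sign centre, which
    can be matched to image points extracted along the fitted contour."""
    r = diameter_m / 2.0
    return [
        (r * math.cos(2.0 * math.pi * k / n_points),
         r * math.sin(2.0 * math.pi * k / n_points),
         0.0)
        for k in range(n_points)
    ]

# Example: a circular sign with an assumed 0.6 m standard diameter
points = circular_sign_object_points(0.6)
```

Because all points share one known scale, such correspondences can constrain the scale of a bundle adjustment even when no calibration pattern is present in the scene.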
Both mechanical and thermal effects can influence the intrinsic parameters of cameras installed in the car over time (Dang et al., 2009; Smith and Cope, 2010). It is therefore recommended to update the calibration parameters iteratively in order to have valid parameter values for images recorded during a longer drive. Scale information in particular has a strong influence on the estimated parameters (Luhmann et al., 2013). In this contribution, a method is therefore proposed to iteratively estimate the intrinsic parameters using scale references extracted from images of traffic signs by a convolutional neural network.

The remainder of this paper is organized as follows: in section 2, the processing steps to extract the scale references from images of traffic signs and to perform the iterative camera calibration are described. Section 3 shows the experimental setup and the data used to test the proposed method. In section 4, the results of the camera calibration are described and discussed. Section 5 concludes the paper.

2 INITIAL AND ITERATIVE CAMERA CALIBRATION

This section is divided into two parts. In subsection 2.1, the process for initial camera calibration to obtain initial intrinsic parameter values is described. This step is designed to be performed before a vehicle equipped with a camera is driven on public streets. In subsection 2.2, the details of the method for extracting scale references and including them in the iterative camera calibration are shown. This step is designed to be performed iteratively while the vehicle is driven on public streets.

2.1 Initial Camera Calibration

The objective of the initial camera calibration is to provide estimates for the intrinsic parameters, including the distortion parameters, of the vehicle camera. A central perspective camera model is used.
According to Hastedt et al. (2016), a central perspective camera model is also valid for wide-angle action cameras, which are interesting for automotive use due to their low costs and small size, provided that the manufacturer-provided distortion correction has been applied to the images. This correction removes the largest part of the distortions in the image, so that only small residuals remain, which can be modelled by a central perspective camera model. Additionally, in the case of distortion-corrected images, it is valid to use a planar calibration pattern. As these authors have further reported difficulties in estimating the decentering distortion parameters, those parameters are not considered. The estimated intrinsic parameters of the camera are the principal point x′0, y′0, the focal length c′, and three radial-symmetric distortion parameters according to the model of Brown (Brown, 1971) (equations 1, 2):

x_dist,rad = x · (1 + k1·r² + k2·r⁴ + k3·r⁶)   (1)
y_dist,rad = y · (1 + k1·r² + k2·r⁴ + k3·r⁶)   (2)
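The radial-symmetric distortion model of equations 1 and 2 can be sketched in code as follows. Image coordinates are assumed to be reduced to the principal point, so that r is the radial distance from it; the coefficient values in the example call are arbitrary illustrative numbers:

```python
def apply_radial_distortion(x, y, k1, k2, k3):
    """Apply Brown's radial-symmetric distortion model (equations 1, 2):
    x_dist = x * (1 + k1*r^2 + k2*r^4 + k3*r^6), likewise for y,
    where (x, y) are image coordinates reduced to the principal point
    and r is their radial distance from it."""
    r2 = x * x + y * y  # r^2
    factor = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    return x * factor, y * factor

# Example with arbitrary small coefficients
xd, yd = apply_radial_distortion(1.0, 0.5, k1=1e-2, k2=1e-4, k3=1e-6)
```

With all coefficients set to zero, the model reduces to the undistorted central perspective case, which is a convenient sanity check when implementing the calibration.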