Optimising Faster R-CNN Training to Enable Video Camera Compression for Assisted and Automated Driving Systems

V. Donzella, P. H. Chan, A. Huggett
{"title":"Optimising Faster R-CNN Training to Enable Video Camera Compression for Assisted and Automated Driving Systems","authors":"V. Donzella, P. H. Chan, A. Huggett","doi":"10.1109/RAAI56146.2022.10092961","DOIUrl":null,"url":null,"abstract":"Advanced driving assistance systems based on only one camera or one RADAR are evolving into the current assisted and automated driving functions delivering SAE Level 2 and above capabilities. A suite of environmental perception sensors is required to achieve safe and reliable planning and navigation in future vehicles equipped with these capabilities. The sensor suite, based on several cameras, LiDARs, RADARs and ultrasonic sensors, needs to be adequate to provide sufficient (and redundant, depending on the level of driving automation) spatial and temporal coverage of the environment around the vehicle. However, the data amount produced by the sensor suite can easily exceed a few tens of Gb/s, with a single ‘average’ automotive camera producing more than 3 Gb/s. It is therefore important to consider leveraging traditional video compression techniques as well as to investigate novel ones to reduce the amount of video camera data to be transmitted to the vehicle processing unit(s). In this paper, we demonstrate that lossy compression schemes, with high compression ratios (up to 1:1,000) can be applied safely to the camera video data stream when machine learning based object detection is used to consume the sensor data. We show that transfer learning can be used to re-train a deep neural network with H.264 and H.265 compliant compressed data, and it allows the network performance to be optimised based on the compression level of the generated sensor data. Moreover, this form of transfer learning improves the neural network performance when evaluating uncompressed data, increasing its robustness to real world variations of the data.","PeriodicalId":190255,"journal":{"name":"2022 2nd International Conference on Robotics, Automation and Artificial Intelligence (RAAI)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 2nd International Conference on Robotics, Automation and Artificial Intelligence (RAAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RAAI56146.2022.10092961","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Advanced driving assistance systems based on only one camera or one RADAR are evolving into the current assisted and automated driving functions delivering SAE Level 2 and above capabilities. A suite of environmental perception sensors is required to achieve safe and reliable planning and navigation in future vehicles equipped with these capabilities. The sensor suite, based on several cameras, LiDARs, RADARs and ultrasonic sensors, needs to provide sufficient (and, depending on the level of driving automation, redundant) spatial and temporal coverage of the environment around the vehicle. However, the amount of data produced by the sensor suite can easily exceed a few tens of Gb/s, with a single 'average' automotive camera alone producing more than 3 Gb/s. It is therefore important to consider leveraging traditional video compression techniques, as well as investigating novel ones, to reduce the amount of video camera data to be transmitted to the vehicle processing unit(s). In this paper, we demonstrate that lossy compression schemes with high compression ratios (up to 1:1,000) can be applied safely to the camera video data stream when machine-learning-based object detection is used to consume the sensor data. We show that transfer learning can be used to re-train a deep neural network with H.264- and H.265-compliant compressed data, and that it allows the network performance to be optimised based on the compression level of the generated sensor data. Moreover, this form of transfer learning improves the neural network's performance when evaluating uncompressed data, increasing its robustness to real-world variations of the data.
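The pipeline the abstract describes, generating H.264/H.265-compressed copies of the camera data and then fine-tuning a Faster R-CNN on them via transfer learning, can be outlined as follows. This is a minimal sketch, assuming ffmpeg is on the path and a standard PyTorch/torchvision environment; the codec choice, CRF values, file paths and the omitted dataset/loader are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: re-encode training clips with H.264/H.265 at a chosen compression
# level, then fine-tune a COCO-pretrained Faster R-CNN on the result.
# Hypothetical paths and parameters; dataset/loader construction is omitted.
import subprocess

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor


def compress_clip(src: str, dst: str, codec: str = "libx265", crf: int = 40) -> None:
    """Re-encode a clip with ffmpeg; a higher CRF means stronger lossy compression."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-c:v", codec, "-crf", str(crf), dst],
        check=True,
    )


def build_model(num_classes: int) -> torch.nn.Module:
    """Start from a COCO-pretrained Faster R-CNN and replace the box predictor head."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model


def finetune(model, loader, epochs: int = 5, lr: float = 0.005, device: str = "cuda"):
    """Transfer learning on (compressed) frames; loader yields (images, targets)."""
    model.to(device).train()
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=5e-4)
    for _ in range(epochs):
        for images, targets in loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            loss_dict = model(images, targets)  # RPN + ROI losses as a dict
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

Sweeping the CRF (or an equivalent bitrate target) yields training sets at several compression levels, so a model fine-tuned per level, or on a mix of levels, can then be evaluated against both compressed and uncompressed test data, mirroring the experiment summarised in the abstract.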