Performance Evaluation of Semantic Video Compression using Multi-cue Object Detection

2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). Pub Date: 2019-10-01. DOI: 10.1109/AIPR47015.2019.9174601

Noor M. Al-Shakarji, F. Bunyak, H. Aliakbarpour, G. Seetharaman, K. Palaniappan


Citations: 2

Abstract

Video compression becomes critical in real-time aerial surveillance, where limited communication bandwidth and on-board storage greatly restrict air-to-ground and air-to-air communications. In these cases, video data must be handled efficiently to ensure optimal storage, smoother video transmission, and fast, reliable video analysis. Conventional video compression schemes were typically designed for human visual perception rather than automated video analytics. Information loss and artifacts introduced during image/video compression seriously limit the performance of automated video analytics tasks. These limitations are more severe in aerial imagery because of complex backgrounds and the small size of objects. In this paper, we describe and evaluate a salient region estimation pipeline for aerial imagery that enables adaptive bit-rate allocation during video compression. The salient regions are estimated using a multi-cue moving vehicle detection pipeline, which fuses complementary appearance and motion cues using deep learning-based object detection and flux tensor-based spatio-temporal filtering. Adaptive compression results using the described multi-cue saliency estimation pipeline are compared against conventional MPEG and JPEG encoding in terms of compression ratio, image quality, and impact on automated video analytics operations. Experimental results on the ABQ urban aerial video dataset [1] show that incorporating contextual information enables high semantic compression ratios of over 2000:1 while preserving image quality in the regions of interest. The proposed pipeline enables better utilization of the limited bandwidth of air-to-ground or air-to-air network links.
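The idea of saliency-driven bit-rate allocation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fusion rule, the detector, or the encoder settings. The sketch assumes a simple frame-difference threshold as a stand-in for the flux tensor motion cue, a union of the two masks as the fusion rule, and hypothetical quality levels `q_roi` and `q_bg` assigned per 16x16 block. All function names and parameters are illustrative.

```python
import numpy as np

def boxes_to_mask(boxes, shape):
    """Rasterize (x0, y0, x1, y1) detection boxes into a boolean mask.
    The boxes stand in for the output of an object detector."""
    mask = np.zeros(shape, dtype=bool)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = True
    return mask

def motion_mask(prev_frame, frame, threshold=25):
    """Stand-in motion cue: thresholded absolute frame difference.
    Assumption: this replaces the flux tensor filter for brevity."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

def fuse_cues(appearance, motion):
    """Hypothetical fusion rule: a pixel is salient if either cue fires."""
    return appearance | motion

def block_quality_map(saliency, block=16, q_roi=90, q_bg=10):
    """Assign a quality level to each block: q_roi if the block contains
    any salient pixel, q_bg otherwise. Values are illustrative."""
    h, w = saliency.shape
    bh, bw = -(-h // block), -(-w // block)  # ceiling division
    padded = np.zeros((bh * block, bw * block), dtype=bool)
    padded[:h, :w] = saliency
    salient_blocks = padded.reshape(bh, block, bw, block).any(axis=(1, 3))
    return np.where(salient_blocks, q_roi, q_bg)

def compression_ratio(raw_bytes, compressed_bytes):
    """Ratio of uncompressed size to compressed size, e.g. 2000.0 for 2000:1."""
    return raw_bytes / compressed_bytes
```

An encoder that supports region-of-interest coding could consume the resulting per-block map to spend more bits on salient blocks and fewer on the background. How the map is translated into quantization parameters depends on the codec and is left out here.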

