Shearlet Transform Based Prediction Scheme for Light Field Compression

2018 Data Compression Conference | Pub Date: 2018-03-27 | DOI: 10.1109/DCC.2018.00049

Waqas Ahmad, Suren Vagharshakyan, Mårten Sjöström, A. Gotchev, R. Bregović, R. Olsson


Citations: 13

Abstract

Light field acquisition technologies capture angular and spatial information of the scene. This information enables various post-processing applications, e.g. 3D scene reconstruction, refocusing, and synthetic aperture, at the expense of an increased data size. In this paper, we present a novel prediction tool for compression of light field data acquired with a multiple-camera system. The captured light field (LF) can be described using the two-plane parametrization L(u, v, s, t), where (u, v) are the image-plane coordinates of each view and (s, t) are the coordinates of the capturing plane. In the proposed scheme, the captured LF is uniformly decimated by a factor d in both directions (the s and t coordinates), resulting in a sparse set of views referred to as key views. The key views are converted into a pseudo video sequence and compressed using High Efficiency Video Coding (HEVC). The shearlet transform based reconstruction approach presented in [1] is used at the decoder side to predict the decimated views from the key views. Four LF images (Truck and Bunny from the Stanford dataset, Set2 and Set9 from the High Density Camera Array dataset) are used in the experiments. As the anchor, all input LF views are converted into a pseudo video sequence and compressed with HEVC. Rate-distortion analysis shows an average PSNR gain of 0.98 dB over the anchor scheme. At low bit-rates the compression efficiency of the proposed scheme is higher than that of the anchor, whereas at high bit-rates the anchor performs better. The different compression responses of the proposed and anchor schemes are a consequence of how each uses the input information. In the high bit-rate scenario, high-quality residual information enables the anchor to achieve efficient compression. The shearlet transform, by contrast, relies on key views to predict the decimated views without incorporating residual information; hence, it has an inherent reconstruction error. In the low bit-rate scenario, the bit budget of the proposed scheme allows the encoder to achieve high quality for the key views, while the HEVC anchor distributes the same bit budget among all the input LF views, which degrades the overall visual quality. The sensitivity of the human visual system to compression artifacts in low bit-rate cases favours the proposed compression scheme over the anchor scheme.
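The key-view selection step described in the abstract can be illustrated with a minimal sketch. It assumes a rectangular camera grid indexed by integer (s, t) positions and a serpentine scan order for building the pseudo video sequence; the abstract does not state the scan order, so that choice, the grid size, and all function names here are hypothetical. The HEVC encoding and the shearlet-based reconstruction of [1] are not implemented.

```python
from typing import List, Tuple

View = Tuple[int, int]  # (s, t) position of a view on the capturing plane


def key_view_indices(num_s: int, num_t: int, d: int) -> List[View]:
    """Return the (s, t) positions kept after uniform decimation by d."""
    if d < 1:
        raise ValueError("decimation factor d must be >= 1")
    return [(s, t) for t in range(0, num_t, d) for s in range(0, num_s, d)]


def serpentine_order(views: List[View]) -> List[View]:
    """Order views row by row, reversing every other row.

    Assumed scan order: it keeps consecutive frames of the pseudo video
    sequence spatially adjacent. The paper may use a different order.
    """
    rows = {}
    for s, t in views:
        rows.setdefault(t, []).append(s)
    ordered: List[View] = []
    for i, t in enumerate(sorted(rows)):
        cols = sorted(rows[t], reverse=(i % 2 == 1))
        ordered.extend((s, t) for s in cols)
    return ordered


def split_views(num_s: int, num_t: int, d: int) -> Tuple[List[View], List[View]]:
    """Split the grid into key views (to encode) and views to predict."""
    keys = serpentine_order(key_view_indices(num_s, num_t, d))
    key_set = set(keys)
    predicted = [
        (s, t)
        for t in range(num_t)
        for s in range(num_s)
        if (s, t) not in key_set
    ]
    return keys, predicted


if __name__ == "__main__":
    # Hypothetical 5 x 5 grid decimated by d = 2.
    keys, predicted = split_views(5, 5, 2)
    print("pseudo video order of key views:", keys)
    print("views left for decoder-side prediction:", len(predicted))
```

With this hypothetical 5 x 5 grid and d = 2, only 9 of the 25 views are passed to the video encoder; the remaining 16 would have to be predicted at the decoder, which is why the scheme can spend the whole bit budget on the key views at low bit-rates.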


