Papers presented at the 2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021.

MPEG Immersive Video tools for Light Field Head Mounted Displays
Daniele Bonatto, Grégoire Hirt, Alexander Kvasov, Sarah Fachada, G. Lafruit
DOI: 10.1109/VCIP53242.2021.9675317

Abstract: Light field displays project hundreds of micro-parallax views so that users perceive 3D without wearing glasses. Transmitting all of these views would require a gigantic bandwidth, even with conventional per-view video compression. MPEG Immersive Video (MIV) follows a smarter strategy: only key images and some metadata are transmitted, and all missing views are synthesized at the receiver. We developed (and will demonstrate) real-time Depth Image Based Rendering software that follows this approach, synthesizing all light field micro-parallax views from a couple of RGBD input views.
{"title":"Learning in Compressed Domain for Faster Machine Vision Tasks","authors":"Jinming Liu, Heming Sun, J. Katto","doi":"10.1109/VCIP53242.2021.9675369","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675369","url":null,"abstract":"Learned image compression (LIC) has illustrated good ability for reconstruction quality driven tasks (e.g. PSNR, MS-SSIM) and machine vision tasks such as image understanding. However, most LIC frameworks are based on pixel domain, which requires the decoding process. In this paper, we develop a learned compressed domain framework for machine vision tasks. 1) By sending the compressed latent representation directly to the task network, the decoding computation can be eliminated to reduce the complexity. 2) By sorting the latent channels by entropy, only selective channels will be transmitted to the task network, which can reduce the bitrate. As a result, compared with the traditional pixel domain methods, we can reduce about 1/3 multiply-add operations (MACs) and 1/5 inference time while keeping the same accuracy. Moreover, proposed channel selection can contribute to at most 6.8% bitrate saving.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115869058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation Of Bitrate Ladders For Versatile Video Coder
Reda Kaafarani, Médéric Blestel, Thomas Maugey, M. Ropert, A. Roumy
DOI: 10.1109/VCIP53242.2021.9675425

Abstract: Many video service providers use bitrate ladders in adaptive HTTP video streaming to account for varying network conditions and user display capabilities, offering the bitrate/resolution pairs that best fit each client's network and display. These ladders, however, differ between codecs, and so do their bitrate/resolution pairs. Moreover, existing ladders are based on codecs already in service (H.264/MPEG-4 AVC, HEVC, etc.), so the introduction of a new codec such as Versatile Video Coding (VVC) requires re-analyzing them. We therefore analyze how the bitrate ladder evolves when VVC is used. We show how VVC affects the ladder compared to HEVC and H.264/AVC and, in particular, that there is no need to switch to lower resolutions at the lower bitrates defined in the Call for Evidence on Transcoding for Network Distributed Video Coding (CfE).
{"title":"Multi-camera system for placing the viewer between the players of a live sports match: Blind Review","authors":"","doi":"10.1109/VCIP53242.2021.9675336","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675336","url":null,"abstract":"We demonstrate a new capture system that allows generation of virtual views corresponding with a virtual camera that is placed between the players on a sports field. Our depth estimation and segmentation pipeline can reduce 2K resolution views from 16 cameras to patches in a single 4K resolution texture atlas. We have created a real time, WebGL 2 based, playback application that renders an arbitrary view from the 4K atlas. The application allows a user to change viewpoint in real time. Additionally, to interpret the scene, a user can also remove objects such as a player or the ball. At the conference we will demonstrate both the automatic multi-camera conversion pipeline and the real-time rendering/object removal on a smartphone.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127288781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kalman filter-based prediction refinement and quality enhancement for geometry-based point cloud compression
Lu Wang, Jianfeng Sun, Hui Yuan, R. Hamzaoui, Xiaohui Wang
DOI: 10.1109/VCIP53242.2021.9675412

Abstract: A point cloud is a set of points representing a three-dimensional (3D) object or scene. To compress a point cloud, the Moving Picture Experts Group (MPEG) geometry-based point cloud compression (G-PCC) scheme may use one of three attribute coding methods: the region-adaptive hierarchical transform (RAHT), the predicting transform (PT), and the lifting transform (LT). To improve the coding efficiency of PT, we propose using a Kalman filter to refine the predicted attribute values. We also apply a Kalman filter to improve the quality of the reconstructed attribute values at the decoder side. Experimental results show that the combination of the two proposed methods achieves average Bjøntegaard delta bitrates of −0.48%, −5.18%, and −6.27% for the luma, chroma Cb, and chroma Cr components, respectively, compared with recent G-PCC reference software.
Attention-guided Convolutional Neural Network for Lightweight JPEG Compression Artifacts Removal
Gang Zhang, Haoquan Wang, Yedong Wang, Haijie Shen
DOI: 10.1109/VCIP53242.2021.9675320

Abstract: JPEG compression artifacts seriously degrade the viewing experience. Previous studies mainly focused on deep convolutional networks for compression artifact removal, but their model size and inference speed limit their practical use. To solve these problems, this paper proposes two methods that improve the training of a compact convolutional network without slowing down its inference. First, a fully explainable attention loss is designed to guide training; it is computed from local entropy, which accurately locates compression artifacts. Second, the Fully Expanded Block (FEB) is proposed to replace the convolutional layers of the compact network; once training is complete, each FEB can be contracted back into a normal convolutional layer. Extensive experiments demonstrate that the proposed method outperforms existing lightweight methods in both performance and inference speed.
CRC-Based Multi-Error Correction of H.265 Encoded Videos in Wireless Communications
Vivien Boussard, S. Coulombe, F. Coudoux, P. Corlay, Anthony Trioux
DOI: 10.1109/VCIP53242.2021.9675400

Abstract: This paper analyzes the benefits of extending CRC-based error correction (CRC-EC) to handle more errors in error-prone wireless networks. In the literature, CRC-EC has been used to correct up to 3 bit errors per packet. We first present a theoretical analysis of how the CRC-EC candidate list grows as more errors are considered. We then analyze the candidate-list reduction achieved by the subsequent checksum validation and video decoding steps. Simulations on two wireless networks show that the channel under consideration has a major impact on CRC-EC performance. Over a Bluetooth Low Energy (BLE) channel with Eb/N0 = 8 dB, an average PSNR improvement of 4.4 dB is achieved when CRC-EC corrects up to 5, rather than 3, errors per packet.
Cross-Block Difference Guided Fast CU Partition for VVC Intra Coding
Hewei Liu, Shuyuan Zhu, Ruiqin Xiong, Guanghui Liu, B. Zeng
DOI: 10.1109/VCIP53242.2021.9675409

Abstract: In this paper, we propose a new fast CU partition method for VVC intra coding based on the cross-block difference. This difference is measured from the gradients and content of the sub-blocks produced by a candidate partition, and it guides the skipping of unnecessary horizontal and vertical partition modes, yielding a fast determination of block partitions. Compared with VVC, the proposed method saves 41.64% of encoding time on average with only a 0.97% average increase in BD-rate.
{"title":"Action Recognition Improved by Correlations and Attention of Subjects and Scene","authors":"Manh-Hung Ha, O. Chen","doi":"10.1109/VCIP53242.2021.9675340","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675340","url":null,"abstract":"Comprehensive activity understanding of multiple subjects in a video requires subject detection, action identification, and behavior interpretation as well as the interactions among subjects and background. This work develops the action recognition of subject(s) based on the correlations and interactions of the whole scene and subject(s) by using the Deep Neural Network (DNN). The proposed DNN consists of 3D Convolutional Neural Network (CNN), Spatial Attention (SA) generation layer, mapping convolutional fused-depth layer, Transformer Encoder (TE), and two fully connected layers with late fusion for final classification. Especially, the attention mechanisms in SA and TE are implemented to find out meaningful action information on spatial and temporal domains for enhancing recognition performance, respectively. The experimental results reveal that the proposed DNN shows the superior accuracies of 97.8%, 98.4% and 85.6% in the datasets of traffic police, UCF101-24 and JHMDB-21, respectively. Therefore, our DNN is an outstanding classifier for various action recognitions involving one or multiple subjects.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131363603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nearly Reversible Image-to-Image Translation Using Joint Inter-Frame Coding and Embedding","authors":"Xinzhu Cao, Yuanzhi Yao, Nenghai Yu","doi":"10.1109/VCIP53242.2021.9675370","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675370","url":null,"abstract":"Image-to-image translation tasks which have been widely investigated with generative adversarial networks (GAN) aim to map an image from the source domain to the target domain. The translated image can be inversely mapped to the reconstructed source image. However, existing GAN-based schemes lack the ability to accomplish reversible translation. To remedy this drawback, a nearly reversible image-to-image translation scheme where the reconstructed source image is approximately distortion-free compared with the corresponding source image is proposed in this paper. The proposed scheme jointly considers inter-frame coding and embedding. Firstly, we organize the GAN-generated reconstructed source image and the source image into a pseudo video. Furthermore, the bitstream obtained by inter-frame coding is reversibly embedded in the translated image for nearly lossless source image reconstruction. Extensive experimental results and analysis demonstrate that the proposed scheme can achieve a high level of performance in image quality and security.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"343 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124234169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}