Polynomial Image-Based Rendering for non-Lambertian Objects
Sarah Fachada, Daniele Bonatto, Mehrdad Teratani, G. Lafruit
2021 International Conference on Visual Communications and Image Processing (VCIP), 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675371

Abstract: Non-Lambertian objects present an appearance that depends on the viewer's position in the surrounding scene. Contrary to diffuse objects, their features move non-linearly with the camera, preventing them from being rendered with existing Depth Image-Based Rendering (DIBR) approaches or from having their surfaces triangulated with Structure-from-Motion (SfM). In this paper, we propose an extension of the DIBR paradigm that describes these non-linearities by replacing the depth maps with more complete multi-channel "non-Lambertian maps", without attempting a 3D reconstruction of the scene. We provide a study of the importance of each coefficient of the proposed map, measuring the trade-off between visual quality and data volume to optimally render non-Lambertian objects. We compare our method to other state-of-the-art image-based rendering methods and outperform them, with promising subjective and objective results on a challenging dataset.

An Intra String Copy Approach for SCC in AVS3
Liping Zhao, Kailun Zhou, Qingyang Zhou, Huihui Wang, Tao Lin
2021 International Conference on Visual Communications and Image Processing (VCIP), 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675427

Abstract: An efficient screen content coding (SCC) tool named Intra String Copy (ISC) has recently been proposed and adopted in AVS3. ISC has two CU-level sub-modes: FPSP (fully-matching-string and partially-matching-string based string prediction) and EUSP (equal-value-string, unit-basis-vector-string and unmatched-pixel-string based string prediction). Compared with the latest AVS3 reference software HPM with SCC tools disabled, using the AVS3 SCC Common Test Condition and YUV test sequences in the text and graphics with motion (TGM) and mixed content (MC) categories, the proposed tool achieves average Y BD-rate reductions of 57.7%/39.5% and 77.2%/57.9% for TGM and MC in the All Intra (AI)/Low Delay B (LDB) configurations, respectively, with low additional encoding complexity and almost the same decoding complexity.

Encoder-Decoder Joint Enhancement for Video Chat
Zhenghao Zhang, Zhao Wang, Yan Ye, Shiqi Wang, Changwen Zheng
2021 International Conference on Visual Communications and Image Processing (VCIP), 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675448

Abstract: Video chat has become increasingly popular in daily life, but providing high-quality video chat under limited bandwidth remains a key challenge. In this paper, going beyond the state-of-the-art video compression system, we propose an encoder-decoder joint enhancement algorithm for video chat. In particular, the sparse map of the original frame is extracted at the encoder side and signaled to the decoder, where it is used together with the sparse map of the decoded frame to obtain a boundary transformation map. This boundary transformation map represents the key differences between the original and decoded frames and hence can be used to enhance the decoded frame. Experimental results show that the proposed algorithm brings clear subjective and objective quality improvements; at the same quality, it achieves 35% bitrate savings compared to VVC.
{"title":"Stereoscopic Video Quality Assessment with Multi-level Binocular Fusion Network Considering Disparity and Multi-scale Information","authors":"Yingjie Feng, Sumei Li","doi":"10.1109/VCIP53242.2021.9675404","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675404","url":null,"abstract":"Stereoscopic video quality assessment (SVQA) is of great importance to promote the development of the stereoscopic video industry. In this paper, we propose a three-branch multi-level binocular fusion convolutional neural network (MBFNet) which is highly consistent with human visual perception. Our network mainly includes three innovative structures. Firstly, we construct a multi-scale cross-dimension attention module (MSCAM) on the left and right branches to capture more critical semantic information. Then, we design a multi-level binocular fusion unit (MBFU) to fuse the features from left and right branches adaptively. Besides, a disparity compensation branch (DCB) containing an enhancement unit (EU) is added to provide disparity feature. The experimental results show that the proposed method is superior to other existing SVQA methods with state-of-the-art performance.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115327696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Inter Prediction via Reference Frame Interpolation for Blurry Video Coding","authors":"Zezhi Zhu, Lili Zhao, Xuhu Lin, Xuezhou Guo, Jianwen Chen","doi":"10.1109/VCIP53242.2021.9675429","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675429","url":null,"abstract":"In High Efficiency Video Coding (HEVC), inter prediction is an important module for removing temporal redundancy. The accuracy of inter prediction is much affected by the similarity between the current and reference frames. However, for blurry videos, the performance of inter coding will be degraded by varying motion blur, which is derived from camera shake or the acceleration of objects in the scene. To address this problem, we propose to synthesize additional reference frame via the frame interpolation network. The synthesized reference frame is added into reference picture lists to supply more credible reference candidate, and the searching mechanism for motion candidates is changed accordingly. In addition, to make our interpolation network more robust to various inputs with different compression artifacts, we establish a new blurry video database to train our network. With the well-trained frame interpolation network, compared with the reference software HM-16.9, the proposed method achieves on average 1.55% BD-rate reduction under random access (RA) configuration for blurry videos, and also obtains on average 0.75% BD-rate reduction for common test sequences.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116550543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study on 4D Light Field Compression Using Multi-focus Images and Reference Views","authors":"Shuho Umebayashi, K. Kodama, T. Hamamoto","doi":"10.1109/VCIP53242.2021.9675378","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675378","url":null,"abstract":"We propose a novel method of light field compression using multi-focus images and reference views. Light fields enable us to observe scenes from various viewpoints. However, it generally consists of 4D enormous data, that are not suitable for storing or transmitting without effective compression at relatively low bit-rates. On the other hand, 4D light fields are essentially redundant because it includes just 3D scene information. While robust 3D scene estimation such as depth recovery from light fields is not so easy, a method of reconstructing light fields directly from 3D information composed of multi-focus images without any scene estimation is successfully derived. Based on the method, we previously proposed light field compression via multi-focus images as effective representation of 3D scenes. Actually, its high performance can be seen only at very low bit-rates, because there exists some degradation of low frequency components and occluded regions on light fields predicted from multi-focus images. In this paper, we study higher quality light field compression by using reference views to improve quality of the prediction from multi-focus images. Our contribution is twofold: first, our improved method can keep good performance of 4D light field compression at a wider range of low bit-rates than the previous one working effectively only for very low bit-rates; second, we clarify how the proposed method can improve its performance continuously by introducing recent video codec such as HEVC and VVC into our compression framework, that does not depend on 3D-SPIHT previously adopted for the corresponding component. We show experimental results by using synthetic and real images, where quality of reconstructed light fields is evaluated by PSNR and SSIM for analyzing characteristics of our novel method well. We notice that it is much superior to light field compression using HEVC directly at low bit-rates regardless of its light field scan order.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128218512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binocular Visual Mechanism Guided No-Reference Stereoscopic Image Quality Assessment Considering Spatial Saliency","authors":"Jinhui Feng, Sumei Li, Yongli Chang","doi":"10.1109/vcip53242.2021.9675338","DOIUrl":"https://doi.org/10.1109/vcip53242.2021.9675338","url":null,"abstract":"In recent years, with the popularization of 3D technology, stereoscopic image quality assessment (SIQA) has attracted extensive attention. In this paper, we propose a two-stage binocular fusion network for SIQA, which takes binocular fusion, binocular rivalry and binocular suppression into account to imitate the complex binocular visual mechanism in the human brain. Besides, to extract spatial saliency features of the left view, the right view, and the fusion view, saliency generating layers (SGLs) are applied in the network. The SGL apply multi-scale dilated convolution to emphasize essential spatial information of the input features. Experimental results on four public stereoscopic image databases demonstrate that the proposed method outperforms the state-of-the-art SIQA methods on both symmetrical and asymmetrical distortion stereoscopic images.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124675516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stereo Image Super-Resolution Based on Pixel-Wise Knowledge Distillation Strategy","authors":"Li Ma, Sumei Li","doi":"10.1109/VCIP53242.2021.9675446","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675446","url":null,"abstract":"In stereo image super-resolution (SR), it is equally important to utilize intra-view and cross-view information. However, most existing methods only focus on the exploration of cross-view information and neglect the full mining of intra-view information, which limits the reconstruction performance of these methods. Since single image SR (SISR) methods are powerful in intra-view information exploitation, we propose to introduce the knowledge distillation strategy to transfer the knowledge of a SISR network (teacher network) to a stereo image SR network (student network). With the help of the teacher network, the student network can easily learn more intra-view information. Specifically, we propose pixel-wise distillation as the implementation method, which not only improves the intra-view information extraction ability of student network, but also ensures the effective learning of cross-view information. Moreover, we propose a lightweight student network named Adaptive Residual Feature Aggregation network (ARFAnet). Its main unit, the ARFA module, can aggregate informative residual features and produce more representative features for image reconstruction. Experimental results demonstrate that our teacher-student network achieves state-of-the-art performance on all benchmark datasets.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129482859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DensER: Density-imbalance-Eased Representation for LiDAR-based Whole Scene Upsampling","authors":"Tso-Yuan Chen, Ching-Chun Hsiao, Wen-Huang Cheng, Hong-Han Shuai, Peter Chen, Ching-Chun Huang","doi":"10.1109/VCIP53242.2021.9675334","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675334","url":null,"abstract":"With the development of depth sensors, 3D point cloud upsampling that generates a high-resolution point cloud given a sparse input becomes emergent. However, many previous works focused on single 3D object reconstruction and refinement. Although a few recent works began to discuss 3D structure refine-ment for a more complex scene, they do not target LiDAR-based point clouds, which have density imbalance issues from near to far. This paper proposed DensER, a Density-imbalance-Eased regional Representation. Notably, to learn robust representations and model local geometry under imbalance point density, we designed density-aware multiple receptive fields to extract the regional features. Moreover, founded on the patch reoccurrence property of a nature scene, we proposed a density-aided attentive module to enrich the extracted features of point-sparse areas by referring to other non-local regions. Finally, by coupling with novel manifold-based upsamplers, DensER shows the ability to super-resolve LiDAR-based whole-scene point clouds. The exper-imental results show DensER outperforms related works both in qualitative and quantitative evaluation. We also demonstrate that the enhanced point clouds can improve downstream tasks such as 3D object detection and depth completion.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129340430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting Flipping Strategy for Learning-based Stereo Depth Estimation","authors":"Yue Li, Yueyi Zhang, Zhiwei Xiong","doi":"10.1109/VCIP53242.2021.9675450","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675450","url":null,"abstract":"Deep neural networks (DNNs) have been widely used for stereo depth estimation, which achieve great success in performance. In this paper, we introduce a novel flipping strategy for DNN on the stereo depth estimation task. Specifically, based on a common DNN for stereo matching, we apply the flipping operation for both input stereo images, which are further fed to the original DNN. A flipping loss function is proposed to jointly train the network with the initial loss. We apply our strategy to many representative networks in both supervised and self-supervised manners. Extensive experimental results demonstrate that our proposed strategy improves the performance of these networks.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"05 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115930092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}