On the Impact of Viewing Distance on Perceived Video Quality
Hadi Amirpour, R. Schatz, C. Timmerer, M. Ghanbari
2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021. DOI: 10.1109/VCIP53242.2021.9675431
Abstract: With the growing importance of optimizing the quality and efficiency of video streaming delivery, accurate assessment of user-perceived video quality becomes increasingly important. However, because of the wide range of viewing distances encountered in real-world settings, perceived video quality can vary significantly in everyday viewing situations. In this paper, we investigate and quantify the influence of viewing distance on perceived video quality. A subjective experiment was conducted with full HD sequences at three fixed viewing distances, with each video sequence encoded at three quality levels. Our results confirm that viewing distance has a significant influence on quality assessment. In particular, they show that an increased viewing distance generally leads to increased perceived video quality, especially at low encoding quality levels. In this context, we also estimate the potential bitrate savings that knowledge of the actual viewing distance would enable in practice. Since current objective video quality metrics do not systematically take viewing distance into account, we also analyze and quantify the influence of viewing distance on the correlation between objective and subjective metrics. Our results confirm the need for distance-aware objective metrics when accurate prediction of perceived video quality in real-world environments is required.

A Distortion Propagation Oriented CU-tree Algorithm for x265
Xinye Jiang, Zhenyu Liu, Yongbing Zhang, Xiangyang Ji
2021 International Conference on Visual Communications and Image Processing (VCIP), 5 December 2021. DOI: 10.1109/VCIP53242.2021.9675426
Abstract: Rate-distortion optimization (RDO) is widely used in video coding to improve coding efficiency. Conventionally, RDO is applied to each block independently to avoid high computational complexity. However, various prediction techniques introduce spatio-temporal dependencies between blocks, so independent per-block RDO is not optimal. Specifically, because of motion compensation, the distortion of reference blocks affects the quality of subsequently predicted blocks, and accounting for this temporal dependency in RDO can improve global rate-distortion (R-D) performance. x265 relies on a lookahead module to analyze the temporal dependency between blocks and weights the quality of each block according to how strongly it is referenced. However, the original algorithm in x265 ignores the impact of quantization, and this shortcoming degrades its R-D performance. In this paper, we propose a new linear distortion propagation model that estimates the temporal dependency while taking quantization into account, and, from the perspective of global RDO, we derive a corresponding adaptive quantization formula. The proposed algorithm was implemented in x265 version 3.2. Experiments show that it achieves average PSNR-based and SSIM-based BD-rate reductions of 15.43% and 23.81%, outperforming the original x265 algorithm by 4.14% and 9.68%, respectively.

{"title":"Dictionary Learning-based Reference Picture Resampling in VVC","authors":"J. Schneider, Christian Rohlfing","doi":"10.1109/VCIP53242.2021.9675361","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675361","url":null,"abstract":"Versatile Video Coding (VVC) introduces the con-cept of Reference Picture Resampling (RPR), which allows for a resolution change of the video during decoding, without introducing an additional Intra Random Access Point (IRAP) into the bitstream. When the resolution is increased, an upsampling operation of the reference picture is required in order to apply motion compensated prediction. Conceptually, the upsampling by linear interpolation filters fails to recover frequencies which were lost during downsampling. Yet, the quality of the upsampled reference picture is crucial to the pre-diction performance. In recent years, machine learning based Super-Resolution (SR) has shown to outperform conventional interpolation filters by far in regard to super-resolving a previ-ously downsampled image. In particular, Dictionary Learning-based Super-Resolution (DLSR) was shown to improve the inter-layer prediction in SHVC [1]. Thus, this paper introduces DLSR to the prediction process in RPR. Further, the approach is experimentally evaluated by an implementation based on the VTM-9.3 reference software. The simulation results show a reduction of the instantaneous bitrate of 0.98% on average at the same objective quality in terms of PSNR. Moreover, the peak bitrate reduction is measured to 4.74% for the “Johnny” sequence of the JVET test set.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134548173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Error Self-learning Semi-supervised Method for No-reference Image Quality Assessment","authors":"Yingjie Feng, Sumei Li, Sihan Hao","doi":"10.1109/VCIP53242.2021.9675352","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675352","url":null,"abstract":"In recent years, deep learning has achieved significant progress in many respects. However, unlike other research fields with millions of labeled data such as image recognition, only several thousand labeled images are available in image quality assessment (IQA) field for deep learning, which heavily hinders the development and application for IQA. To tackle this problem, in this paper, we proposed an error self-learning semi-supervised method for no-reference (NR) IQA (ESSIQA), which is based on deep learning. We employed an advanced full reference (FR) IQA method to expand databases and supervise the training of network. In addition, the network outputs of expanding images were used as proxy labels replacing errors between subjective scores and objective scores to achieve error self-learning. Two weights of error back propagation were designed to reduce the impact of inaccurate outputs. The experimental results show that the proposed method yielded comparative effect.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115719799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Portable Congenital Glaucoma Detection System","authors":"Chunjun Hua, Menghan Hu, Yue Wu","doi":"10.1109/VCIP53242.2021.9675423","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675423","url":null,"abstract":"Congenital glaucoma is an eye disease caused by embryonic developmental disorders, which damages the optic nerve. In this demo paper, we proposed a portable non-contact congenital glaucoma detection system, which can evaluate the condition of children's eyes by measuring the cornea size using the developed mobile application. The system consists of two modules viz. cornea identification module and diagnosis module. This system can be utilized by everyone with a smartphone, which is of wider application. It can be used as a convenient home self-examination tool for children in the large-scale screening of congenital glaucoma. The demo video of the proposed detection system is available at: https://doi.org/10.6084/m9.figshare.14728854.v1.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116542331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VCIP 2021 Organizing Committee","authors":"","doi":"10.1109/vcip53242.2021.9675374","DOIUrl":"https://doi.org/10.1109/vcip53242.2021.9675374","url":null,"abstract":"","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125832555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multicomponent Secondary Transform","authors":"M. Krishnan, Xin Zhao, Shanchun Liu","doi":"10.1109/VCIP53242.2021.9675447","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675447","url":null,"abstract":"The Alliance for Open Media has recently initiated coding tool exploration activities towards the next-generation video coding beyond AV1. In this regard, a frequency-domain coding tool, which is designed to leverage the cross-component correlation existing between collocated chroma blocks, is explored in this paper. The tool, henceforth known as multi-component secondary transform (MCST), is implemented as a low complexity secondary transform with primary transform coefficients of multiple color components as input. The proposed tool is implemented and tested on top of libaom. Experimental results show that, compared to libaom, the proposed method achieves an average 0.34% to 0.44% overall coding efficiency for All Intra (AI) coding configuration for a wide range of video content.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129305427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Urban Planter: A Web App for Automatic Classification of Urban Plants","authors":"Sarit Divekar, Irina Rabaev, Marina Litvak","doi":"10.1109/VCIP53242.2021.9675318","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675318","url":null,"abstract":"Plant classification requires an expert because subtle differences in leaves or petal forms might differentiate between different species. On the contrary, some species are characterized by high variability in appearance. This paper introduces a web app for assisting people in identifying plants for discovering the best growing methods. The uploaded picture is submitted to the back-end server, and a pre-trained neural network classifies it to one of the predefined classes. The classification label and confidence are displayed to the end user on the front-end page. The application focuses on the house and garden plant species that can be grown mainly in a desert climate and are not covered by existing datasets. For training a model, we collected the Urban Planter dataset. The installation code of the alpha version and the demo video of the app can be found on https://github.com/UrbanPlanter/urbanplanterapp.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115999684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-stage Parallax Correction and Multi-stage Cross-view Fusion Network Based Stereo Image Super-Resolution","authors":"Yijian Zheng, Sumei Li","doi":"10.1109/VCIP53242.2021.9675418","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675418","url":null,"abstract":"Stereo image super-resolution (SR) has achieved great progress in recent years. However, the two major problems of the existing methods are that the parallax correction is insufficient and the cross-view information fusion only occurs in the beginning of the network. To address these problems, we propose a two-stage parallax correction and a multi-stage cross-view fusion network for better stereo image SR results. Specially, the two-stage parallax correction module consists of horizontal parallax correction and refined parallax correction. The first stage corrects horizontal parallax by parallax attention. The second stage is based on deformable convolution to refine horizontal parallax and correct vertical parallax simultaneously. Then, multiple cascaded enhanced residual spatial feature transform blocks are developed to fuse cross-view information at multiple stages. Extensive experiments show that our method achieves state-of-the-art performance on the KITTI2012, KITTI2015, Middlebury and Flickr1024 datasets.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126014821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"No-Reference Stereoscopic Image Quality Assessment Based on The Visual Pathway of Human Visual System","authors":"F. Meng, Sumei Li","doi":"10.1109/VCIP53242.2021.9675346","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675346","url":null,"abstract":"With the development of stereoscopic imaging technology, stereoscopic image quality assessment (SIQA) has gradually been more and more important, and how to design a method in line with human visual perception is full of challenges due to the complex relationship between binocular views. In this article, firstly, convolutional neural network (CNN) based on the visual pathway of human visual system (HVS) is built, which simulates different parts of visual pathway such as the optic chiasm, lateral geniculate nucleus (LGN), and visual cortex. Secondly, the two pathways of our method simulate the ‘what’ and ‘where’ visual pathway respectively, which are endowed with different feature extraction capabilities. Finally, we find a different application way for 3D-convolution, employing it fuse the information from left and right view, rather than just extracting temporal features in video. The experimental results show that our proposed method is more in line with subjective score and has good generalization.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126348823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}