{"title":"A Hybrid Weighted Compound Motion Compensated Prediction for Video Compression","authors":"Cheng Chen, Jingning Han, Yaowu Xu","doi":"10.1109/PCS.2018.8456241","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456241","url":null,"abstract":"Compound motion compensated prediction that combines reconstructed reference blocks to exploit the temporal correlation is a major component in the hierarchical coding scheme. A uniform combination that applies equal weights to reference blocks regardless of distances towards the current frame is widely employed in mainstream codecs. Linear distance weighted combination, while reflecting the temporal correlation, is likely to ignore the quantization noise factor and hence degrade the prediction quality. This work builds on the premise that the compound prediction mode effectively embeds two functionalities - exploiting temporal correlation in the video signal and canceling the quantization noise from reference blocks. A modified distance weighting scheme is introduced to optimize the trade-off between these two factors. It quantizes the weights to limit the minimum contribution from both reference blocks for noise cancellation. We further introduce a hybrid scheme allowing the codec to switch between the proposed distance weighted compound mode and the averaging mode to provide more flexibility for the trade-off between temporal correlation and noise cancellation. The scheme is implemented in the AV1 codec as part of the syntax definition. It is experimentally demonstrated to provide on average 1.5% compression gains across a wide range of test sets.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128490448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Simple Prediction Fusion Improves Data-driven Full-Reference Video Quality Assessment Models","authors":"C. Bampis, A. Bovik, Zhi Li","doi":"10.1109/PCS.2018.8456293","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456293","url":null,"abstract":"When developing data-driven video quality assessment algorithms, the size of the available ground truth subjective data may hamper the generalization capabilities of the trained models. Nevertheless, if the application context is known a priori, leveraging data-driven approaches for video quality prediction can deliver promising results. Towards achieving high-performing video quality prediction for compression and scaling artifacts, Netflix developed the Video Multi-method Assessment Fusion (VMAF) Framework, a full-reference prediction system which uses a regression scheme to integrate multiple perception-motivated features to predict video quality. However, the current version of VMAF does not fully capture temporal video features relevant to temporal video distortions. To achieve this goal, we developed Ensemble VMAF (E-VMAF): a video quality predictor that combines two models: VMAF and predictions based on entropic differencing features calculated on video frames and frame differences. We demonstrate the improved performance of E-VMAF on various subjective video databases. The proposed model will become available as part of the open source package in https://github.com/Netflix/vmaf.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114412572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2D Video Coding of Volumetric Video Data","authors":"S. Schwarz, M. Hannuksela, Vida Fakour Sevom, Nahid Sheikhi-Pour","doi":"10.1109/PCS.2018.8456265","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456265","url":null,"abstract":"Due to the increased popularity of augmented and virtual reality experiences, the interest in representing the real world in an immersive fashion has never been higher. Distributing such representations enables users all over the world to freely navigate in never seen before media experiences. Unfortunately, such representations require a large amount of data, not feasible for transmission on today’s networks. Thus, efficient compression technologies are in high demand. This paper proposes an approach to compress 3D video data utilizing 2D video coding technology. The proposed solution was developed to address the needs of ‘tele-immersive’ applications, such as virtual (VR), augmented (AR) or mixed (MR) reality with Six Degrees of Freedom (6DoF) capabilities. Volumetric video data is projected on 2D image planes and compressed using standard 2D video coding solutions. A key benefit of this approach is its compatibility with readily available 2D video coding infrastructure. Furthermore, objective and subjective evaluation shows significant improvement in coding efficiency over reference technology.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126520870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning as applied intrinsically to individual dimensions of HDR Display Quality","authors":"A. Choudhury, S. Daly","doi":"10.1109/PCS.2018.8456284","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456284","url":null,"abstract":"This study builds on previous work exploring machine learning and perceptual transforms in predicting overall display quality as a function of image quality dimensions that correspond to physical display parameters. Previously, we found that the use of perceptually transformed parameters or machine learning exceeded the performance of predictors using just physical parameters and linear regression. Further, the combination of perceptually transformed parameters with machine learning allowed for robustness to parameters outside of the data set, both for cases of interpolation and extrapolation. Here we apply machine learning at a more intrinsic level. We first evaluate how well the machine learning can develop predictors of the individual dimensions of the overall quality, and then how well those individual predictors can be consolidated across themselves to predict the overall display quality. Having predictions of individual dimensions of quality that are closely related to specific hardware design choices enables more nimble cost trade-off design options.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129406526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perceptually-Aligned Frame Rate Selection Using Spatio-Temporal Features","authors":"Angeliki V. Katsenou, Di Ma, D. Bull","doi":"10.1109/PCS.2018.8456274","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456274","url":null,"abstract":"During recent years, the standardisation committees on video compression and broadcast formats have worked on extending practical video frame rates up to 120 frames per second. Generally, increased video frame rates have been shown to improve immersion, but at the cost of higher bit rates. Taking into consideration that the benefits of high frame rates are content dependent, a decision mechanism that recommends the appropriate frame rate for the specific content would provide benefits prior to compression and transmission. Furthermore, this decision mechanism must take account of the perceived video quality. The proposed method extracts and selects suitable spatio-temporal features and uses a supervised machine learning technique to build a model that is able to predict, with high accuracy, the lowest frame rate for which the perceived video quality is indistinguishable from that of video at the acquisition frame rate. The results show that it is a promising tool for processing videos prior to compression and delivery, such as content-aware frame rate adaptation.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114304125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality Assessment of Thumbnail and Billboard Images on Mobile Devices","authors":"Zeina Sinno, Anush K. Moorthy, J. D. Cock, Zhi Li, A. Bovik","doi":"10.1109/PCS.2018.8456285","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456285","url":null,"abstract":"Objective image quality assessment (IQA) research entails developing algorithms that predict human judgments of picture quality. Validating performance entails evaluating algorithms under conditions similar to where they are deployed. Hence, creating image quality databases representative of target use cases is an important endeavor. Here we present a database that relates to quality assessment of billboard images commonly displayed on mobile devices. Billboard images are a subset of thumbnail images, that extend across a display screen, representing things like album covers, banners, or frames or artwork. We conducted a subjective study of the quality of billboard images distorted by processes like compression, scaling and chroma-subsampling, and compared high-performance quality prediction models on the images and subjective data.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"10 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127088882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wavelet Decomposition Pre-processing for Spatial Scalability Video Compression Scheme","authors":"Glenn Herrou, W. Hamidouche, L. Morin","doi":"10.1109/PCS.2018.8456307","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456307","url":null,"abstract":"Scalable video coding enables compression of the video at different formats within a single layered bitstream. SHVC, the scalable extension of the High Efficiency Video Coding (HEVC) standard, enables x2 spatial scalability, among other additional features. The closed-loop architecture of the SHVC codec is based on the use of multiple instances of the HEVC codec to encode the video layers, which considerably increases the encoding complexity. With the arrival of new immersive video formats, like 4K, 8K, High Frame Rate (HFR) and 360° videos, the quantity of data to compress is exploding, making the use of high-complexity coding algorithms unsuitable. In this paper, we propose a low-complexity scalable coding scheme based on the use of a single HEVC codec instance and a wavelet-based decomposition as pre-processing. The pre-encoding image decomposition relies on well-known simple Discrete Wavelet Transform (DWT) kernels, such as Haar or Le Gall 5/3. Compared to SHVC, the proposed architecture achieves a similar rate distortion performance with a coding complexity reduction of 50%.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114063397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perceptual Quality Driven Adaptive Video Coding Using JND Estimation","authors":"Masaru Takeuchi, Shintaro Saika, Yusuke Sakamoto, Tatsuya Nagashima, Zhengxue Cheng, Kenji Kanai, J. Katto, Kaijin Wei, Ju Zengwei, Xu Wei","doi":"10.1109/PCS.2018.8456297","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456297","url":null,"abstract":"We introduce a perceptual video quality driven video encoding solution for optimized adaptive streaming. By using multiple bitrate/resolution encoding like MPEG-DASH, video streaming services can deliver the best video stream to a client, under the conditions of the client's available bandwidth and viewing device capability. However, conventional fixed encoding recipes (i.e., resolution-bitrate pairs) suffer from many problems, such as improper resolution selection and stream redundancy. To avoid these problems, we propose a novel video coding method, which generates multiple representations with constant Just Noticeable Difference (JND) interval. For this purpose, we developed a JND scale estimator using Support Vector Regression (SVR), and designed a pre-encoder which outputs an encoding recipe with constant JND interval in an adaptive manner to input video.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125872696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}