{"title":"Bit Allocation based on Visual Saliency in HEVC","authors":"ChungWen Ku, Guoqing Xiang, Feng Qi, Wei Yan, Yuan Li, Xiaodong Xie","doi":"10.1109/VCIP47243.2019.8965753","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965753","url":null,"abstract":"As one of the important part in the HEVC reference software, R-lambda model adopts mean absolute difference (MAD) for the coding unit tree (CTU) level bit allocation. However, this optimum method may neglect some important characteristics of human visual system (HVS). In this paper, we propose a novel bit allocation algorithm to process some salient visual information priority. Firstly, an improved video saliency detection algorithm is proposed, which induces temporal correlation into a 2D visual attention model. Secondly, the visual saliency based CTU level bit allocation algorithm is presented by allocating bits for CTUs with their saliency weights. What’s more, with considerations of the temporal quality consistence among Saliency Areas (SAs), a window based weight smoothing model is proposed to achieve better subjective quality. Finally, several experiments are performed on the HEVC reference software, HM16.9, under the low delay P configuration, and the experimental results show that the average BD-Rate of the entire test sequences and of the SAs reduce 1.7% and 6.2%, respectively. The proposed algorithm can also improve subjective quality remarkably.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124693282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A QoE-oriented Saliency-aware Approach for 360-degree Video Transmission","authors":"Wang Shen, Lianghui Ding, Guangtao Zhai, Ying Cui, Zhiyong Gao","doi":"10.1109/VCIP47243.2019.8965847","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965847","url":null,"abstract":"The tradeoff between bandwidth efficiency and quality of experience (QoE) is a key issue in 360 video transmission. In this paper, we propose a QoE-oriented saliency-aware 360 video transmission framework to balance this tradeoff. The target is to reduce the bandwidth demand without declining the QoE. Specifically, the proposed model is based on the decision-making process. We use Lyapunov optimization to solve the decisionmaking problem. Furthermore, we integrate saliency information into the model to influence the decision policy, so that the model has the advantage of bandwidth efficiency. The simulation results show that the tradeoff parameter of Lyapunov optimization can balance the tradeoff between QoE and bandwidth efficiency, and 360 video saliency entropy limits the upper and lower bounds of QoE and bandwidth efficiency.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125189925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reference Picture Synthesis for Video Sequences Captured with a Monocular Moving Camera","authors":"H. Golestani, Christian Rohlfing, J. Ohm","doi":"10.1109/VCIP47243.2019.8965883","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965883","url":null,"abstract":"Inter-frame prediction plays an important role in video coding by predicting the current frame from previously encoded pictures, called reference pictures. In the case of camera motion, the content of a current frame could be very different from its reference pictures and may consequently lead to a more difficult Motion Compensation (MC). The main idea of this paper is to process the input 2D video sequence in order to estimate the 3D geometry of the scene and then employ this data to virtually synthesize \"geometrically compensated\" reference pictures. Since these virtual reference pictures are more similar to the current frame, motion estimation and consequently coding efficiency could be enhanced. The proposed method is tested over six different video sequences and around 11% bitrate reduction is achieved compared to the High Efficiency Video Coding (HEVC) standard.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126209659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PPML: Metric Learning with Prior Probability for Video Object Segmentation","authors":"Hangshi Zhong, Zhentao Tan, Bin Liu, Weihai Li, Nenghai Yu","doi":"10.1109/VCIP47243.2019.8965961","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965961","url":null,"abstract":"Video object segmentation plays an important role in computer vision and has attracted much attention. Although many recent works have removed the fine-tuning process in pursuit of fast inference speed, while achieving high segmentation accuracy, they are still far from being real-time. In this paper, we regard this task as a feature matching problem and propose a prior probability based metric learning (PPML) method for faster inference speed and higher segmentation accuracy. The proposed method consists of two ingredients: a novel template space updating strategy that improves the efficiency of segmentation by avoiding the explosion of data in template space, and a novel feature matching method which applies more potential probability information through integrating the prior of the first frame and the predicted score of previous frames. Experimental results on DAVIS datasets demonstrate that the proposed method reaches the state-of-the-art competitive performance and is more efficient in time consumption.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117025424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Progressive Semantic Image Synthesis via Generative Adversarial Network","authors":"Ke Yue, Yidong Li, Huifang Li","doi":"10.1109/VCIP47243.2019.8966069","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966069","url":null,"abstract":"Semantic image synthesis via text description is a desirable and challenging task, which requires more protection of the text irrelevant content in the original image. Existing methods directly modify the original image, which become more difficult when encountering high resolution image, and the generated images are also blurred and lack in detail. This paper presents a novel network architecture to progressively manipulate an image starting from low-resolution, while introducing the original image of corresponding size at different stages with our proposed union module to avoid losing of detail. And the progressive design of the network allows us to modify the image from coarse into fine. Compared with the previous methods, our new method can successfully manipulate a high resolution image and generate a new image with background protection and fine details. The experimental results on CUB-200-2011 dataset show that the proposed approach outperforms existing methods in terms of image detail, background protection and high resolution generation.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130819784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image-Based End-to-End Neural Network for Dense Disparity Estimation","authors":"Shuqiao Sun, Rongke Liu, Qiuchen Du, Shantong Sun, Shaoli Kang","doi":"10.1109/VCIP47243.2019.8965761","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965761","url":null,"abstract":"Stereo matching is a challenging yet important task to various computer vision applications, e.g. 3D reconstruction, augmented reality, and autonomous vehicles. In this paper, we present a novel image-based convolutional neural network (CNN) for dense disparity estimation using stereo image pairs. In order to achieve precise and robust stereo matching, we introduce a feature extraction module that learns both local and global information. These features are then passed through an hour-glass structure to generate disparity maps from lower resolution to full resolution. We test the proposed method in several datasets including indoor scenes and synthetic scenes. Experimental results demonstrate that the proposed method outperforms the state-of-the-art methods in several datasets.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129750320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PGAN: Prediction Generative Adversarial Nets for Meshes","authors":"Tingting Li, Yunhui Shi, Xiaoyan Sun, Jin Wang, Baocai Yin","doi":"10.1109/VCIP47243.2019.8965985","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965985","url":null,"abstract":"Unlike images, the topology similarity among meshes can hardly be handled with traditional signal processing tools because of their irregular structures. Geometry image parameterization provides a way to represent 3D meshes in the form of 2D geometry and normal images. However, most existing methods, including the CoGAN are not suitable for such unnatural images corresponding to meshes. To solve this problem, we propose a Prediction Generative Adversarial Network (PGAN) to learn a joint distribution of geometry and normal images for generating meshes. Particularly, we enforce a prediction constraint on the geometry GAN and normal GAN in our PGAN utilizing the inherent relationship between the geometry and normal. The experimental results on face mesh generation indicate that our PGAN outperforms in generating realistic face models with rich facial attributes such as facial expression and retaining the geometry of the faces.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114565472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting the visual saliency of the people with VIMS","authors":"Jiawei Yang, Guangtao Zhai, Huiyu Duan","doi":"10.1109/VCIP47243.2019.8965925","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965925","url":null,"abstract":"As is known to us, visually induced motion sickness (VIMS) is often experienced in a virtual environment. Learning the visual attention of people with VIMS contributes to related research in the field of virtual reality (VR) content design and psychology. In this paper, we first construct a saliency prediction for people with VIMS (SPPV) database, which is the first of its kind. The database consists of 80 omnidirectional images and the corresponding eye tracking data collected from 30 individuals. We analyze the performance of five state-of-the-art deep neural networks (DNN)-based saliency prediction algorithms with their original networks and the fine-tuned networks on our database. We predict the atypical visual attention of people with VIMS for the first time and obtain relatively good saliency prediction results for VIMS controls so far.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117236745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallax-Tolerant 360 Live Video Stitcher","authors":"Miko Atokari, Marko Viitanen, Alexandre Mercat, Emil Kattainen, Jarno Vanne","doi":"10.1109/VCIP47243.2019.8965900","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965900","url":null,"abstract":"This paper presents an open-source software implementation for real-time 360-degree video stitching. To ensure a seamless stitching result, cylindrical and content-preserving warping are implemented to dynamically correct image alignment and parallax, which may drift due to scene changes, moving objects, or camera movement. Depth variation, color changes, and lighting differences between adjacent frames are also smoothed out to improve visual quality of the panoramic video. The system is benchmarked with six 1080p videos, which are stitched into 4096×732 pixel output format. The proposed algorithm attains an output rate of 18 frames per second on GeForce GTX 1070 GPU and real-time speed can be met with a high-end GPU.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123007844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CR-U-Net: Cascaded U-Net with Residual Mapping for Liver Segmentation in CT Images*","authors":"Yiwei Liu, Na Qi, Qing Zhu, Weiran Li","doi":"10.1109/VCIP47243.2019.8966072","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966072","url":null,"abstract":"Abdominal computed tomography (CT) is a common modality to detect liver lesions. Liver segmentation in CT scan is important for diagnosis and analysis of liver lesions. However, the accuracy of existing liver segmentation methods is slightly insufficient. In this paper, we propose a liver segmentation architecture named CR-U-Net, which is composed of cascade U-Net combined with residual mapping. We make use of the MDice loss function for training in CR-U-Net, and the second-level of cascade network is deeper than the first-level to extract more detailed image features. Morphological algorithms are utilized as an intermediate-processing step to improve the segmentation accuracy. In addition, we evaluate our proposed CR-U-Net on liver segmentation task under the dataset provided by the 2017 ISBI LiTS Challenge. The experimental result demonstrates that our proposed CR-U-Net can outperform the state-of-the-art methods in term of the performance measures, such as Dice score, VOE, and so on.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129605381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}