{"title":"Icon Colorization Based On Triple Conditional Generative Adversarial Networks","authors":"Qin-Ru Han, Wenzhe Zhu, Qing Zhu","doi":"10.1109/VCIP49819.2020.9301890","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301890","url":null,"abstract":"Current automatic colorization systems have many defects such as \"contour blur\", \"color overflow\"and \"color miscellaneous\", especially when they are coloring the images with hollowed-out structure. We propose a model based on triple conditional generative adversarial networks, for generator we provide contour image, colored icon and colorization mask as inputs, our network has three discriminators, structure discriminator is trained to judge if the generated icon has similar contour to the input icon, color discriminator anticipates generated icon and the input icon has the similar color style, the function of mask discriminator is to distinguish whether the output has the similar colorization area to the input mask. For the evaluation, we compared with some existing colorization models, also we made a questionnaire to obtain the evaluation of generated icons from different models. The results showed that our colorization model obtain better results comparing to the other models both in generating hollowed-out and solid structure icons.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124450605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Inter Coding with Interpolated Reference Frame for Hierarchical Coding Structure","authors":"Yu Guo, Zizheng Liu, Zhenzhong Chen, Shan Liu","doi":"10.1109/VCIP49819.2020.9301769","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301769","url":null,"abstract":"In the hybrid video coding framework, inter prediction is an efficient tool to exploit temporal redundancy. Since the performance of inter prediction depends on the content of reference frames, coding efficiency can be significantly improved by having more effective reference frames. In this paper, we propose an enhanced inter coding scheme by generating artificial reference frames with deep neural network. Specifically, a new reference frame is interpolated from two-sided previously reconstructed frames, which can be regarded as the prediction of the to-be-coded frame. The synthesized frame is merged into reference picture list for motion estimation to further decrease the prediction residual. We integrate the proposed method into HM-16.20 under random access configuration. Experimental results show that the proposed method can significantly boost the coding performance, which provides 4.6% BD-rate reduction on average compared to HEVC baseline.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123728211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Compression Artifact Reduction via End-to-End Learning of Side Information","authors":"Haichuan Ma, Dong Liu, Feng Wu","doi":"10.1109/VCIP49819.2020.9301805","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301805","url":null,"abstract":"We propose to improve neural network-based compression artifact reduction by transmitting side information for the neural network. The side information consists of artifact descriptors that are obtained by analyzing the original and compressed images in the encoder. In the decoder, the received descriptors are used as additional input to a well-designed conditional post-processing neural network. To reduce the transmission overhead, the entire model is optimized under the rate-distortion constraint via end-to-end learning. Experimental results show that introducing the side information greatly improves the ability of the post-processing neural network, and improves the rate-distortion performance.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123190168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Optimized Video Encoder Implementation with Screen Content Coding Tools","authors":"Xiaozhong Xu, Shitao Wang, Yu Chen, Yiming Li, Qing Zhang, Yushan Zheng, Shan Liu","doi":"10.1109/VCIP49819.2020.9301875","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301875","url":null,"abstract":"Screen content video applications require efficient coding of computer-generated materials. The new screen content coding tools such as intra block copy (IBC) and palette mode (PLT) have addressed this requirement. However, the added computational complexity on top of the existing sophisticated video encoders is also challenging. In this paper, we focus on the fast and efficient encoder implementation of these screen content coding tools. Improvements on hash-based IBC search, PLT optimization, mode decision between PLT and intra mode, and other general encoder accelerations towards screen content applications are studied and discussed. Experimental results show that with these methods added, the encoder can achieve some faster runtime performance than before while the compression efficiency is almost doubled with screen content coding tools.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134418050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Mixed Appearance-based and Coding Distortion-based CNN Fusion Approach for In-loop Filtering in Video Coding","authors":"Jian Yue, Yanbo Gao, Shuai Li, Menghu Jia","doi":"10.1109/VCIP49819.2020.9301895","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301895","url":null,"abstract":"With the success of the convolutional neural networks (CNNs) in image denoising and other computer vision tasks, CNNs have been investigated for in-loop filtering in video coding. Many existing methods directly use CNNs as powerful tools for filtering without much analysis on its effect. Considering the in-loop filters process the reconstructed video frames produced from a fixed line of video coding operations, the coding distortion in the reconstructed frames may share similar properties that can be learned by CNNs in addition to being a noisy image. Therefore, in this paper, we first categorize the CNN based filtering into two types of processes: appearance-based CNN filtering and coding distortion-based CNN filtering, and develop a two-stream CNN fusion framework accordingly. In the appearance-based CNN filtering, a CNN processes the reconstructed frame as a distorted image and extracts the global appearance information to restore the original image. In order to extract the global information, a CNN with pooling is used first to increase the receptive field and up-sampling is added in the late stage to produce pixel-level frame information. On the contrary, in the coding distortion-based filtering, a CNN processes the reconstructed frame as blocks with certain types of distortions by focusing on the local information to learn the coding distortion resulted by the fixed video coding pipeline. Finally, the appearance-based filtering stream and the coding distortion-based filtering stream are fused together to combine the two aspects of CNN filtering, and also the global and local information. To further reduce the complexity, the similar initial and last convolutional layers are shared over two streams to generate a mixed CNN. Experiments demonstrate that the proposed method achieves better performance than the existing CNN-based filtering methods, with 11.26% BD-rate saving under the All Intra configuration.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"394 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113997253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"APL: Adaptive Preloading of Short Video with Lyapunov Optimization","authors":"Haodan Zhang, Yixuan Ban, Xinggong Zhang, Zongming Guo, Zhimin Xu, Shengbin Meng, Junlin Li, Yue Wang","doi":"10.1109/VCIP49819.2020.9301886","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301886","url":null,"abstract":"Short video applications, like TikTok, have attracted many users across the world. It can feed short videos based on users' preferences and allow users to slide the boring content anywhere and anytime. To reduce the loading time and keep playback smoothness, most of the short video apps will preload the recommended short videos in advance. However, these apps preload short videos in fixed size and fixed order, which can lead to huge playback stall and huge bandwidth waste. To deal with these problems, we present an Adaptive Preloading mechanism for short videos based on Lyapunov Optimization, also called APL, to achieve near-optimal playback experience, i.e., maximizing playback smoothness and minimizing bandwidth waste considering users' sliding behaviors. Specifically, we make three technical contributions: (1) We design a novel short video streaming framework which can dynamically preload the recommended short videos before the current video is downloaded completely. (2) We formulate the preloading problem into a playback experience optimization problem to maximize the playback smoothness and minimize the bandwidth waste. (3) We transform the playback experience optimization problem during the whole viewing process into a single-step greedy algorithm based on the Lyapunov optimization theory to make the online decisions during playback. Through extensive experiments based on the real datasets that generously provided by TikTok, we demonstrate that APL can reduce the stall ratio by 81%/12% and bandwidth waste by 11%/31% compared with no-preloading/fixed-preloading mechanism.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114011376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hybrid Model for Natural Face De-Identiation with Adjustable Privacy","authors":"Yunqian Wen, Bo Liu, Rong Xie, Yunhui Zhu, Jingyi Cao, Li Song","doi":"10.1109/VCIP49819.2020.9301866","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301866","url":null,"abstract":"As more and more personal photos are shared and tagged in social media, security and privacy protection are becoming an unprecedentedly focus of attention. Avoiding privacy risks such as unintended verification, becomes increasingly challenging. To enable people to enjoy uploading photos without having to consider these privacy concerns, it is crucial to study techniques that allow individuals to limit the identity information leaked in visual data. In this paper, we propose a novel hybrid model consists of two stages to generate visually pleasing de-identified face images according to a single input. Meanwhile, we successfully preserve visual similarity with the original face to retain data usability. Our approach combines latest advances in GAN-based face generation with well-designed adjustable randomness. In our experiments we show visually pleasing de-identified output of our method while preserving a high similarity to the original image content. Moreover, our method adapts well to the verificator of unknown structure, which further improves the practical value in our real life.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124944552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality of Experience Evaluation for Streaming Video Using CGNN","authors":"Zhiming Zhou, Yu Dong, Li Song, Rong Xie, Lin Li, Bing Zhou","doi":"10.1109/VCIP49819.2020.9301799","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301799","url":null,"abstract":"One of the principal contradictions these days in the field of video i s lying between the booming demand for evaluating the streaming video quality and the low precision of the Quality of Experience prediction results. In this paper, we propose Convolutional Neural Network and Gate Recurrent Unit (CGNN)-QoE, a deep learning QoE model, that can predict overall and continuous scores of video streaming services accurately in real time. We further implement state-of-the-art models on the basis of their works and compare with our method on six public available datasets. In all considered scenarios, the CGNN-QoE outperforms existing methods.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127904174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of Brain-Computer Interface and Virtual Reality in Advancing Cultural Experience","authors":"Hao-Lun Fu, Po-Hsiang Fang, Chan-Yu Chi, Chung-ting Kuo, Meng-Hsuan Liu, Howard Muchen Hsu, Cheng-Hsun Hsieh, Sheng-Fu Liang, S. Hsieh, Cheng-Ta Yang","doi":"10.1109/VCIP49819.2020.9301801","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301801","url":null,"abstract":"Virtual reality (VR), a computer-generated interactive environment, is provided to a user by projecting a peripheral image onto environmental surfaces. VR has an advantage of enhancing the immersive experience. Nowadays, VR has been widely applied in tourism and cultural experience. On the other hand, a recent integration of electroencephalography-based (EEG-based) brain-computer interface (BCI) and VR is capable of promoting the immersive virtual experience. Therefore, our study aims to propose an integrative framework to implement EEG-based BCI in a VR game to advance the cultural experience. A room escape game in a Tainan temple is created. EEG signals arc recorded while users arc playing the game. The online analyses of EEG signals arc used to interact with the VR display. This integrative framework can result in a better experience than the conventional setup.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126261817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Hough-Based Multibeamlet Transform","authors":"A. Lisowska","doi":"10.1109/VCIP49819.2020.9301812","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301812","url":null,"abstract":"There are plenty of geometrical multiresolution transforms devoted to efficient edge representation. However, they have two drawbacks. The first one is that such transforms represent mono edge models. And the second one is that they are often based on approximations which are optimal according to the Mean Square Error what does not necessarily lead to optimal edge approximation. In this paper the multibeamlet transform based on the Hough transform is proposed. This transform is defined to properly detect multiedges present in images. Next, the method of image approximation with the use of the multibeamlet transform is described. Additionally, the modified bottom-up tree pruning algorithm is presented in order to properly approximate images with the use of multibeamlets. As follows from the performed experiments, this approach leads to image approximations with better quality than the state-of-the-art geometrical multiresolution transforms.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129055012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}