{"title":"A Novel Visual Analysis Oriented Rate Control Scheme for HEVC","authors":"Qi Zhang, Shanshe Wang, Siwei Ma","doi":"10.1109/VCIP49819.2020.9301817","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301817","url":null,"abstract":"Recent years have witnessed an explosion of machine visual intelligence. While impressive performance on visual analysis has been achieved by powerful Deep-Learning-based models, the texture and feature distortion caused by image and video coding is becoming a challenge in practical situations. In this paper, a new rate control scheme is proposed to improve visual analysis performance on coded video frames. Firstly, a new kind of visual analysis distortion is introduced to build a Rate-Joint-Distortion model. Secondly, the Rate-Joint-Distortion Optimization problem is solved by using Lagrange multiplier method, and the relationship between rate and Lagrange multiplier λ is described by a hyperbolic model. Thirdly, a logarithmic λ − QP model is established to achieve minimum Rate-Joint-Distortion cost for given λs. The experimental results show that the proposed scheme can improve visual analysis performance with stable bits used for coding.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114042014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Drone-Based Car Counting via Density Map Learning","authors":"Jingxian Huang, Guanchen Ding, Yujia Guo, Daiqin Yang, Sihan Wang, Tao Wang, Yunfei Zhang","doi":"10.1109/VCIP49819.2020.9301785","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301785","url":null,"abstract":"Car counting on drone-based images is a challenging task in computer vision. Most advanced methods for counting are based on density maps. Usually, density maps are first generated by convolving ground truth point maps with a Gaussian kernel for later model learning (generation). Then, the counting network learns to predict density maps from input images (estimation). Most studies focus on the estimation problem while overlooking the generation problem. In this paper, a training framework is proposed to generate density maps by learning and train generation and estimation subnetworks jointly. Experiments demonstrate that our method outperforms other density map-based methods and shows the best performance on drone-based car counting.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123180446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VCIP 2020 Index","authors":"","doi":"10.1109/vcip49819.2020.9301896","DOIUrl":"https://doi.org/10.1109/vcip49819.2020.9301896","url":null,"abstract":"","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129770108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UGNet: Underexposed Images Enhancement Network based on Global Illumination Estimation","authors":"Yuan Fang, Wenzhe Zhu, Qing Zhu","doi":"10.1109/VCIP49819.2020.9301810","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301810","url":null,"abstract":"This paper proposes a new neural network for enhancing underexposed images. Instead of the decomposition method based on Retinex theory, we introduce smooth dilated convolution to estimate global illumination of the input image, and implement an end-to-end learning network model. Based on this model, we formulate a multi-term loss function that combines content, color, texture and smoothness losses. Our extensive experiments demonstrate that this method is superior to other methods in underexposed image enhancement. It can cover more color details and be applied to various underexposed images robustly.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126852375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dense-Gated U-Net for Brain Lesion Segmentation","authors":"Zhongyi Ji, Xiao Han, Tong Lin, Wenmin Wang","doi":"10.1109/VCIP49819.2020.9301852","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301852","url":null,"abstract":"Brain lesion segmentation plays a crucial role in diagnosis and monitoring of disease progression. DenseNets have been widely used for medical image segmentation, but much redundancy arises in dense-connected feature maps and the training process becomes harder. In this paper, we address the brain lesion segmentation task by proposing a Dense-Gated U-Net (DGNet), which is a hybrid of Dense-gated blocks and U-Net. The main contribution lies in the dense-gated blocks that explicitly model dependencies among concatenated layers and alleviate redundancy. Based on dense-gated blocks, DGNet can achieve weighted concatenation and suppress useless features. Extensive experiments on MICCAI BraTS 2018 challenge and our collected intracranial hemorrhage dataset demonstrate that our approach outperforms a powerful backbone model and other state-of-the-art methods.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126715838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Resolution Change for Versatile Video Coding","authors":"Tsui-Shan Chang, Yu-Chen Sun, Ling Zhu, J.-G. Lou","doi":"10.1109/VCIP49819.2020.9301762","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301762","url":null,"abstract":"This paper presents an adaptive resolution change (ARC) method adopted in versatile video coding (VVC) to adapt the video bit-stream transmission to dynamic network environments. This approach enables resolution changes within a video sequence at any frame without the insertion of an instantaneous decoder refresh (IDR) or intra random access picture (IRAP). The underlying techniques include reference picture resampling and handling of interactions between the existing coding tools and the changes in resolution. In addition to the techniques adopted in VVC, this paper proposes two techniques for temporal motion vector prediction and deblocking filter to further improve both subjective and objective quality. The experimental results show that the combined ARC method can prevent the burden on bit cost exerted by the insertion of an intra frame during resolution changes. At the same time, 18%, 21% and 21% BD-rate reductions are achieved for Y, Cb, and Cr components, respectively.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127958452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"News Image Steganography: A Novel Architecture Facilitates the Fake News Identification","authors":"Jizhe Zhou, Chi-Man Pun, Yu Tong","doi":"10.1109/VCIP49819.2020.9301846","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301846","url":null,"abstract":"A larger portion of fake news quotes untampered images from other sources with ulterior motives rather than conducting image forgery. Such elaborate engraftments keep the inconsistency between images and text reports stealthy, thereby, palm off the spurious for the genuine. This paper proposes an architecture named News Image Steganography (NIS) to reveal the aforementioned inconsistency through image steganography based on GAN. Extractive summarization about a news image is generated based on its source texts, and a learned steganographic algorithm encodes and decodes the summarization of the image in a manner that approaches perceptual invisibility. Once an encoded image is quoted, its source summarization can be decoded and further presented as the ground truth to verify the quoting news. The pairwise encoder and decoder endow images of the capability to carry along their imperceptible summarization. Our NIS reveals the underlying inconsistency, thereby, according to our experiments and investigations, contributes to the identification accuracy of fake news that engrafts untampered images.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127554923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NIR image colorization with graph-convolutional neural networks","authors":"D. Valsesia, Giulia Fracastoro, E. Magli","doi":"10.1109/VCIP49819.2020.9301839","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301839","url":null,"abstract":"Colorization of near-infrared (NIR) images is a challenging problem due to the different material properties at the infared wavelenghts, thus reducing the correlation with visible images. In this paper, we study how graph-convolutional neural networks allow exploiting a more powerful inductive bias than standard CNNs, in the form of non-local self-similiarity. Its impact is evaluated by showing how training with mean squared error only as loss leads to poor results with a standard CNN, while the graph-convolutional network produces significantly sharper and more realistic colorizations.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133747944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low Resolution Facial Manipulation Detection","authors":"Xiao Han, Zhongyi Ji, Wenmin Wang","doi":"10.1109/VCIP49819.2020.9301796","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301796","url":null,"abstract":"Detecting manipulated images and videos is an important aspect of digital media forensics. Due to severe discriminative information loss caused by resolution degradation, the performance of most existing methods is significantly reduced on low resolution manipulated images. To address this issue, we propose an Artifacts-Focus Super-Resolution (AFSR) module and a Two-stream Feature Extractor (TFE). The AFSR recovers facial cues and manipulation artifact details using an autoencoder learned with an artifacts focus training loss. The TFE adopts a two-stream feature extractor with key points-based fusion pooling to learn discriminative facial representations. These two complementary modules are jointly trained to recover and capture distinctive manipulation artifacts in low resolution images. Extensive experiments on two benchmarks including FaceForensics++ and DeepfakeTIMIT, evidence the favorable performance of our method against other state-of-the-art methods.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117008684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Orthogonal Coded Multi-view Structured Light for Inter-view Interference Elimination","authors":"Zaichao Sun, G. Qian, Zhaoyu Peng, Weiju Dai, Dongjun Sun, Gongyuan Zhang, Nongtao Zhang, Jun Xu, Ren Wang, Chunlin Li","doi":"10.1109/VCIP49819.2020.9301891","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301891","url":null,"abstract":"Rapid 3D reconstruction of dynamic scenes is very useful in 3D object structure analysis, accident avoidance for UAV, and other visual applications. Against dynamic scenes, coded structured light methods have been proposed to obtain the depth information of an object in 3D world, and most of them are based on spatial codification. A brutal truth is that two or more cameras and projectors from different viewpoints are needed to measure the dynamic scene simultaneously for rapid 3D reconstruction. However, when two traditional patterns, especially the binaries, are mutually overlapped, interference between them arises to a new challenge to 3D reconstruction. Traditional patterns can hardly be separated from each other, which surely influence the quality of the 3D reconstruction. To eliminate the interference problem, we propose a scheme of orthogonal coded multi-view structured light systems, which can obtain accurate of depth maps for a scene. Besides, we also test the stability of the orthogonal patterns by establishing three different scenes and making a comparisons to traditional patterns. New state-of-the-art results can be obtained by our scheme in the experiments.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131689406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}