{"title":"No-Reference Stereoscopic Image Quality Assessment Based on Dilation Convolution","authors":"Ping Zhao, Sumei Li, Yongli Chang","doi":"10.1109/VCIP47243.2019.8966075","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966075","url":null,"abstract":"Over the years, with the popularization of 3D technology, the demands of accurate and efficient 3D image quality evaluation (SIQA) methods are increasing constantly. Due to the wide application of CNN, CNN-based SIQA methods emerge one after another. However, current methods only consider a single scale or resolution, and some CNN-based methods directly take left and right views as an input of the network ignoring the visual fusion mechanism. In this work, a multi-scale no-reference SIQA method is proposed based on dilation convolution neural network (DCNN). Different from other CNN-based SIQA methods, the proposed one uses dilation convolution to imitate different scale of information processing fields in the human brain. Instead of left or right image, the cyclopean image generated by a new method is used as the input of the network. Moreover, the proposed multi-scale unit significantly can reduce computational parameters and computational complexity. Experimental results on two public databases show that the proposed model is superior to the state-of-the-art no-reference SIQA methods.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122749779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Light Field Reconstruction Based on Compressed Sensing via Deep Learning","authors":"Linhui Wei, Yu Liu, Yumei Wang","doi":"10.1109/VCIP47243.2019.8965747","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965747","url":null,"abstract":"The light field has excellent application prospects in immersive media because of the abundant information of the light. Due to the sparsity and redundancy in light field images, light field reconstruction based on compressed sensing is used to recover light field images from only a few measurements. And the light field compressed sensing usually optimizes the measurement matrix and the dictionary and processes each of the light field images separately. Since the high similarity of light field images, the different viewpoints of images can be stacked together and formed as a 4D tensor. In this paper, we propose tensor based on compressed sensing (TCS) method to yield measurements with common characteristics. Besides, a better deep learning network is designed for TCS, the measurement matrix optimization and image reconstruction will be performed simultaneously. Experimental results show that the proposed method gets at least 3 dB gain in PSNR and outperforms state-of-the-art in the reconstruction quality.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123774201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequency Descriptor based Light Field Depth Estimation","authors":"Junke Li, Xin Jin","doi":"10.1109/VCIP47243.2019.8965944","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965944","url":null,"abstract":"Depth estimation plays an important role in light field data processing. However, conventional focus measurement based approaches fail at the angular patches containing occlusion boundaries. In this paper, a novel depth estimation algorithm is proposed based on frequency descriptors. On the basis of the imaging process analysis, we propose to first perform the occlusion discrimination and edge orientation extraction in the frequency domain for the spatial patch from the central sub-aperture image. Then, according to the occlusion orientation, a variable-block-size angular patch is selected in the normal direction to construct the frequency descriptors for focus measurement in the focal stack. Experimental results demonstrate superior performance of the proposed method in robustness and depth accuracy.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126562600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vehicle Re-Identification: Logistic Triplet Embedding Regularized by Label Smoothing","authors":"Chenggang Li, Yinhao Wang, Zhicheng Zhao, Fei Su","doi":"10.1109/VCIP47243.2019.8965834","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965834","url":null,"abstract":"The explosive increasing of vehicles cause amount of traffic problems. Although vehicle re-identification (Re-ID) can help to acquire and manage vehicles, some intrinsic difficulties hinder the application of vehicle Re-ID. For example, vehicles have little inter-instance discrepancy due to their rigid structures and finite models. To address this problem, in this paper, a logistic triplet loss is proposed to fuse a label-smoothing cross entropy to extract fine-grained feature embeddings. Via exploring deeper into the inter-instance variances, the novel loss combines advantages of classification and metric learning, and reveals more stable performance than popular triplet loss. The experimental results on public datasets demonstrate the effectiveness of the proposed loss compared with state-of-the-art approaches.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125216938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-Preserving Fall Detection with Deep Learning on mmWave Radar Signal","authors":"Yangfan Sun, Renlong Hang, Zhu Li, M. Jin, Kelvin Xu","doi":"10.1109/VCIP47243.2019.8965661","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965661","url":null,"abstract":"Fall is one of the main reasons for body injuries among seniors. Traditional fall detection methods are mainly achieved by wearable and non-wearable techniques, which may cause skin discomfort or invasion of privacy to users. In this paper, we propose an automatic fall detection method with the assist of the mmWave radar signal to solve the aforementioned issues. The radar devices are capable to record the reflection from objects in both the spatial and temporal domain, which can be used to depict the activities of users with the support of a recurrent neural network (RNN) with long-short-term memory (LSTM) units. First, we employ the radar low-dimension embedding (RLDE) algorithm to preprocess the Range-angle reflection heatmap sequence converted from the raw radar signal for reducing the redundancy in the spatial domain. Then, the processed sequence is split into frames for inputting LSTM units one by one. Eventually, the output from the last LSTM unit is fed in a Softmax layer for classifying different activities. To validate the effectiveness of our proposed method, we construct a radar dataset with the assist of market radar module devices, to implement several experiments. The experimental results demonstrate that, compared to LSTM only and the widely used 3-D convolutional neural network (3-D CNN), combining RLDE and LSTM can achieve the best detection results with much less computational time consumption. In addition, we extend the proposed method to classify multiple human activities simultaneously and the satisfied performances are observed.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128415877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multiple Triplet-Ranking Model for Fine-Grained Sketch-Based Image Retrieval","authors":"Jingyi Xue, Yun Zhou, Zhuqing Jiang, Yao Xie, Xiaoyu Li","doi":"10.1109/VCIP47243.2019.8965842","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965842","url":null,"abstract":"Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of matching an input sketch with a specific photo containing the same instance. The key challenge of learning a FG-SBIR model is to bridge the domain gap between photo and sketch. Most existing approaches build a joint embedding space where two domains can be directly compared. They only focus on the highly abstract features in final fully connected (FC) layer, ignore some low-level semantic concepts in convolutional layers. In this paper, we propose a multiple triplet-ranking model in FG-SBIR task. Specially, we introduce an auxiliary supervision loss function in the convolutional layer, and we use the fusion of features from convolutional layer and final FC layer to build the joint embedding space. Extensive experiments show that the proposed multiple triplet-ranking model significantly outperforms the state-of-the-art.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131502704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unpaired Images based Generator Architecture for Facial Expression Recognition","authors":"Xi Zhang, Feifei Zhang, Changsheng Xu","doi":"10.1109/VCIP47243.2019.8965689","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965689","url":null,"abstract":"Facial expression recognition (FER) is a challenging task due to the lack of sufficient training data. Most conventional approaches usually rotate or flip the images for data augmentation. More recently, numerous methods synthesize images automatically by using Generative Adversarial Network (GAN). However, paired images are always required in these methods. Different from existing methods, in this paper, we propose an end-to-end deep learning model for simultaneous facial expression synthesis and facial expression recognition. In our method, paired images are not required, which makes the proposed model much more flexible and general. Furthermore, different expressions are encoded in a disentangled manner in a latent space, which enables us to generate facial images with arbitrary expressions by exchanging certain parts of their latent identity features. Finally, the facial expression synthesis and facial expression recognition tasks can further boost their performance for each other via our model. Quantitative and qualitative evaluations on both controlled and in-the-wild datasets demonstrate that the proposed method performs favorably against state-of-the-art methods.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134298356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive CU Split Decision with Pooling-variable CNN for VVC Intra Encoding","authors":"Genwei Tang, Ming-e Jing, Xiaoyang Zeng, Yibo Fan","doi":"10.1109/VCIP47243.2019.8965679","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965679","url":null,"abstract":"In the versatile video coding (VVC) proposed by the Joint Video Exploration Team (JVET), the quad-tree with the nested multi-type tree (QTMT) partition scheme has been adopted based on the quadtree structure in the high efficiency video coding (HEVC). The video coding quality of VVC is better than the HEVC, but the algorithm complexity has also increased greatly. In this work, we present an adaptive CU split decision for intra frame with the pooling-variable convolutional neural network (CNN), targeting at various coding unit (CU) shape. The shape-adaptive CNN is realized by the variable pooling layer size where we can make the most of the pooling layer in CNN and retain the original information. Based on the proposed CNN, the CU split or not will be decided by only one trained network, same architecture and parameters for the CUs with multiple sizes. Moreover, with the proposed shape-based CNN training scheme, the various training sample size can be processed successfully. The CUbased network can avoid the full rate-distortion optimization for the CU split and the CU-level rate control can also be enabled. The experiment results show that the proposed method can save 33% coding time with only 0.99% Bjontegaard Delta bitrate (BD-rate) increase.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122369937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spherical Video Coding With Motion Vector Modulation to Account For Camera Motion","authors":"B. Vishwanath, K. Rose","doi":"10.1109/VCIP47243.2019.8966083","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966083","url":null,"abstract":"Emerging immersive multimedia applications critically depend on efficient compression of spherical (360-degree) videos. Current approaches project spherical video onto planes for coding with standard codecs, without accounting for the properties of spherical video, a severe sub-optimality that motivates this work. A common type of spherical video is dominated by camera translation. We recently proposed a powerful motion compensation technique for such videos which builds on the observation that, with camera translation, stationary points are perceived as moving along geodesics that meet at the point where the camera translation vector intersects the sphere. However, the approach follows standard coding procedures and translates all pixels in a block by the same amount on their respective geodesics, which is sub-optimal. This paper analyzes the appropriate rate of translation along geodesics and its dependence on the elevation of a pixel on the sphere with respect to the camera velocity pole. The analysis leads to a new approach that modulates the effective motion vectors within a block such that they perfectly capture the perceived individual motion of each pixel. Consistent gains in the experiments provide evidence for the efficacy of the proposed approach.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130752662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying and Pruning Redundant Structures for Deep Neural Networks","authors":"Wenyao Gan, Li Song, Li Chen, Rong Xie, Xiao Gu","doi":"10.1109/VCIP47243.2019.8966025","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966025","url":null,"abstract":"Deep convolutional neural networks have achieved considerable success in the field of computer vision. However, it is difficult to deploy state-of-the-art models on resource-constrained platforms due to their high storage, memory bandwidth, and computational costs. In this paper, we propose a structured pruning method which employs a three-step process to reduce the resource consumption of neural networks. First, we train an initial network on the training set and evaluate it on the validation set. Next, we introduce an iterative pruning and fine-tuning algorithm to identify and prune redundant structures, which results in a pruned network with a compact architecture. Finally, we train the pruned network from scratch on both the training set and validation set to obtain the final accuracy on the test set. In the experiments, our pruning method significantly reduces the model size (by 87.2% on CIFAR-10), saves inference time (53.3% on CIFAR-10), and achieves better performance as compared to recent state-of-the-art methods.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121528636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}