{"title":"No-Reference Stereoscopic Image Quality Assessment Based on Dilation Convolution","authors":"Ping Zhao, Sumei Li, Yongli Chang","doi":"10.1109/VCIP47243.2019.8966075","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966075","url":null,"abstract":"Over the years, with the popularization of 3D technology, the demands of accurate and efficient 3D image quality evaluation (SIQA) methods are increasing constantly. Due to the wide application of CNN, CNN-based SIQA methods emerge one after another. However, current methods only consider a single scale or resolution, and some CNN-based methods directly take left and right views as an input of the network ignoring the visual fusion mechanism. In this work, a multi-scale no-reference SIQA method is proposed based on dilation convolution neural network (DCNN). Different from other CNN-based SIQA methods, the proposed one uses dilation convolution to imitate different scale of information processing fields in the human brain. Instead of left or right image, the cyclopean image generated by a new method is used as the input of the network. Moreover, the proposed multi-scale unit significantly can reduce computational parameters and computational complexity. Experimental results on two public databases show that the proposed model is superior to the state-of-the-art no-reference SIQA methods.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122749779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Light Field Reconstruction Based on Compressed Sensing via Deep Learning","authors":"Linhui Wei, Yu Liu, Yumei Wang","doi":"10.1109/VCIP47243.2019.8965747","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965747","url":null,"abstract":"The light field has excellent application prospects in immersive media because of the abundant information of the light. Due to the sparsity and redundancy in light field images, light field reconstruction based on compressed sensing is used to recover light field images from only a few measurements. And the light field compressed sensing usually optimizes the measurement matrix and the dictionary and processes each of the light field images separately. Since the high similarity of light field images, the different viewpoints of images can be stacked together and formed as a 4D tensor. In this paper, we propose tensor based on compressed sensing (TCS) method to yield measurements with common characteristics. Besides, a better deep learning network is designed for TCS, the measurement matrix optimization and image reconstruction will be performed simultaneously. Experimental results show that the proposed method gets at least 3 dB gain in PSNR and outperforms state-of-the-art in the reconstruction quality.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123774201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequency Descriptor based Light Field Depth Estimation","authors":"Junke Li, Xin Jin","doi":"10.1109/VCIP47243.2019.8965944","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965944","url":null,"abstract":"Depth estimation plays an important role in light field data processing. However, conventional focus measurement based approaches fail at the angular patches containing occlusion boundaries. In this paper, a novel depth estimation algorithm is proposed based on frequency descriptors. On the basis of the imaging process analysis, we propose to first perform the occlusion discrimination and edge orientation extraction in the frequency domain for the spatial patch from the central sub-aperture image. Then, according to the occlusion orientation, a variable-block-size angular patch is selected in the normal direction to construct the frequency descriptors for focus measurement in the focal stack. Experimental results demonstrate superior performance of the proposed method in robustness and depth accuracy.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126562600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vehicle Re-Identification: Logistic Triplet Embedding Regularized by Label Smoothing","authors":"Chenggang Li, Yinhao Wang, Zhicheng Zhao, Fei Su","doi":"10.1109/VCIP47243.2019.8965834","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965834","url":null,"abstract":"The explosive increasing of vehicles cause amount of traffic problems. Although vehicle re-identification (Re-ID) can help to acquire and manage vehicles, some intrinsic difficulties hinder the application of vehicle Re-ID. For example, vehicles have little inter-instance discrepancy due to their rigid structures and finite models. To address this problem, in this paper, a logistic triplet loss is proposed to fuse a label-smoothing cross entropy to extract fine-grained feature embeddings. Via exploring deeper into the inter-instance variances, the novel loss combines advantages of classification and metric learning, and reveals more stable performance than popular triplet loss. The experimental results on public datasets demonstrate the effectiveness of the proposed loss compared with state-of-the-art approaches.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125216938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-Preserving Fall Detection with Deep Learning on mmWave Radar Signal","authors":"Yangfan Sun, Renlong Hang, Zhu Li, M. Jin, Kelvin Xu","doi":"10.1109/VCIP47243.2019.8965661","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965661","url":null,"abstract":"Fall is one of the main reasons for body injuries among seniors. Traditional fall detection methods are mainly achieved by wearable and non-wearable techniques, which may cause skin discomfort or invasion of privacy to users. In this paper, we propose an automatic fall detection method with the assist of the mmWave radar signal to solve the aforementioned issues. The radar devices are capable to record the reflection from objects in both the spatial and temporal domain, which can be used to depict the activities of users with the support of a recurrent neural network (RNN) with long-short-term memory (LSTM) units. First, we employ the radar low-dimension embedding (RLDE) algorithm to preprocess the Range-angle reflection heatmap sequence converted from the raw radar signal for reducing the redundancy in the spatial domain. Then, the processed sequence is split into frames for inputting LSTM units one by one. Eventually, the output from the last LSTM unit is fed in a Softmax layer for classifying different activities. To validate the effectiveness of our proposed method, we construct a radar dataset with the assist of market radar module devices, to implement several experiments. The experimental results demonstrate that, compared to LSTM only and the widely used 3-D convolutional neural network (3-D CNN), combining RLDE and LSTM can achieve the best detection results with much less computational time consumption. In addition, we extend the proposed method to classify multiple human activities simultaneously and the satisfied performances are observed.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128415877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multiple Triplet-Ranking Model for Fine-Grained Sketch-Based Image Retrieval","authors":"Jingyi Xue, Yun Zhou, Zhuqing Jiang, Yao Xie, Xiaoyu Li","doi":"10.1109/VCIP47243.2019.8965842","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965842","url":null,"abstract":"Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of matching an input sketch with a specific photo containing the same instance. The key challenge of learning a FG-SBIR model is to bridge the domain gap between photo and sketch. Most existing approaches build a joint embedding space where two domains can be directly compared. They only focus on the highly abstract features in final fully connected (FC) layer, ignore some low-level semantic concepts in convolutional layers. In this paper, we propose a multiple triplet-ranking model in FG-SBIR task. Specially, we introduce an auxiliary supervision loss function in the convolutional layer, and we use the fusion of features from convolutional layer and final FC layer to build the joint embedding space. Extensive experiments show that the proposed multiple triplet-ranking model significantly outperforms the state-of-the-art.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131502704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unpaired Images based Generator Architecture for Facial Expression Recognition","authors":"Xi Zhang, Feifei Zhang, Changsheng Xu","doi":"10.1109/VCIP47243.2019.8965689","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965689","url":null,"abstract":"Facial expression recognition (FER) is a challenging task due to the lack of sufficient training data. Most conventional approaches usually rotate or flip the images for data augmentation. More recently, numerous methods synthesize images automatically by using Generative Adversarial Network (GAN). However, paired images are always required in these methods. Different from existing methods, in this paper, we propose an end-to-end deep learning model for simultaneous facial expression synthesis and facial expression recognition. In our method, paired images are not required, which makes the proposed model much more flexible and general. Furthermore, different expressions are encoded in a disentangled manner in a latent space, which enables us to generate facial images with arbitrary expressions by exchanging certain parts of their latent identity features. Finally, the facial expression synthesis and facial expression recognition tasks can further boost their performance for each other via our model. Quantitative and qualitative evaluations on both controlled and in-the-wild datasets demonstrate that the proposed method performs favorably against state-of-the-art methods.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134298356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive CU Split Decision with Pooling-variable CNN for VVC Intra Encoding","authors":"Genwei Tang, Ming-e Jing, Xiaoyang Zeng, Yibo Fan","doi":"10.1109/VCIP47243.2019.8965679","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965679","url":null,"abstract":"In the versatile video coding (VVC) proposed by the Joint Video Exploration Team (JVET), the quad-tree with the nested multi-type tree (QTMT) partition scheme has been adopted based on the quadtree structure in the high efficiency video coding (HEVC). The video coding quality of VVC is better than the HEVC, but the algorithm complexity has also increased greatly. In this work, we present an adaptive CU split decision for intra frame with the pooling-variable convolutional neural network (CNN), targeting at various coding unit (CU) shape. The shape-adaptive CNN is realized by the variable pooling layer size where we can make the most of the pooling layer in CNN and retain the original information. Based on the proposed CNN, the CU split or not will be decided by only one trained network, same architecture and parameters for the CUs with multiple sizes. Moreover, with the proposed shape-based CNN training scheme, the various training sample size can be processed successfully. The CUbased network can avoid the full rate-distortion optimization for the CU split and the CU-level rate control can also be enabled. The experiment results show that the proposed method can save 33% coding time with only 0.99% Bjontegaard Delta bitrate (BD-rate) increase.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122369937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spherical Video Coding With Motion Vector Modulation to Account For Camera Motion","authors":"B. Vishwanath, K. Rose","doi":"10.1109/VCIP47243.2019.8966083","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966083","url":null,"abstract":"Emerging immersive multimedia applications critically depend on efficient compression of spherical (360-degree) videos. Current approaches project spherical video onto planes for coding with standard codecs, without accounting for the properties of spherical video, a severe sub-optimality that motivates this work. A common type of spherical video is dominated by camera translation. We recently proposed a powerful motion compensation technique for such videos which builds on the observation that, with camera translation, stationary points are perceived as moving along geodesics that meet at the point where the camera translation vector intersects the sphere. However, the approach follows standard coding procedures and translates all pixels in a block by the same amount on their respective geodesics, which is sub-optimal. This paper analyzes the appropriate rate of translation along geodesics and its dependence on the elevation of a pixel on the sphere with respect to the camera velocity pole. The analysis leads to a new approach that modulates the effective motion vectors within a block such that they perfectly capture the perceived individual motion of each pixel. Consistent gains in the experiments provide evidence for the efficacy of the proposed approach.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130752662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying and Pruning Redundant Structures for Deep Neural Networks","authors":"Wenyao Gan, Li Song, Li Chen, Rong Xie, Xiao Gu","doi":"10.1109/VCIP47243.2019.8966025","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966025","url":null,"abstract":"Deep convolutional neural networks have achieved considerable success in the field of computer vision. However, it is difficult to deploy state-of-the-art models on resource-constrained platforms due to their high storage, memory bandwidth, and computational costs. In this paper, we propose a structured pruning method which employs a three-step process to reduce the resource consumption of neural networks. First, we train an initial network on the training set and evaluate it on the validation set. Next, we introduce an iterative pruning and fine-tuning algorithm to identify and prune redundant structures, which results in a pruned network with a compact architecture. Finally, we train the pruned network from scratch on both the training set and validation set to obtain the final accuracy on the test set. In the experiments, our pruning method significantly reduces the model size (by 87.2% on CIFAR-10), saves inference time (53.3% on CIFAR-10), and achieves better performance as compared to recent state-of-the-art methods.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121528636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}