2019 IEEE Visual Communications and Image Processing (VCIP): Latest Publications

No-Reference Stereoscopic Image Quality Assessment Based on Dilation Convolution
2019 IEEE Visual Communications and Image Processing (VCIP) | Pub Date: 2019-12-01 | DOI: 10.1109/VCIP47243.2019.8966075
Ping Zhao, Sumei Li, Yongli Chang
Abstract: Over the years, with the popularization of 3D technology, the demand for accurate and efficient stereoscopic image quality assessment (SIQA) methods has grown steadily. With the wide application of CNNs, CNN-based SIQA methods have emerged one after another. However, current methods consider only a single scale or resolution, and some CNN-based methods directly take the left and right views as network inputs, ignoring the visual fusion mechanism. In this work, a multi-scale no-reference SIQA method is proposed based on a dilated convolutional neural network (DCNN). Unlike other CNN-based SIQA methods, the proposed one uses dilated convolutions to imitate the different scales of information-processing fields in the human brain. Instead of the left or right image, a cyclopean image generated by a new method is used as the network input. Moreover, the proposed multi-scale unit significantly reduces the number of parameters and the computational complexity. Experimental results on two public databases show that the proposed model is superior to state-of-the-art no-reference SIQA methods.
Citations: 2
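The abstract does not give the architecture, but the core idea, parallel dilated convolutions standing in for receptive fields of several sizes, can be sketched as below. The block structure, branch width, and dilation rates (1, 2, 4) are illustrative assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class MultiScaleDilationBlock(nn.Module):
    """Illustrative multi-scale unit: parallel 3x3 convolutions with
    different dilation rates cover several receptive-field sizes while
    each branch keeps the parameter count of a plain 3x3 convolution."""
    def __init__(self, in_ch, branch_ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            # padding=r with dilation=r preserves the spatial resolution
            nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # Concatenate the per-scale responses along the channel axis.
        return torch.cat([b(x) for b in self.branches], dim=1)
```

A 3x3 kernel with dilation 4 sees a 9x9 neighborhood at the cost of only nine weights per channel pair, which is how such a unit keeps parameters and complexity low.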

Light Field Reconstruction Based on Compressed Sensing via Deep Learning
2019 IEEE Visual Communications and Image Processing (VCIP) | Pub Date: 2019-12-01 | DOI: 10.1109/VCIP47243.2019.8965747
Linhui Wei, Yu Liu, Yumei Wang
Abstract: The light field has excellent application prospects in immersive media because of the abundant information it captures about light. Owing to the sparsity and redundancy of light field images, compressed-sensing-based reconstruction can recover them from only a few measurements. Existing light field compressed sensing methods usually optimize the measurement matrix and the dictionary, and process each light field image separately. Because light field images are highly similar, the images from different viewpoints can be stacked together to form a 4D tensor. In this paper, we propose a tensor-based compressed sensing (TCS) method to yield measurements with common characteristics. In addition, a deep learning network is designed for TCS in which measurement matrix optimization and image reconstruction are performed simultaneously. Experimental results show that the proposed method gains at least 3 dB in PSNR and outperforms the state of the art in reconstruction quality.
Citations: 1
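As a rough illustration of jointly learning the measurement matrix and the reconstruction, the toy model below measures the stacked views with a learnable linear map and decodes with a small MLP. All dimensions and the MLP decoder are assumptions for the sketch; the paper's actual network is not specified in the abstract.

```python
import torch
import torch.nn as nn

class ToyTensorCS(nn.Module):
    """Toy end-to-end compressed sensing: a learnable measurement matrix
    applied to the stacked light-field views, followed by a small
    reconstruction network; both parts train jointly."""
    def __init__(self, n_views=9, h=32, w=32, m=1024):
        super().__init__()
        n = n_views * h * w
        self.measure = nn.Linear(n, m, bias=False)   # learned Phi, m << n
        self.recon = nn.Sequential(nn.Linear(m, n), nn.ReLU(), nn.Linear(n, n))
        self.shape = (n_views, h, w)

    def forward(self, lf):                  # lf: (B, n_views, h, w)
        y = self.measure(lf.flatten(1))     # few measurements per light field
        return self.recon(y).view(-1, *self.shape)
```

Training against a reconstruction loss (e.g., MSE between the output and the input views) shapes the measurement matrix and the decoder together, which is the simultaneous optimization the abstract refers to.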

Frequency Descriptor based Light Field Depth Estimation
2019 IEEE Visual Communications and Image Processing (VCIP) | Pub Date: 2019-12-01 | DOI: 10.1109/VCIP47243.2019.8965944
Junke Li, Xin Jin
Abstract: Depth estimation plays an important role in light field data processing. However, conventional focus-measurement-based approaches fail on angular patches containing occlusion boundaries. In this paper, a novel depth estimation algorithm based on frequency descriptors is proposed. Building on an analysis of the imaging process, we first perform occlusion discrimination and edge orientation extraction in the frequency domain for the spatial patch from the central sub-aperture image. Then, according to the occlusion orientation, a variable-block-size angular patch is selected in the normal direction to construct the frequency descriptors for focus measurement in the focal stack. Experimental results demonstrate the superior robustness and depth accuracy of the proposed method.
Citations: 1
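The abstract leaves the descriptor's exact construction open; one plausible toy version of "edge orientation in the frequency domain" is an orientation histogram of the patch's magnitude spectrum, as sketched here. The binning and DC removal are assumptions, not the paper's formulation.

```python
import numpy as np

def orientation_spectrum(patch, n_bins=8):
    """Toy frequency descriptor: histogram of spectral energy over
    orientations. A strong edge in the patch concentrates energy along
    the perpendicular orientation in the magnitude spectrum."""
    f = np.fft.fftshift(np.fft.fft2(patch))
    mag = np.abs(f)
    cy, cx = mag.shape[0] // 2, mag.shape[1] // 2
    mag[cy, cx] = 0.0                          # drop the DC component
    ys, xs = np.indices(mag.shape)
    angles = np.arctan2(ys - cy, xs - cx)      # orientation of each frequency bin
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi),
                           weights=mag)
    return hist / (hist.sum() + 1e-12)
```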

Vehicle Re-Identification: Logistic Triplet Embedding Regularized by Label Smoothing
2019 IEEE Visual Communications and Image Processing (VCIP) | Pub Date: 2019-12-01 | DOI: 10.1109/VCIP47243.2019.8965834
Chenggang Li, Yinhao Wang, Zhicheng Zhao, Fei Su
Abstract: The explosive increase in the number of vehicles causes numerous traffic problems. Although vehicle re-identification (Re-ID) can help acquire and manage vehicles, some intrinsic difficulties hinder its application. For example, vehicles show little inter-instance discrepancy because of their rigid structures and the finite set of models. To address this problem, in this paper a logistic triplet loss fused with a label-smoothing cross entropy is proposed to extract fine-grained feature embeddings. By digging deeper into inter-instance variances, the novel loss combines the advantages of classification and metric learning, and shows more stable performance than the popular triplet loss. Experimental results on public datasets demonstrate the effectiveness of the proposed loss compared with state-of-the-art approaches.
Citations: 2
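A hedged reading of the loss, sketched below: replace the hard margin hinge of the standard triplet loss with a smooth logistic (softplus) surrogate and add a label-smoothing cross-entropy term. The softplus form and the weight w are assumptions inferred from the title, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def logistic_triplet_loss(anchor, pos, neg):
    """Smooth 'logistic' triplet term: softplus(d_ap - d_an) penalizes
    positives that sit farther from the anchor than negatives, without
    a hard margin cutoff."""
    d_ap = F.pairwise_distance(anchor, pos)
    d_an = F.pairwise_distance(anchor, neg)
    return F.softplus(d_ap - d_an).mean()

def combined_loss(anchor, pos, neg, logits, labels, eps=0.1, w=1.0):
    """Fuses the triplet term with label-smoothing cross entropy; the
    smoothing eps and weight w are assumed hyperparameters."""
    n_cls = logits.size(1)
    target = torch.full_like(logits, eps / (n_cls - 1))
    target.scatter_(1, labels.unsqueeze(1), 1.0 - eps)  # smoothed one-hot
    ce = -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    return ce + w * logistic_triplet_loss(anchor, pos, neg)
```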

Privacy-Preserving Fall Detection with Deep Learning on mmWave Radar Signal
2019 IEEE Visual Communications and Image Processing (VCIP) | Pub Date: 2019-12-01 | DOI: 10.1109/VCIP47243.2019.8965661
Yangfan Sun, Renlong Hang, Zhu Li, M. Jin, Kelvin Xu
Abstract: Falls are one of the main causes of bodily injury among seniors. Traditional fall detection methods mainly rely on wearable and non-wearable techniques, which may cause skin discomfort or invade users' privacy. In this paper, we propose an automatic fall detection method based on mmWave radar signals to address these issues. Radar devices can record reflections from objects in both the spatial and temporal domains, which can be used to depict users' activities with the support of a recurrent neural network (RNN) with long short-term memory (LSTM) units. First, we employ the radar low-dimension embedding (RLDE) algorithm to preprocess the range-angle reflection heatmap sequence converted from the raw radar signal, reducing redundancy in the spatial domain. Then, the processed sequence is split into frames that are fed into the LSTM units one by one. Finally, the output of the last LSTM unit is fed into a softmax layer to classify the different activities. To validate the effectiveness of the proposed method, we construct a radar dataset using commercial radar modules and conduct several experiments. The results demonstrate that, compared with an LSTM alone and the widely used 3-D convolutional neural network (3-D CNN), combining RLDE and LSTM achieves the best detection results with far less computation time. In addition, we extend the proposed method to classify multiple human activities simultaneously and observe satisfactory performance.
Citations: 19
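The pipeline in the abstract (RLDE preprocessing, frame-by-frame LSTM, softmax classifier) can be outlined as follows. The linear projection merely stands in for RLDE, whose details the abstract does not give, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class RadarFallDetector(nn.Module):
    """Outline of the abstract's pipeline: a linear projection stands in
    for the RLDE spatial-redundancy reduction, an LSTM consumes the
    heatmap frames in order, and the last state feeds a classifier
    (softmax is applied inside the cross-entropy loss at train time)."""
    def __init__(self, heatmap_dim=64 * 64, embed_dim=64,
                 hidden=128, n_classes=2):
        super().__init__()
        self.embed = nn.Linear(heatmap_dim, embed_dim)  # RLDE stand-in
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, frames):          # frames: (B, T, heatmap_dim)
        out, _ = self.lstm(self.embed(frames))
        return self.head(out[:, -1])    # logits from the final time step
```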

A Multiple Triplet-Ranking Model for Fine-Grained Sketch-Based Image Retrieval
2019 IEEE Visual Communications and Image Processing (VCIP) | Pub Date: 2019-12-01 | DOI: 10.1109/VCIP47243.2019.8965842
Jingyi Xue, Yun Zhou, Zhuqing Jiang, Yao Xie, Xiaoyu Li
Abstract: Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of matching an input sketch with the specific photo containing the same instance. The key challenge in learning an FG-SBIR model is bridging the domain gap between photos and sketches. Most existing approaches build a joint embedding space where the two domains can be compared directly. They focus only on the highly abstract features of the final fully connected (FC) layer and ignore the low-level semantic concepts in the convolutional layers. In this paper, we propose a multiple triplet-ranking model for the FG-SBIR task. Specifically, we introduce an auxiliary supervision loss function on a convolutional layer, and we fuse features from that convolutional layer and the final FC layer to build the joint embedding space. Extensive experiments show that the proposed multiple triplet-ranking model significantly outperforms the state of the art.
Citations: 1
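One plausible form of the "multiple" supervision, assumed here, is a triplet ranking loss applied at two depths: an auxiliary term on globally pooled convolutional features plus the usual term on the final FC embeddings. The pooling choice, margin, and weighting are illustrative.

```python
import torch.nn.functional as F

def multiple_triplet_loss(conv_feats, fc_feats, margin=0.2, aux_w=0.5):
    """Triplet ranking at two depths. Each argument is an
    (anchor, positive, negative) tuple: conv_feats holds 4D feature
    maps, fc_feats holds the final embedding vectors."""
    a, p, n = (f.mean(dim=(2, 3)) for f in conv_feats)  # global average pool
    aux = F.triplet_margin_loss(a, p, n, margin=margin)
    main = F.triplet_margin_loss(*fc_feats, margin=margin)
    return main + aux_w * aux
```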

Unpaired Images based Generator Architecture for Facial Expression Recognition
2019 IEEE Visual Communications and Image Processing (VCIP) | Pub Date: 2019-12-01 | DOI: 10.1109/VCIP47243.2019.8965689
Xi Zhang, Feifei Zhang, Changsheng Xu
Abstract: Facial expression recognition (FER) is a challenging task due to the lack of sufficient training data. Most conventional approaches rotate or flip images for data augmentation. More recently, numerous methods synthesize images automatically using generative adversarial networks (GANs). However, these methods always require paired images. Unlike existing methods, in this paper we propose an end-to-end deep learning model for simultaneous facial expression synthesis and facial expression recognition. Our method does not require paired images, which makes the proposed model much more flexible and general. Furthermore, different expressions are encoded in a disentangled manner in a latent space, which enables us to generate facial images with arbitrary expressions by exchanging certain parts of their latent identity features. Finally, the facial expression synthesis and recognition tasks can further boost each other's performance through our model. Quantitative and qualitative evaluations on both controlled and in-the-wild datasets demonstrate that the proposed method performs favorably against state-of-the-art methods.
Citations: 0
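The latent-exchange idea can be pictured with a bare-bones autoencoder that encodes identity and expression separately and decodes swapped code pairs. This is only a sketch of the disentangling mechanism: the paper's full model is a GAN with adversarial training and a recognition branch, both omitted here, and every layer below is an assumption.

```python
import torch
import torch.nn as nn

class SwapAutoencoder(nn.Module):
    """Bare-bones disentangling sketch: identity and expression are
    encoded separately; decoding swapped code pairs re-renders each
    face with the other's expression."""
    def __init__(self, dim=128, pixels=64 * 64):
        super().__init__()
        self.enc_id = nn.Linear(pixels, dim)
        self.enc_expr = nn.Linear(pixels, dim)
        self.dec = nn.Linear(2 * dim, pixels)

    def swap(self, face_a, face_b):        # flattened grayscale crops
        id_a, id_b = self.enc_id(face_a), self.enc_id(face_b)
        ex_a, ex_b = self.enc_expr(face_a), self.enc_expr(face_b)
        # a's identity with b's expression, and vice versa
        a_with_b = self.dec(torch.cat([id_a, ex_b], dim=1))
        b_with_a = self.dec(torch.cat([id_b, ex_a], dim=1))
        return a_with_b, b_with_a
```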

Adaptive CU Split Decision with Pooling-variable CNN for VVC Intra Encoding
2019 IEEE Visual Communications and Image Processing (VCIP) | Pub Date: 2019-12-01 | DOI: 10.1109/VCIP47243.2019.8965679
Genwei Tang, Ming-e Jing, Xiaoyang Zeng, Yibo Fan
Abstract: Versatile video coding (VVC), proposed by the Joint Video Experts Team (JVET), adopts the quadtree with nested multi-type tree (QTMT) partition scheme, which builds on the quadtree structure of high efficiency video coding (HEVC). VVC achieves better coding quality than HEVC, but its algorithmic complexity has also increased greatly. In this work, we present an adaptive CU split decision for intra frames based on a pooling-variable convolutional neural network (CNN) that targets the various coding unit (CU) shapes. The shape-adaptive CNN is realized through a variable pooling layer size, which makes the most of the pooling layer while retaining the original information. Based on the proposed CNN, whether a CU is split is decided by a single trained network, with the same architecture and parameters for CUs of multiple sizes. Moreover, the proposed shape-based training scheme successfully handles training samples of various sizes. The CU-based network avoids full rate-distortion optimization for the CU split, and CU-level rate control can also be enabled. Experimental results show that the proposed method saves 33% of coding time with only a 0.99% Bjontegaard delta bitrate (BD-rate) increase.
Citations: 51
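One way to realize a pooling-variable network, assumed here, is an adaptive pooling layer that maps a CU of any width and height to a fixed grid, so a single set of weights serves all CU shapes. The layer widths and the 8x8 grid are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ShapeAdaptiveSplitNet(nn.Module):
    """One network for every CU shape: adaptive average pooling maps a
    CU of arbitrary size to a fixed grid, so the same weights produce a
    split / no-split decision for all CU sizes."""
    def __init__(self, grid=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(grid)   # variable -> fixed size
        self.classifier = nn.Linear(32 * grid * grid, 2)

    def forward(self, cu):                # cu: (B, 1, H, W), any H and W
        x = self.pool(self.features(cu))
        return self.classifier(x.flatten(1))   # split / no-split logits
```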

Spherical Video Coding With Motion Vector Modulation to Account For Camera Motion
2019 IEEE Visual Communications and Image Processing (VCIP) | Pub Date: 2019-12-01 | DOI: 10.1109/VCIP47243.2019.8966083
B. Vishwanath, K. Rose
Abstract: Emerging immersive multimedia applications depend critically on efficient compression of spherical (360-degree) videos. Current approaches project spherical video onto planes for coding with standard codecs, without accounting for the properties of spherical video, a severe sub-optimality that motivates this work. A common type of spherical video is dominated by camera translation. We recently proposed a powerful motion compensation technique for such videos that builds on the observation that, under camera translation, stationary points are perceived as moving along geodesics that meet at the point where the camera translation vector intersects the sphere. However, that approach follows standard coding procedures and translates all pixels in a block by the same amount along their respective geodesics, which is sub-optimal. This paper analyzes the appropriate rate of translation along geodesics and its dependence on a pixel's elevation on the sphere with respect to the camera velocity pole. The analysis leads to a new approach that modulates the effective motion vectors within a block so that they perfectly capture the perceived individual motion of each pixel. Consistent gains in the experiments provide evidence of the efficacy of the proposed approach.
Citations: 1
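The per-pixel modulation can be pictured with a small geometric helper: move a point along the great circle through it and the camera-velocity pole, with the step scaled by its elevation from the pole. The sin(elevation) scaling below is an assumed stand-in for the paper's derived rate, which the abstract does not state.

```python
import numpy as np

def step_along_geodesic(p, pole, step):
    """Move unit vector p along the great circle through p and the
    camera-velocity pole. Scaling the step by sin(elevation) means
    pixels at the poles stay put and pixels 90 degrees away move the
    fastest, which matches the intuition of per-pixel modulation."""
    p = p / np.linalg.norm(p)
    pole = pole / np.linalg.norm(pole)
    elevation = np.arccos(np.clip(np.dot(p, pole), -1.0, 1.0))
    angle = step * np.sin(elevation)            # per-pixel modulation
    t = pole - np.dot(p, pole) * p              # tangent toward the pole
    t /= np.linalg.norm(t) + 1e-12              # guard: p at the pole
    return np.cos(angle) * p + np.sin(angle) * t
```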

Identifying and Pruning Redundant Structures for Deep Neural Networks
2019 IEEE Visual Communications and Image Processing (VCIP) | Pub Date: 2019-12-01 | DOI: 10.1109/VCIP47243.2019.8966025
Wenyao Gan, Li Song, Li Chen, Rong Xie, Xiao Gu
Abstract: Deep convolutional neural networks have achieved considerable success in the field of computer vision. However, it is difficult to deploy state-of-the-art models on resource-constrained platforms because of their high storage, memory bandwidth, and computational costs. In this paper, we propose a structured pruning method that employs a three-step process to reduce the resource consumption of neural networks. First, we train an initial network on the training set and evaluate it on the validation set. Next, we introduce an iterative pruning and fine-tuning algorithm to identify and prune redundant structures, which yields a pruned network with a compact architecture. Finally, we train the pruned network from scratch on both the training and validation sets to obtain the final accuracy on the test set. In the experiments, our pruning method significantly reduces model size (by 87.2% on CIFAR-10), saves inference time (53.3% on CIFAR-10), and achieves better performance compared with recent state-of-the-art methods.
Citations: 0
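The abstract does not name the redundancy criterion; a common stand-in, shown here, ranks the output channels of a convolution by filter L1 norm and rebuilds a thinner layer from the survivors. Note that a real pipeline must also shrink the next layer's input channels to match.

```python
import torch
import torch.nn as nn

def prune_conv_by_l1(conv, keep_ratio=0.5):
    """Illustrative structured pruning step: score each output channel
    of a Conv2d by the L1 norm of its filter, keep the top fraction,
    and return a smaller layer with the surviving weights copied in."""
    w = conv.weight.data                        # (out_ch, in_ch, kH, kW)
    scores = w.abs().sum(dim=(1, 2, 3))         # L1 norm per output channel
    n_keep = max(1, int(keep_ratio * w.size(0)))
    keep = torch.topk(scores, n_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       conv.stride, conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = w[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned
```

After such a step, the iterative scheme the abstract describes would fine-tune, re-evaluate on the validation set, and repeat until the accuracy/size trade-off is met.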