{"title":"VCIP 2019 Tutorials","authors":"","doi":"10.1109/VCIP47243.2019.8965680","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965680","url":null,"abstract":"Provides an abstract for each of the tutorial presentations and may include a brief professional biography of each presenter. The complete presentations were not made available for publication as part of the conference proceedings.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126300255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Semantic Features via Attention for Real-Time Visual Tracking","authors":"M. Geng, Haiying Wang, Yingsen Zeng","doi":"10.1109/VCIP47243.2019.8965870","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965870","url":null,"abstract":"The key to balance the tracking accuracy and speed for object tracking algorithms is to learn powerful features via offline training in a lightweight tracking framework. With the development of attention mechanisms, it’s facile to apply attention to enhance the features without modifying the basic structure of the network. In this paper, a novel combination of different attention modules is implemented into a siamese-based tracker and boosts the tracking performance with little computational burden. In particular, by applying non-local self-attention and dual pooling channel attention, the extracted features tend to be more discriminative and adaptive due to the offline learning with tracking targets of different classes. Meanwhile, an Index-Difference-weight boosts the performance and reduces overfitting when full occlusion occurs. Our experimental results on OTB2013 and OTB2015 show that the tracker using the proposal to implement the attention modules can achieve state-of-the-art performance with a speed of 49 frames per second.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115824935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Real-Time Face-Recognition","authors":"Samadhi Wickrama Arachchilage, E. Izquierdo","doi":"10.1109/VCIP47243.2019.8965805","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965805","url":null,"abstract":"The advent and wide use of deep-learning technology has enabled tremendous advancements in the accuracy of face recognition under favourable conditions. Nonetheless, the reported near-perfect performance on classic benchmarks like lfw, does not include complications in unconstrained application. The research reported in this paper addresses some of the critical challenges of face recognition under adverse conditions. In this context, we introduce an end-to-end framework for real-time video-based face recognition. This system detects, tracks and recognizes individuals from live video feed. The proposed system addresses three key challenges of video-based face recognition systems: end-to-end computational complexity, in the wild recognition and multi-person recognition. We exploit sophisticated deep neural networks for face detection and facial feature extraction, while minimizing the computational overhead from the rest of the modules in the recognition pipeline. A comprehensive evaluation shows that the proposed system can effectively recognize faces under unconstrained conditions, at elevated frames per second rates.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132061467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Regularization with Elastic Net and Linear Discriminant Analysis for Zero-Shot Image Recognition","authors":"Zhen Qin, Yan Li","doi":"10.1109/VCIP47243.2019.8966084","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966084","url":null,"abstract":"Zero-shot learning (ZSL) is the process of recognizing unseen samples from their related classes. Generally, ZSL is realized with the help of some pre-defined semantic information via projecting high dimensional visual features of data samples and class-related semantic vectors into a common embedding space. Although classification can be simply decided through the nearest-neighbor strategy, it usually suffers from problems of domain shift and hubness. In order to address these challenges, majority of researches have introduced regularization with some existing norms, such as lasso or ridge, to constrain the learned embedding. However, the sparse estimation of lasso may cause underfitting of training data, while ridge may introduce bias in the embedding space. In order to resolve these problems, this paper proposes a novel hybrid regularization approach by leveraging elastic net and linear discriminant analysis, and formulates a unified objective function that can be solved efficiently via a synchronous optimization strategy. The proposed method is evaluated on several benchmark image datasets for the task of generalized ZSL. The obtained results demonstrate the superiority of the proposed method over simple regularized methods as well as several previous models.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131891087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Action Recognition with the Graph-Neural-Network-based Interaction Reasoning","authors":"Wu Luo, Chongyang Zhang, Xiaoyun Zhang, Haiyan Wu","doi":"10.1109/VCIP47243.2019.8965768","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965768","url":null,"abstract":"Recent human action recognition methods mainly model a two-stream or 3D convolution deep learning network, with which humans spatial-temporal features can be exploited and utilized effectively. However, due to the ignoring of interaction exploiting, most of these methods cannot get good enough performance. In this paper, we propose a novel action recognition framework with Graph Convolutional Network (GCN) based Interaction Reasoning: Objects and discriminative scene patches are detected using an object detector and class active mapping (CAM), respectively; and then a GCN is introduced to model the interaction among the detected objects and scene patches. Evaluation of two widely used video action benchmarks shows that the proposed work can achieve comparable performance: the accuracy up to 43.6% at EPIC Kitchen, and 47.0% at VLOG benchmark without using optical flow, respectively.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133782282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality Assessment for Omnidirectional Video with Consideration of Temporal Distortion Variations","authors":"Pengwei Zhang, Pan Gao","doi":"10.1109/VCIP47243.2019.8966002","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966002","url":null,"abstract":"Omnidirectional video, also known as 360-degree video, offers an immersive visual experience by providing viewers with an ability to look in all directions within a scene. The quality assessment for omnidirectional video is still a quite difficult task compared to 2D video. As the temporal changes of spatial distortions can considerably influence human visual perception, this paper proposes a full reference objective video quality assessment metric by considering both the spatial characteristics of omnidirectional video and the temporal variation of distortions across frames. Firstly, we construct a spatio–temporal quality assessment unit to evaluate the average distortion in temporal dimension at eye fixation level. The smoothed distortion value is then consolidated by the characteristics of temporal variations. Afterwards, a global quality score of the whole video sequence is produced by pooling. Finally, our experimental results show that our proposed VQA method improves the prediction performance of existing VQA methods for omnidirectional video.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115366648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Small-Scale Pedestrian Detection Using Informed Context","authors":"Zexia Liu, Chongyang Zhang, Yan Luo, Kai Chen, Qiping Zhou, Yunyu Lai","doi":"10.1109/VCIP47243.2019.8965786","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965786","url":null,"abstract":"Finding small objects is fundamentally challenging because there is little signal on the object to exploit. For the small-scale pedestrian detection, one must use image evidence beyond the pedestrian extent, which is often formulated as context. Unlike existing object detection methods that use adjacent regions or whole image as the context simply, we focus on more informed contexts exploiting and utilizing to improve small-scale pedestrian detection: firstly, one relationship network is developed to utilize the correlation among pedestrian instances in one image; secondly, two spatial regions, overhead area and feet bottom area, are taken as spatial context to exploit the relevance between pedestrian and scenes; at last, GRU [7] (Gated Recurrent Units) modules are introduced to take encoded contexts as input to guide the feature selection and fusion of each proposal. Instead of getting all of the outputs at once, we also iterate twice to refine the detection incrementally. Comprehensive experiments on Caltech Pedestrian [8] and SJTU-SPID [9] datasets, indicate that, with more informed context, the detection performance can be improved significantly, especially for the small-scale pedestrians.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"354 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115897091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weather Data Integrated Mask R-CNN for Automatic Road Surface Condition Monitoring","authors":"Junyong You","doi":"10.1109/VCIP47243.2019.8966014","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966014","url":null,"abstract":"Monitoring road surface conditions plays a crucial role in driving safety and road maintenance, especially in winter seasons. Traditional methodologies often employ manual inspection and expensive instruments, e.g., NIR cameras. However, image analysis based on normal cameras can provide an economical and efficient solution for road surface monitoring. This paper presents an automatic classification model of road surface conditions using a deep learning approach based on road images and weather measurement. A modified mask R-CNN model has been developed by integrating weather data based on transfer learning. Experimental results with respect to manual judgment of road surface conditions have demonstrated very high accuracy of the developed model.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114517332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning a Reliable Decision Making Policy for Robust Tracking","authors":"Xiaofeng Huang, Kang-hao Wang, Haibing Yin, Shengsheng Zheng, Xiang Meng, Shengping Zhang","doi":"10.1109/VCIP47243.2019.8965745","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965745","url":null,"abstract":"Recent years deep learning based visual object trackers have achieved state-of-the-art performance on multiple benchmarks. However, most of these trackers lack an effective mechanism to avoid the wrong template update or re-detect the object when unreliable tracking result appears. In this paper, a novel tracking framework consisting of a tracking network for locating the target and a policy network for decision making is proposed. Firstly, during the off-line training phase, a variant of policy gradient algorithm is adopted, which makes the model converge better and faster. Secondly, current response map and history response map are both fed to the policy network to check the reliability of the tracking result, which effectively distinguishes the response diversity. Finally, an efficient redetection module is proposed to filter a large number of searching areas, which greatly improves the speed. Our proposed algorithm is measured on OTB dataset. Assessment results show that our tracking algorithm improves performance by 5%-6% at the expense of only a small amount of speed.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"551 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117050771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RSNet: A Compact Relative Squeezing Net for Image Recognition","authors":"Qi Zhao, Nauman Raoof, Shuchang Lyu, Boxue Zhang, W. Feng","doi":"10.1109/VCIP47243.2019.8966024","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966024","url":null,"abstract":"Convolutional neural networks(CNN) are showing powerful performance on image recognition tasks. However, when CNN is applied to mobile devices, with limited computing and memory resource, it requires more compact design to maintain a relatively high performance. In this paper, we propose Relative Squeezing Net(RSNet) that provides technical insight into CNN structure for designing a compact model. In an endeavor to improve CondenseNet, we introduce Relative-Squeezing bottleneck where output is weighted percentage of input channels. The design of our bottleneck can transmit diverse and most useful features at all stages. We also employ multiple compression layers to constrain the output channels of feature maps which can eliminate superfluous feature maps and transmit powerful representations to next layers. We evaluate our model on two benchmark datasets; CIFAR and ImageNet. Experimental results show that RSNet achieves state-of-the-art results with less parameters and FLOPs and is more efficient than compact architectures such as CondenseNet, MobileNet and ShuffleNet.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122119880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}