{"title":"FICAL: Focal Inter-Class Angular Loss for Image Classification","authors":"Xinran Wei, Dongliang Chang, Jiyang Xie, Yixiao Zheng, Chen Gong, Chuang Zhang, Zhanyu Ma","doi":"10.1109/VCIP47243.2019.8965889","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965889","url":null,"abstract":"Convolutional Neural Networks (CNNs) have been successfully applied in various image analysis tasks and gradually become one of the most powerful machine learning approaches. In order to improve the capability of the model generalization and performance in image classification, a new trend is to learn more discriminative features via CNNs. The main contribution of this paper is to increase the angles between the categories to extract discriminative features and enlarge the inter-class variance. To this end, we propose a loss function named focal inter-class angular loss (FICAL) which introduces the confusion rate-weighted cosine distance as the similarity measurement between categories. This measurement is dynamically evaluated during each iteration to adapt the model. Compared with other loss functions, experimental results demonstrate that the proposed FICAL achieved best performance among the referred loss functions on two image classificaton datasets.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114654450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view Rank Pooling for 3D Object Recognition**","authors":"Chaoda Zheng, Yong Xu, Ruotao Xu, Hongyu Chi, Yuhui Quan","doi":"10.1109/VCIP47243.2019.8965979","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965979","url":null,"abstract":"3D shape recognition via deep learning is drawing more and more attention due to huge industry interests. As 3D deep learning methods emerged, the view-based approaches have gained considerable success in object classification. Most of these methods focus on designing a pooling scheme to aggregate CNN features of multi-view images into a single compact one. However, these view-wise pooling techniques suffer from loss of visual information. To deal with this issue, an adaptive rank pooling layer is introduced in this paper. Unlike max-pooling which only considers the maximum or mean-pooling that treats each element indiscriminately, the proposed pooling layer takes all the elements into account and dynamically adjusts their importances during the training. Experiments conducted on ModelNet40 and ModelNet10 shows both efficiency and accuracy gain when inserting such a layer into a baseline CNN architecture.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115030850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Spatio-temporal Hybrid Network for Action Recognition","authors":"Song Li, Zhicheng Zhao, Fei Su","doi":"10.1109/VCIP47243.2019.8965878","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965878","url":null,"abstract":"Convolutional Neural Networks (CNNs) are powerful in learning spatial information for static images, while they appear to lose their abilities for action recognition in videos because of the neglecting of long-term motion information. Traditional 3D convolution has high computation complexity and the used Global Average Pooling (GAP) on the bottom of network can also lead to unwanted content loss or distortion. To address above problems, we propose a novel action recognition algorithm by effectively fusing 2D and Pseudo-3D CNN to learn spatio-temporal features of video. First, we use Pseudo-3D CNN with proposed Multi-level pooling module to learn spatio-temporal features. Second, the features output by multi-level pooling module are passed through our proposed processing module to make full use of the rich features. Third, a 2D CNN fed with motion vectors is designed to extract motion patterns, which can be regarded as a supplement of Pseudo-3D CNN to make up for the information lost by RGB images. Fourth, a dependency-based fusion method is proposed to fuse the multi-stream features. Finally, the effectiveness of our proposed action recognition algorithm is demonstrated on public UCF101 and HMDB51 datasets.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"463 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122559414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly Supervised Learning for Blind Image Quality Assessment","authors":"Weiquan He, Xinbo Gao, Wen Lu, R. Guan","doi":"10.1109/VCIP47243.2019.8965868","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965868","url":null,"abstract":"The blind image quality assessment (BIQA) metric based on deep neural network (DNN) achieves the best evaluation accuracy at present, and the depth of neural networks plays a crucial role for deep learning-based BIQA metric. However, training a DNN for quality assessment is known to be hard because of the lack of labeled data, and getting quality labels for a large number of images is very time consuming and costly. Therefore, training a deep BIQA metric directly will lead to over-fitting in all likelihood. In order to solve this problem, we introduced a weakly supervised approach for learning a deep BIQA metric. First, we pre-trained a novel encoder-decoder architecture by using the training data with weak quality annotations. The annotation is the error map between the distorted image and its undistorted version, which can roughly describes the distribution of distortion and can be easily acquired for training. Next, we fine-tuned the pre-trained encoder on the quality labeled data set. Moreover, we used the group convolution to reduce the parameters of the proposed metric and further reduce the risk of over-fitting. These training strategies, which reducing the risk of over-fitting, enable us to build a very deep neural network for BIQA to have a better performance. Experimental results showed that the proposed model had the state-of-art performance for various images with different distortion types.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122444355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualization of Dynamic Resource Allocation for HEVC Encoding in FPGA-Accelerated SDN Cloud","authors":"Panu Sjövall, Mikko Teuho, Arto Oinonen, Jarno Vanne, T. Hämäläinen","doi":"10.1109/VCIP47243.2019.8966042","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966042","url":null,"abstract":"This paper describes a demonstration setup to visualize dynamic resource allocation for real-time HEVC encoding services in FPGA-accelerated cloud. The demonstrated application is Kvazaar HEVC intra encoder, whose functionality is partitioned between FPGAs and processors. During the demonstration, several encoding services can be invoked with requests to the resource manager, which is responsible for allocation, deallocation, and load balancing of resources in the network. The manager provides JSON data to the visualizer, which uses D3 JavaScript library to visualize 1) the physical network structure; 2) running services; and 3) performance of the network elements. This interactive demonstration allows users to request new video streams, view the encoded streams, observe the visualization of the network and services, and manually turn on/off resources to test the robustness of the system.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122534858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comer-Line-Prediction based Water-tank Detection and Localization","authors":"Hao Chen, Chongyang Zhang, Yan Luo, Bingkun Zhao, Jiahao Bao","doi":"10.1109/VCIP47243.2019.8965977","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965977","url":null,"abstract":"Water tanks on the roof of buildings require regular labor-costing inspection, and object detection can be used to automate the task. Current detection frameworks have several drawbacks when they are applied: (1) The output horizontal rectangular boxes cannot provide arbitrary quadrilateral detection representations; (2) False positive results may easily appear when key-point based models are used. In this paper, we propose a novel detection framework: Corner-Line-Prediction, which generates tight quadrilateral detection results of the tank blocks. Our model is built on key point detection network to detect corner points precisely. And an original line predictor is integrated to recognize unique tank edges, such that numerous false positive detections can be suppressed. Experimental results show that our Corner-Line-Prediction (CLP) framework outperforms state- of-the-art detection algorithms in average-precision (AP) and produces better localization results, compared with mainstream general detection models.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"44 19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122040550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Part-guided Network for Pedestrian Attribute Recognition","authors":"Ha-eun An, Haonan Fan, Kaiwen Deng, Hai-Miao Hu","doi":"10.1109/VCIP47243.2019.8965957","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965957","url":null,"abstract":"Pedestrian attribute recognition, which can benefit other tasks such as person re-identification and pedestrian retrieval, is very important in video surveillance related tasks. In this paper, we observe that the existing methods tackle this problem from the perspective of multi-label classification without considering the spatial location constraints, which means that the attributes tend to be recognized at certain body parts. Based on that, we propose a novel Part-guided Network (P-Net), which guides the refined convolutional feature maps to capture different location information for the attributes related to different body parts. The part-guided attention module employs the pix-level classification to produce attention maps which can be interpreted as the probability of each pixel belonging to the 6 pre-defined body parts. Experimental results demonstrate that the proposed network gives superior performances compared to the state-of-the-art techniques.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114086754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymmetric Supervised Deep Autoencoder for Depth Image based 3D Model Retrieval","authors":"A. Siddiqua, Guoliang Fan","doi":"10.1109/VCIP47243.2019.8965682","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8965682","url":null,"abstract":"In this paper, we propose a new asymmetric supervised deep autoencoder approach to retrieve 3D shapes based on depth images. The asymmetric supervised autoencoder is trained with real and synthetic depth images together. The novelty of this research lies in the asymmetric structure of a supervised deep autoencoder. The proposed asymmetric deep supervised autoencoder deals with the incompleteness and ambiguity present in the depth images by balancing reconstruction and classification capabilities in a unified way with mixed depth images. We investigate the relationship between the encoder layers and decoder layers, and claim that an asymmetric structure of a supervised deep autoencoder reduces the chance of overfitting by 8% and is capable of extracting more robust features with respect to the variance of input than that of a symmetric structure. The experimental results on the NYUD2 and ModelNet10 datasets demonstrate that the proposed supervised method outperforms the recent approaches for cross modal 3D model retrieval.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"661 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116487486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Convolutional Neural Network for Younger Face Identification","authors":"Liangliang Wang, D. Rajan","doi":"10.1109/VCIP47243.2019.8966026","DOIUrl":"https://doi.org/10.1109/VCIP47243.2019.8966026","url":null,"abstract":"We consider the problem of determining whether a pair of face images can be distinguishable in terms of age and if so, which is the younger of the two. We also determine the degree of distinguishability in which age differences are categorized into large, medium, small and tiny. We propose a comparative convolutional neural network combining two parallel deep architectures. Based on the two deep learnt face features, we introduce a comparative layer to represent their mutual relationships, followed by a concatenatation implementation. Softmax is adopted to complete the classification task. To demonstrate our approach, we construct a very large dataset consisting of over 1.7 million face image pairs with young/old labels.","PeriodicalId":388109,"journal":{"name":"2019 IEEE Visual Communications and Image Processing (VCIP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121539094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}