{"title":"Know You at One Glance: A Compact Vector Representation for Low-Shot Learning","authors":"Yu Cheng, Jian Zhao, Zhecan Wang, Yan Xu, J. Karlekar, Shengmei Shen, Jiashi Feng","doi":"10.1109/ICCVW.2017.227","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.227","url":null,"abstract":"Low-shot face recognition is a very challenging yet important problem in computer vision. The feature representation of the gallery face sample is one key component in this problem. To this end, we propose an Enforced Softmax optimization approach built upon Convolutional Neural Networks (CNNs) to produce an effective and compact vector representation. The learned feature representation is very helpful to overcome the underlying multi-modality variations and remain the primary key features as close to the mean face of the identity as possible in the high-dimensional feature space, thus making the gallery basis more robust under various conditions, and improving the overall performance for low-shot learning. In particular, we sequentially leverage optimal dropout, selective attenuation, ℓ2 normalization, and model-level optimization to enhance the standard Softmax objective function for to produce a more compact vectorized representation for low-shot learning. Comprehensive evaluations on the MNIST, Labeled Faces in the Wild (LFW), and the challenging MS-Celeb-1M Low-Shot Learning Face Recognition benchmark datasets clearly demonstrate the superiority of our proposed method over state-of-the-arts. By further introducing a heuristic voting strategy for robust multi-view combination, and our proposed method has won the Top-1 place in the MS-Celeb-1M Low-Shot Learning Challenge.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"12 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123795558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Anthropomorphic 3D Point Clouds from a Single Depth Map Camera Viewpoint","authors":"Nolan Lunscher, J. Zelek","doi":"10.1109/ICCVW.2017.87","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.87","url":null,"abstract":"In footwear, fit is highly dependent on foot shape, which is not fully captured by shoe size. Scanners can be used to acquire better sizing information and allow for more personalized footwear matching, however when scanning an object, many images are usually needed for reconstruction. Semantics such as knowing the kind of object in view can be leveraged to determine the full 3D shape given only one input view. Deep learning methods have been shown to be able to reconstruct 3D shape from limited inputs in highly symmetrical objects such as furniture and vehicles. We apply a deep learning approach to the domain of foot scanning, and present a method to reconstruct a 3D point cloud from a single input depth map. Anthropomorphic body parts can be challenging due to their irregular shapes, difficulty for parameterizing and limited symmetries. We train a view synthesis based network and show that our method can produce foot scans with accuracies of 1.55 mm from a single input depth map.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131555611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SymmSLIC: Symmetry Aware Superpixel Segmentation","authors":"R. Nagar, S. Raman","doi":"10.1109/ICCVW.2017.208","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.208","url":null,"abstract":"Over-segmentation of an image into superpixels has become an useful tool for solving various problems in computer vision. Reflection symmetry is quite prevalent in both natural and man-made objects. Existing algorithms for estimating superpixels do not preserve the reflection symmetry of an object which leads to different sizes and shapes of superpixels across the symmetry axis. In this work, we propose an algorithm to over-segment an image through the propagation of reflection symmetry evident at the pixel level to superpixel boundaries. In order to achieve this goal, we exploit the detection of a set of pairs of pixels which are mirror reflections of each other. We partition the image into superpixels while preserving this reflection symmetry information through an iterative algorithm. We compare the proposed method with state-of-the-art superpixel generation methods and show the effectiveness of the method in preserving the size and shape of superpixel boundaries across the reflection symmetry axes. We also present an application called unsupervised symmetric object segmentation to illustrate the effectiveness of the proposed approach.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126249175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view 6D Object Pose Estimation and Camera Motion Planning Using RGBD Images","authors":"Juil Sock, S. Kasaei, L. Lopes, Tae-Kyun Kim","doi":"10.1109/ICCVW.2017.260","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.260","url":null,"abstract":"Recovering object pose in a crowd is a challenging task due to severe occlusions and clutters. In active scenario, whenever an observer fails to recover the poses of objects from the current view point, the observer is able to determine the next view position and captures a new scene from another view point to improve the knowledge of the environment, which may reduce the 6D pose estimation uncertainty. We propose a complete active multi-view framework to recognize 6DOF pose of multiple object instances in a crowded scene. We include several components in active vision setting to increase the accuracy: Hypothesis accumulation and verification combines single-shot based hypotheses estimated from previous views and extract the most likely set of hypotheses; an entropy-based Next-Best-View prediction generates next camera position to capture new data to increase the performance; camera motion planning plans the trajectory of the camera based on the view entropy and the cost of movement. Different approaches for each component are implemented and evaluated to show the increase in performance.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128276188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Doppelganger Mining for Face Representation Learning","authors":"Evgeny Smirnov, A. Melnikov, Sergey Novoselov, Eugene Luckyanets, G. Lavrentyeva","doi":"10.1109/ICCVW.2017.226","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.226","url":null,"abstract":"In this paper we present Doppelganger mining - a method to learn better face representations. The main idea of this method is to maintain a list with the most similar identities for each identity in the training set. This list is used to generate better mini-batches by sampling pairs of similar-looking identities (\"doppelgangers\") together. It is especially useful for methods, based on exemplar-based supervision. Usually hard example mining comes with a price of necessity to use large mini-batches or substantial extra computation and memory cost, particularly for datasets with large numbers of identities. Our method needs only a negligible extra computation and memory. In our experiments on a benchmark dataset with 21,000 persons we show that Doppelganger mining, being inserted in the face representation learning process with joint prototype-based and exemplar-based supervision, significantly improves the discriminative power of learned face representations.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114954694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Class-Specific Reconstruction Transfer Learning via Sparse Low-Rank Constraint","authors":"Shanshan Wang, Lei Zhang, W. Zuo","doi":"10.1109/ICCVW.2017.116","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.116","url":null,"abstract":"Subspace learning and reconstruction have been widely explored in recent transfer learning work and generally a specially designed projection and reconstruction transfer matrix are wanted. However, existing subspace reconstruction based algorithms neglect the class prior such that the learned transfer function is biased, especially when data scarcity of some class is encountered. Different from those previous methods, in this paper, we propose a novel reconstruction-based transfer learning method called Class-specific Reconstruction Transfer Learning (CRTL), which optimizes a well-designed transfer loss function without class bias. Using a class-specific reconstruction matrix to align the source domain with the target domain which provides help for classification with class prior modeling. Furthermore, to keep the intrinsic relationship between data and labels after feature augmentation, a projected Hilbert-Schmidt Independence Criterion (pHSIC), that measures the dependency between two sets, is first proposed by mapping the data from original space to RKHS in transfer learning. In addition, combining low-rank and sparse constraints on the class-specific reconstruction coefficient matrix, the global and local data structures can be effectively preserved. Extensive experiments demonstrate that the proposed method outperforms conventional representation-based domain adaptation methods.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130702268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variational Robust Subspace Clustering with Mean Update Algorithm","authors":"Sergej Dogadov, A. Masegosa, Shinichi Nakajima","doi":"10.1109/ICCVW.2017.212","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.212","url":null,"abstract":"In this paper, we propose an efficient variational Bayesian (VB) solver for a robust variant of low-rank subspace clustering (LRSC). VB learning offers automatic model selection without parameter tuning. However, it is typically performed by local search with update rules derived from conditional conjugacy, and therefore prone to local minima problem. Instead, we use an approximate global solver for LRSC with an element-wise sparse term to make it robust against spiky noise. In experiment, our method (mean update solver for robust LRSC), outperforms the original LRSC, as well as the robust LRSC with the standard VB solver.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"162 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133846090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual Structured Convolutional Neural Network with Feature Augmentation for Quantitative Characterization of Tissue Histology","authors":"Mira Valkonen, K. Kartasalo, Kaisa Liimatainen, M. Nykter, Leena Latonen, P. Ruusuvuori","doi":"10.1109/ICCVW.2017.10","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.10","url":null,"abstract":"We present a dual convolutional neural network (dCNN) architecture for extracting multi-scale features from histological tissue images for the purpose of automated characterization of tissue in digital pathology. The dual structure consists of two identical convolutional neural networks applied to input images with different scales, that are merged together and stacked with two fully connected layers. It has been acknowledged that deep networks can be used to extract higher-order features, and therefore, the network output at final fully connected layer was used as a deep dCNN feature vector. Further, engineered features, shown in previous studies to capture important characteristics of tissue structure and morphology, were integrated to the feature extractor module. The acquired quantitative feature representation can be further utilized to train a discriminative model for classifying tissue types. Machine learning based methods for detection of regions of interest, or tissue type classification will advance the transition to decision support systems and computer aided diagnosis in digital pathology. Here we apply the proposed feature-augmented dCNN method with supervised learning in detecting cancerous tissue from whole slide images. The extracted quantitative representation of tissue histology was used to train a logistic regression model with elastic net regularization. The model was able to accurately discriminate cancerous tissue from normal tissue, resulting in blockwise AUC=0.97, where the total number of analyzed tissue blocks was approximately 8.3 million that constitute the test set of 75 whole slide images.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134264661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated Stem Angle Determination for Temporal Plant Phenotyping Analysis","authors":"S. D. Choudhury, Saptarsi Goswami, Srinidhi Bashyam, T. Awada, A. Samal","doi":"10.1109/ICCVW.2017.237","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.237","url":null,"abstract":"Image-based plant phenotyping analysis refers to the monitoring and quantification of phenotyping traits by analyzing images of the plants captured by different types of cameras at regular intervals in a controlled environment. Extracting meaningful phenotypes for temporal phenotyping analysis by considering individual parts of a plant, e.g., leaves and stem, using computer-vision based techniques remains a critical bottleneck due to constantly increasing complexity in plant architecture with variations in self-occlusions and phyllotaxy. The paper introduces an algorithm to compute the stem angle, a potential measure for plants' susceptibility to lodging, i.e., the bending of stem of the plant. Annual yield losses due to stem lodging in the U.S. range between 5 and 25%. In addition to outright yield losses, grain quality may also decline as a result of stem lodging. The algorithm to compute stem angle involves the identification of leaf-tips and leaf-junctions based on a graph theoretic approach. The efficacy of the proposed method is demonstrated based on experimental analysis on a publicly available dataset called Panicoid Phenomap-1. A time-series clustering analysis is also performed on the values of stem angles for a significant time interval during vegetative stage life cycle of the maize plants. This analysis effectively summarizes the temporal patterns of the stem angles into three main groups, which provides further insight into genotype specific behavior of the plants. A comparison of genotypic purity using time series analysis establishes that the temporal variation of the stem angles is likely to be regulated by genetic variation under similar environmental conditions.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133362332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LPSNet: A Novel Log Path Signature Feature Based Hand Gesture Recognition Framework","authors":"Chenyang Li, Xin Zhang, Lianwen Jin","doi":"10.1109/ICCVW.2017.80","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.80","url":null,"abstract":"Hand gesture recognition is gaining more attentions because it's a natural and intuitive mode of human computer interaction. Hand gesture recognition still faces great challenges for the real-world applications due to the gesture variance and individual difference. In this paper, we propose the LPSNet, an end-to-end deep neural network based hand gesture recognition framework with novel log path signature features. We pioneer a robust feature, path signature (PS) and its compressed version, log path signature (LPS) to extract effective feature of hand gestures. Also, we present a new method based on PS and LPS to effectively combine RGB and depth videos. Further, we propose a statistical method, DropFrame, to enlarge the data set and increase its diversity. By testing on a well-known public dataset, Sheffield Kinect Gesture (SKIG), our method achieves classification rate as 96.7% (only use RGB videos) and 98.7% (combining RGB and Depth videos), which is the best result comparing with state-of-the-art methods.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133428356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}