{"title":"Segmentation and matching: Towards a robust object detection system","authors":"Jing Huang, Suya You","doi":"10.1109/WACV.2014.6836082","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836082","url":null,"abstract":"This paper focuses on detecting parts in laser-scanned data of a cluttered industrial scene. To achieve the goal, we propose a robust object detection system based on segmentation and matching, as well as an adaptive segmentation algorithm and an efficient pose extraction algorithm based on correspondence filtering. We also propose an overlapping-based criterion that exploits more information of the original point cloud than the number-of-matching criterion that only considers key-points. Experiments show how each component works and the results demonstrate the performance of our system compared to the state of the art.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"30 1","pages":"325-332"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80935878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introspective semantic segmentation","authors":"Gautam Singh, J. Kosecka","doi":"10.1109/WACV.2014.6836032","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836032","url":null,"abstract":"Traditional approaches for semantic segmentation work in a supervised setting assuming a fixed number of semantic categories and require sufficiently large training sets. The performance of various approaches is often reported in terms of average per pixel class accuracy and global accuracy of the final labeling. When applying the learned models in the practical settings on large amounts of unlabeled data, possibly containing previously unseen categories, it is important to properly quantify their performance by measuring a classifier's introspective capability. We quantify the confidence of the region classifiers in the context of a non-parametric k-nearest neighbor (k-NN) framework for semantic segmentation by using the so called strangeness measure. The proposed measure is evaluated by introducing confidence based image ranking and showing its feasibility on a dataset containing a large number of previously unseen categories.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"12 1","pages":"714-720"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82218125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPU-accelerated and efficient multi-view triangulation for scene reconstruction","authors":"J. Mak, Mauricio Hess-Flores, S. Recker, John Douglas Owens, K. Joy","doi":"10.1109/WACV.2014.6836117","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836117","url":null,"abstract":"This paper presents a framework for GPU-accelerated N-view triangulation in multi-view reconstruction that improves processing time and final reprojection error with respect to methods in the literature. The framework uses an algorithm based on optimizing an angular error-based L1 cost function, and it is shown how adaptive gradient descent can be applied for convergence. The triangulation algorithm is mapped onto the GPU and two approaches for parallelization are compared: one thread per track and one thread block per track. The better-performing approach depends on the number of tracks and the lengths of the tracks in the dataset. Furthermore, the algorithm uses statistical sampling based on confidence levels to successfully reduce the quantity of feature track positions needed to triangulate an entire track. Sampling aids in load balancing for the GPU's SIMD architecture and in exploiting the GPU's memory hierarchy. When compared to a serial implementation, a typical performance increase of 3-4× can be achieved on a 4-core CPU. On a GPU, large track numbers are favorable and an increase of up to 40× can be achieved. Results on real and synthetic data show that reprojection errors are similar to those of the best-performing current triangulation methods while requiring only a fraction of the computation time, allowing for efficient and accurate triangulation of large scenes.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"110 1","pages":"61-68"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88247327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding and analyzing a large collection of archived swimming videos","authors":"Long Sha, P. Lucey, S. Sridharan, S. Morgan, D. Pease","doi":"10.1109/WACV.2014.6836037","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836037","url":null,"abstract":"In elite sports, nearly all performances are captured on video. Despite the massive amount of video that has been captured in this domain over the last 10-15 years, most of it remains in an “unstructured” or “raw” form, meaning it can only be viewed or manually annotated/tagged with higher-level event labels, which is time-consuming and subjective. As such, depending on the detail or depth of annotation, the value of the collected repositories of archived data is minimal, as it does not lend itself to large-scale analysis and retrieval. One such example is swimming, where each race of a swimmer is captured on a camcorder and, in addition to the split times (i.e., the time it takes for each lap), stroke rates and stroke lengths are manually annotated. In this paper, we propose a vision-based system which effectively “digitizes” a large collection of archived swimming races by estimating the location of the swimmer in each frame, as well as detecting the stroke rate. As the videos are captured from moving hand-held cameras located at different positions and angles, we show that our hierarchical approach to tracking the swimmer and their different parts is robust to these issues and allows us to accurately estimate the swimmer location and stroke rates.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"29 1","pages":"674-681"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82766189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real time action recognition using histograms of depth gradients and random decision forests","authors":"H. Rahmani, A. Mahmood, D. Huynh, A. Mian","doi":"10.1109/WACV.2014.6836044","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836044","url":null,"abstract":"We propose an algorithm which combines the discriminative information from depth images as well as from 3D joint positions to achieve high action recognition accuracy. To avoid the suppression of subtle discriminative information and also to handle local occlusions, we compute a vector of many independent local features. Each feature encodes spatiotemporal variations of depth and depth gradients at a specific space-time location in the action volume. Moreover, we encode the dominant skeleton movements by computing a local 3D joint position difference histogram. For each joint, we compute a 3D space-time motion volume which we use as an importance indicator and incorporate in the feature vector for improved action discrimination. To retain only the discriminant features, we train a random decision forest (RDF). The proposed algorithm is evaluated on three standard datasets and compared with nine state-of-the-art algorithms. Experimental results show that, on average, the proposed algorithm outperforms all other algorithms in accuracy and has a processing speed of over 112 frames/second.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"1 1","pages":"626-633"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88786006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient dense subspace clustering","authors":"Pan Ji, M. Salzmann, Hongdong Li","doi":"10.1109/WACV.2014.6836065","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836065","url":null,"abstract":"In this paper, we tackle the problem of clustering data points drawn from a union of linear (or affine) subspaces. To this end, we introduce an efficient subspace clustering algorithm that estimates dense connections between the points lying in the same subspace. In particular, instead of following the standard compressive sensing approach, we formulate subspace clustering as a Frobenius norm minimization problem, which inherently yields denser connections between the data points. While in the noise-free case we rely on the self-expressiveness of the observations, in the presence of noise we simultaneously learn a clean dictionary to represent the data. Our formulation lets us address the subspace clustering problem efficiently. More specifically, the solution can be obtained in closed-form for outlier-free observations, and by performing a series of linear operations in the presence of outliers. Interestingly, we show that our Frobenius norm formulation shares the same solution as the popular nuclear norm minimization approach when the data is free of any noise, or, in the case of corrupted data, when a clean dictionary is learned. Our experimental evaluation on motion segmentation and face clustering demonstrates the benefits of our algorithm in terms of clustering accuracy and efficiency.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"49 1","pages":"461-468"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87395999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized feature learning and indexing for object localization and recognition","authors":"Ning Zhou, A. Angelova, Jianping Fan","doi":"10.1109/WACV.2014.6836100","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836100","url":null,"abstract":"This paper addresses a general feature indexing and retrieval scenario in which a set of features detected in the image can retrieve a relevant class of objects, or classes of objects. The main idea behind those features for general object retrieval is that they are capable of identifying and localizing some small regions or parts of the potential object. We propose a set of criteria which take advantage of the learned features to find regions in the image which likely belong to an object. We further use the features' localization capability to localize the full object of interest and its extents. The proposed approach improves the recognition performance and is very efficient. Moreover, it has the potential to be used in automatic image understanding or annotation since it can uncover regions where the objects can be found in an image.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"51 1","pages":"198-204"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86694802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Composite Discriminant Factor analysis","authors":"Vlad I. Morariu, Ejaz Ahmed, Venkataraman Santhanam, David Harwood, L. Davis","doi":"10.1109/WACV.2014.6836052","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836052","url":null,"abstract":"We propose a linear dimensionality reduction method, Composite Discriminant Factor (CDF) analysis, which searches for a discriminative but compact feature subspace that can be used as input to classifiers that suffer from problems such as multi-collinearity or the curse of dimensionality. The subspace selected by CDF maximizes the performance of the entire classification pipeline, and is chosen from a set of candidate subspaces that are each discriminative. Our method is based on Partial Least Squares (PLS) analysis, and can be viewed as a generalization of the PLS1 algorithm, designed to increase discrimination in classification tasks. We demonstrate our approach on the UCF50 action recognition dataset, two object detection datasets (INRIA pedestrians and vehicles from aerial imagery), and machine learning datasets from the UCI Machine Learning repository. Experimental results show that the proposed approach improves significantly in terms of accuracy over linear SVM, and also over PLS in terms of compactness and efficiency, while maintaining or improving accuracy.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"1 1","pages":"564-571"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90376019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving multiview face detection with multi-task deep convolutional neural networks","authors":"Cha Zhang, Zhengyou Zhang","doi":"10.1109/WACV.2014.6835990","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835990","url":null,"abstract":"Multiview face detection is a challenging problem due to dramatic appearance changes under various pose, illumination and expression conditions. In this paper, we present a multi-task deep learning scheme to enhance the detection performance. More specifically, we build a deep convolutional neural network that can simultaneously learn the face/nonface decision, the face pose estimation problem, and the facial landmark localization problem. We show that such a multi-task learning scheme can further improve the classifier's accuracy. On the challenging FDDB data set, our detector achieves over 3% improvement in detection rate at the same false positive rate compared with other state-of-the-art methods.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"15 1","pages":"1036-1041"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73431647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relative facial action unit detection","authors":"M. Khademi, Louis-Philippe Morency","doi":"10.1109/WACV.2014.6835983","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835983","url":null,"abstract":"This paper presents a subject-independent facial action unit (AU) detection method by introducing the concept of relative AU detection, for scenarios where the neutral face is not provided. We propose a new classification objective function which analyzes the temporal neighborhood of the current frame to decide if the expression recently increased, decreased or showed no change. This approach is a significant change from the conventional absolute method which decides about AU classification using the current frame, without an explicit comparison with its neighboring frames. Our proposed method improves robustness to individual differences such as face scale and shape, age-related wrinkles, and transitions among expressions (e.g., lower intensity of expressions). Our experiments on three publicly available datasets (Extended Cohn-Kanade (CK+), Bosphorus, and DISFA databases) show significant improvement of our approach over conventional absolute techniques.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"18 1","pages":"1090-1095"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84428784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}