{"title":"Hierarchical representation of videos with spatio-temporal fibers","authors":"Ratnesh Kumar, G. Charpiat, M. Thonnat","doi":"10.1109/WACV.2014.6836064","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836064","url":null,"abstract":"We propose a new representation of videos, as spatio-temporal fibers. These fibers are clusters of trajectories that are meshed spatially in the image domain. They form a hierarchical partition of the video into regions that are coherent in time and space. They can be seen as dense, spatially-organized, long-term optical flow. Their robustness to noise and ambiguities is ensured by taking into account the reliability of each source of information. As fibers allow users to handle easily moving objects in videos, they prove useful for video editing, as demonstrated in a video inpainting example.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"22 1","pages":"469-476"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86636857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Furniture-geek: Understanding fine-grained furniture attributes from freely associated text and tags","authors":"Vicente Ordonez, V. Jagadeesh, Wei Di, Anurag Bhardwaj, Robinson Piramuthu","doi":"10.1109/WACV.2014.6836083","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836083","url":null,"abstract":"As the amount of user generated content on the internet grows, it becomes ever more important to come up with vision systems that learn directly from weakly annotated and noisy data. We leverage a large scale collection of user generated content comprising of images, tags and title/captions of furniture inventory from an e-commerce website to discover and categorize learnable visual attributes. Furniture categories have long been the quintessential example of why computer vision is hard, and we make one of the first attempts to understand them through a large scale weakly annotated dataset. We focus on a handful of furniture categories that are associated with a large number of fine-grained attributes. We propose a set of localized feature representations built on top of state-of-the-art computer vision representations originally designed for fine-grained object categorization. We report a thorough empirical characterization on the visual identifiability of various fine-grained attributes using these representations and show encouraging results on finding iconic images and on multi-attribute prediction.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"11 1","pages":"317-324"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84157800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-driven exemplar model selection","authors":"Ishan Misra, Abhinav Shrivastava, M. Hebert","doi":"10.1109/WACV.2014.6836080","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836080","url":null,"abstract":"We consider the problem of discovering discriminative exemplars suitable for object detection. Due to the diversity in appearance in real world objects, an object detector must capture variations in scale, viewpoint, illumination etc. The current approaches do this by using mixtures of models, where each mixture is designed to capture one (or a few) axis of variation. Current methods usually rely on heuristics to capture these variations; however, it is unclear which axes of variation exist and are relevant to a particular task. Another issue is the requirement of a large set of training images to capture such variations. Current methods do not scale to large training sets either because of training time complexity [31] or test time complexity [26]. In this work, we explore the idea of compactly capturing task-appropriate variation from the data itself. We propose a two stage data-driven process, which selects and learns a compact set of exemplar models for object detection. These selected models have an inherent ranking, which can be used for anytime/budgeted detection scenarios. Another benefit of our approach (beyond the computational speedup) is that the selected set of exemplar models performs better than the entire set.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"10 1","pages":"339-346"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84218738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rotation estimation from cloud tracking","authors":"Sangwoo Cho, Enrique Dunn, Jan-Michael Frahm","doi":"10.1109/WACV.2014.6836006","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836006","url":null,"abstract":"We address the problem of online relative orientation estimation from streaming video captured by a sky-facing camera on a mobile device. Namely, we rely on the detection and tracking of visual features attained from cloud structures. Our proposed method achieves robust and efficient operation by combining realtime visual odometry modules, learning based feature classification, and Kalman filtering within a robustness-driven data management framework, while achieving framerate processing on a mobile device. The relatively large 3D distance between the camera and the observed cloud features is leveraged to simplify our processing pipeline. First, as an efficiency driven optimization, we adopt a homography based motion model and focus on estimating relative rotations across adjacent keyframes. To this end, we rely on efficient feature extraction, KLT tracking, and RANSAC based model fitting. Second, to ensure the validity of our simplified motion model, we segregate detected cloud features from scene features through SVM classification. Finally, to make tracking more robust, we employ predictive Kalman filtering to enable feature persistence through temporary occlusions and manage feature spatial distribution to foster tracking robustness. Results exemplify the accuracy and robustness of the proposed approach and highlight its potential as a passive orientation sensor.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"96 1","pages":"917-924"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85886402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small Hand-held Object Recognition Test (SHORT)","authors":"Jose Rivera-Rubio, Saad Idrees, I. Alexiou, Lucas Hadjilucas, A. Bharath","doi":"10.1109/WACV.2014.6836057","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836057","url":null,"abstract":"The ubiquity of smartphones with high quality cameras and fast network connections will spawn many new applications. One of these is visual object recognition, an emerging smartphone feature which could play roles in high-street shopping, price comparisons and similar uses. There are also potential roles for such technology in assistive applications, such as for people who have visual impairment. We introduce the Small Hand-held Object Recognition Test (SHORT), a new dataset that aims to benchmark the performance of algorithms for recognising hand-held objects from either snapshots or videos acquired using hand-held or wearable cameras. We show that SHORT provides a set of images and ground truth that help assess the many factors that affect recognition performance. SHORT is designed to be focused on the assistive systems context, though it can provide useful information on more general aspects of recognition performance for hand-held objects. We describe the present state of the dataset, comprised of a small set of high quality training images and a large set of nearly 135,000 smartphone-captured test images of 30 grocery products. In this version, SHORT addresses another context not covered by traditional datasets, in which high quality catalogue images are being compared with variable quality user-captured images; this makes the matching more challenging in SHORT than other datasets. Images of similar quality are often not present in “database” and “query” datasets, a situation being increasingly encountered in commercial applications. Finally, we compare the results of popular object recognition algorithms of different levels of complexity when tested against SHORT and discuss the research challenges arising from the particularities of visual object recognition from objects that are being held by users.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"9 1","pages":"524-531"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88417994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is my new tracker really better than yours?","authors":"Luka Cehovin, M. Kristan, A. Leonardis","doi":"10.1109/WACV.2014.6836055","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836055","url":null,"abstract":"The problem of visual tracking evaluation is sporting an abundance of performance measures, which are used by various authors, and largely suffers from lack of consensus about which measures should be preferred. This is hampering the cross-paper tracker comparison and faster advancement of the field. In this paper we provide an overview of the popular measures and performance visualizations and their critical theoretical and experimental analysis. We show that several measures are equivalent from the point of information they provide for tracker comparison and, crucially, that some are more brittle than the others. Based on our analysis we narrow down the set of potential measures to only two complementary ones that can be intuitively interpreted and visualized, thus pushing towards homogenization of the tracker evaluation methodology.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"15 1","pages":"540-547"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90943230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Offline learning of prototypical negatives for efficient online Exemplar SVM","authors":"Masato Takami, Peter Bell, B. Ommer","doi":"10.1109/WACV.2014.6836075","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836075","url":null,"abstract":"Online searches in big image databases require sufficient results in feasible time. Digitization campaigns have simplified the access to a huge number of images in the field of art history, which can be analyzed by detecting duplicates and similar objects in the dataset. A high recall is essential for the evaluation and therefore the search method has to be robust against minor changes due to smearing or aging effects of the documents. At the same time the computational time has to be short to allow a practical use of the online search. By using an Exemplar SVM based classifier [12] a high recall can be achieved, but the mining of negatives and the multiple rounds of retraining for every search makes the method too time-consuming. An even bigger problem is that by allowing arbitrary query regions, it is not possible to provide a training set, which would be necessary to create a classifier. To solve this, we created a pool of general negatives offline in advance, which can be used by any arbitrary input in the online search step and requires only one short training round without the time-consuming mining. In a second step, this classifier is improved by using positive detections in an additional training round. This results in a classifier for the online search in unlabeled datasets, which provides high recall in short calculation time respectively.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"81 1","pages":"377-384"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77308573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"“Important stuff, everywhere!” Activity recognition with salient proto-objects as context","authors":"L. Rybok, Boris Schauerte, Ziad Al-Halah, R. Stiefelhagen","doi":"10.1109/WACV.2014.6836041","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836041","url":null,"abstract":"Object information is an important cue to discriminate between activities that draw part of their meaning from context. Most of current work either ignores this information or relies on specific object detectors. However, such object detectors require a significant amount of training data and complicate the transfer of the action recognition framework to novel domains with different objects and object-action relationships. Motivated by recent advances in saliency detection, we propose to employ salient proto-objects for unsupervised discovery of object- and object-part candidates and use them as a contextual cue for activity recognition. Our experimental evaluation on three publicly available data sets shows that the integration of proto-objects and simple motion features substantially improves recognition performance, outperforming the state-of-the-art.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"24 1","pages":"646-651"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79262611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object co-labeling in multiple images","authors":"Xi Chen, Arpit Jain, L. Davis","doi":"10.1109/WACV.2014.6836031","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836031","url":null,"abstract":"We introduce a new problem called object co-labeling where the goal is to jointly annotate multiple images of the same scene which do not have temporal consistency. We present an adaptive framework for joint segmentation and recognition to solve this problem. We propose an objective function that considers not only appearance but also appearance and context consistency across images of the scene. A relaxed form of the cost function is minimized using an efficient quadratic programming solver. Our approach improves labeling performance compared to labeling each image individually. We also show the application of our co-labeling framework to other recognition problems such as label propagation in videos and object recognition in similar scenes. Experimental results demonstrates the efficacy of our approach.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"29 1","pages":"721-728"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77957816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi class boosted random ferns for adapting a generic object detector to a specific video","authors":"Pramod Sharma, R. Nevatia","doi":"10.1109/WACV.2014.6836028","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836028","url":null,"abstract":"Detector adaptation is a challenging problem and several methods have been proposed in recent years. We propose multi class boosted random ferns for detector adaptation. First we collect online samples in an unsupervised manner and collected positive online samples are divided into different categories for different poses of the object. Then we train a multi-class boosted random fern adaptive classifier. Our adaptive classifier training focuses on two aspects: discriminability and efficiency. Boosting provides discriminative random ferns. For efficiency, our boosting procedure focuses on sharing the same feature among different classes and multiple strong classifiers are trained in a single boosting framework. Experiments on challenging public datasets demonstrate effectiveness of our approach.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"33 1","pages":"745-752"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84756556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}