{"title":"Real-time video decolorization using bilateral filtering","authors":"Yibing Song, Linchao Bao, Qingxiong Yang","doi":"10.1109/WACV.2014.6836106","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836106","url":null,"abstract":"This paper presents a real-time decolorization method. Given the human visual systems preference for luminance information, the luminance should be preserved as much as possible during decolorization. As a result, the proposed decolorization method measures the amount of color contrast/detail lost when converting color to luminance. The detail loss is estimated by computing the difference between two intermediate images: one obtained by applying bilateral filter to the original color image, and the other obtained by applying joint bilateral filter to the original color image with its luminance as the guidance image. The estimated detail loss is then mapped to a grayscale image named residual image by minimizing the difference between the image gradients of the input color image and the objective grayscale image that is the sum of the residual image and the luminance. Apparently, the residual image will contain pixels with all zero values (that is the two intermediate images will be the same) only when no visual detail is missing in the luminance. Unlike most previous methods, the proposed decolorization method preserves both contrast in the color image and the luminance. Quantitative evaluation shows that it is the top performer on the standard test suite. Meanwhile it is very robust and can be directly used to convert videos while maintaining the temporal coherence. Specifically it can convert a high-resolution video (1280 × 720) in real time (about 28 Hz) on a 3.4 GHz i7 CPU.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"55 1","pages":"159-166"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90052446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint hierarchical learning for efficient multi-class object detection","authors":"Hamidreza Odabai Fard, M. Chaouch, Q. Pham, A. Vacavant, T. Chateau","doi":"10.1109/WACV.2014.6836090","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836090","url":null,"abstract":"In addition to multi-class classification, the multi-class object detection task consists further in classifying a dominating background label. In this work, we present a novel approach where relevant classes are ranked higher and background labels are rejected. To this end, we arrange the classes into a tree structure where the classifiers are trained in a joint framework combining ranking and classification constraints. Our convex problem formulation naturally allows to apply a tree traversal algorithm that searches for the best class label and progressively rejects background labels. We evaluate our approach on the PASCAL VOC 2007 dataset and show a considerable speed-up of the detection time with increased detection performance.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"58 1","pages":"261-268"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90557973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining discriminative 3D Poselet for cross-view action recognition","authors":"Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu","doi":"10.1109/WACV.2014.6836043","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836043","url":null,"abstract":"This paper presents a novel approach to cross-view action recognition. Traditional cross-view action recognition methods typically rely on local appearance/motion features. In this paper, we take advantage of the recent developments of depth cameras to build a more discriminative cross-view action representation. In this representation, an action is characterized by the spatio-temporal configuration of 3D Poselets, which are discriminatively discovered with a novel Poselet mining algorithm and can be detected with view-invariant 3D Poselet detectors. The Kinect skeleton is employed to facilitate the 3D Poselet mining and 3D Poselet detectors learning, but the recognition is solely based on 2D video input. Extensive experiments have demonstrated that this new action representation significantly improves the accuracy and robustness for cross-view action recognition.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"69 1","pages":"634-639"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77063414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer learning via attributes for improved on-the-fly classification","authors":"Praveen Kulkarni, Gaurav Sharma, J. Zepeda, Louis Chevallier","doi":"10.1109/WACV.2014.6836097","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836097","url":null,"abstract":"Retrieving images for an arbitrary user query, provided in textual form, is a challenging problem. A recently proposed method addresses this by constructing a visual classifier with images returned by an internet image search engine, based on the user query, as positive images while using a fixed pool of negative images. However, in practice, not all the images obtained from internet image search are always pertinent to the query; some might contain abstract or artistic representation of the content and some might have artifacts. Such images degrade the performance of on-the-fly constructed classifier. We propose a method for improving the performance of on-the-fly classifiers by using transfer learning via attributes. We first map the textual query to a set of known attributes and then use those attributes to prune the set of images downloaded from the internet. This pruning step can be seen as zero-shot learning of the visual classifier for the textual user query, which transfers knowledge from the attribute domain to the query domain. We also use the attributes along with the on-the-fly classifier to score the database images and obtain a hybrid ranking. We show interesting qualitative results and demonstrate by experiments with standard datasets that the proposed method improves upon the baseline on-the-fly classification system.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"168 1","pages":"220-226"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86887252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optical filter selection for automatic visual inspection","authors":"Matthias Richter, J. Beyerer","doi":"10.1109/WACV.2014.6836110","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836110","url":null,"abstract":"The color of a material is one of the most frequently used features in automated visual inspection systems. While this is sufficient for many “easy” tasks, mixed and organic materials usually require more complex features. Spectral signatures, especially in the near infrared range, have been proven useful in many cases. However, hyperspectral imaging devices are still very costly and too slow to use them in practice. As a work-around, off-the-shelve cameras and optical filters are used to extract few characteristic features from the spectra. Often, these filters are selected by a human expert in a time consuming and error prone process; surprisingly few works are concerned with automatic selection of suitable filters. We approach this problem by stating filter selection as feature selection problem. In contrast to existing techniques that are mainly concerned with filter design, our approach explicitly selects the best out of a large set of given filters. Our method becomes most appealing for use in an industrial setting, when this selection represents (physically) available filters. We show the application of our technique by implementing six different selection strategies and applying each to two real-world sorting problems.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"2021 1","pages":"123-128"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87954008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the 3D layout of a cluttered room from multiple images","authors":"Sid Ying-Ze Bao, A. Furlan, Li Fei-Fei, S. Savarese","doi":"10.1109/WACV.2014.6836035","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836035","url":null,"abstract":"We present a novel framework for robustly understanding the geometrical and semantic structure of a cluttered room from a small number of images captured from different viewpoints. The tasks we seek to address include: i) estimating the 3D layout of the room - that is, the 3D configuration of floor, walls and ceiling; ii) identifying and localizing all the foreground objects in the room. We jointly use multiview geometry constraints and image appearance to identify the best room layout configuration. Extensive experimental evaluation demonstrates that our estimation results are more complete and accurate in estimating 3D room structure and recognizing objects than alternative state-of-the-art algorithms. In addition, we show an augmented reality mobile application to highlight the high accuracy of our method, which may be beneficial to many computer vision applications.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"27 1","pages":"690-697"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89065362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust optical flow estimation for continuous blurred scenes using RGB-motion imaging and directional filtering","authors":"Wenbin Li, Yang Chen, JeeHang Lee, Gang Ren, D. Cosker","doi":"10.1109/WACV.2014.6836022","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836022","url":null,"abstract":"Optical flow estimation is a difficult task given real-world video footage with camera and object blur. In this paper, we combine a 3D pose&position tracker with an RGB sensor allowing us to capture video footage together with 3D camera motion. We show that the additional camera motion information can be embedded into a hybrid optical flow framework by interleaving an iterative blind deconvolution and warping based minimization scheme. Such a hybrid framework significantly improves the accuracy of optical flow estimation in scenes with strong blur. Our approach yields improved overall performance against three state-of-the-art baseline methods applied to our proposed ground truth sequences, as well as in several other real-world sequences captured by our novel imaging system.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"108 1","pages":"792-799"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87611216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benchmarking large-scale Fine-Grained Categorization","authors":"A. Angelova, Philip M. Long","doi":"10.1109/WACV.2014.6836056","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836056","url":null,"abstract":"This paper presents a systematic evaluation of recent methods in the fine-grained categorization domain, which have shown significant promise. More specifically, we investigate an automatic segmentation algorithm, a region pooling algorithm which is akin to pose-normalized pooling [31] [28], and a multi-class optimization method. We considered the largest and most popular datasets for fine-grained categorization available in the field: the Caltech-UCSD 200 Birds dataset [27], the Oxford 102 Flowers dataset [19], the Stanford 120 Dogs dataset [16], and the Oxford 37 Cats and Dogs dataset [21]. We view this work from a practitioner's perspective, answering the question: what are the methods that can create the best possible fine-grained recognition system which can be applied in practice? Our experiments provide insights of the relative merit of these methods. More importantly, after combining the methods, we achieve the top results in the field, outperforming the state-of-the-art methods by 4.8% and 10.3% for birds and dogs datasets, respectively. Additionally, our method achieves a mAP of 37.92 on the of 2012 Imagenet Fine-Grained Categorization Challenge [1], which outperforms the winner of this challenge by 5.7 points.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"83 1","pages":"532-539"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89952993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast dense 3D reconstruction using an adaptive multiscale discrete-continuous variational method","authors":"Z. Kang, G. Medioni","doi":"10.1109/WACV.2014.6836118","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836118","url":null,"abstract":"We present a system for fast dense 3D reconstruction with a hand-held camera. Walking around a target object, we shoot sequential images using continuous shooting mode. High-quality camera poses are obtained offline using structure-from-motion (SfM) algorithm with Bundle Adjustment. Multi-view stereo is solved using a new, efficient adaptive multiscale discrete-continuous variational method to generate depth maps with sub-pixel accuracy. Depth maps are then fused into a 3D model using volumetric integration with truncated signed distance function (TSDF). Our system is accurate, efficient and flexible: accurate depth maps are estimated with sub-pixel accuracy in stereo matching; dense models can be achieved within minutes as major algorithms parallelized on multi-core processor and GPU; various tasks can be handled (e.g. reconstruction of objects in both indoor and outdoor environment with different scales) without specific hand-tuning parameters. We evaluate our system quantitatively and qualitatively on Middlebury benchmark and another dataset collected with a smartphone camera.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"5 9 1","pages":"53-60"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80468986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmentation and tracking of partial planar templates","authors":"Abdelsalam Masoud, W. Hoff","doi":"10.1109/WACV.2014.6835731","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835731","url":null,"abstract":"We present an algorithm that can segment and track partial planar templates, from a sequence of images taken from a moving camera. By “partial planar template”, we mean that the template is the projection of a surface patch that is only partially planar; some of the points may correspond to other surfaces. The algorithm segments each image template to identify the pixels that belong to the dominant plane, and determines the three dimensional structure of that plane. We show that our algorithm can track such patches over a larger visual angle, compared to algorithms that assume that patches arise from a single planar surface. The new tracking algorithm is expected to improve the accuracy of visual simultaneous localization and mapping, especially in outdoor natural scenes where planar features are rare.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"432 1","pages":"1128-1133"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77509204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}