{"title":"Segment-Tree Based Cost Aggregation for Stereo Matching","authors":"Peng Yao, Hua Zhang, Yanbing Xue, Mian Zhou, Guangping Xu, Zan Gao, Shengyong Chen","doi":"10.1109/ICASSP.2017.7952512","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952512","url":null,"abstract":"This paper presents a novel tree-based cost aggregation method for dense stereo matching. Instead of employing the minimum spanning tree (MST) and its variants, a new tree structure, \"Segment-Tree\", is proposed for non-local matching cost aggregation. Conceptually, the segment-tree is constructed in a three-step process: first, the pixels are grouped into a set of segments with the reference color or intensity image, second, a tree graph is created for each segment, and in the final step, these independent segment graphs are linked to form the segment-tree structure. In practice, this tree can be efficiently built in time nearly linear to the number of the image pixels. Compared to MST where the graph connectivity is determined with local edge weights, our method introduces some 'non-local' decision rules: the pixels in one perceptually consistent segment are more likely to share similar disparities, and therefore their connectivity within the segment should be first enforced in the tree construction process. The matching costs are then aggregated over the tree within two passes. Performance evaluation on 19 Middlebury data sets shows that the proposed method is comparable to previous state-of-the-art aggregation methods in disparity accuracy and processing speed. Furthermore, the tree structure can be refined with the estimated disparities, which leads to consistent scene segmentation and significantly better aggregation results.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"76 1","pages":"313-320"},"PeriodicalIF":0.0,"publicationDate":"2017-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87048196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tag Taxonomy Aware Dictionary Learning for Region Tagging","authors":"Jingjing Zheng, Zhuolin Jiang","doi":"10.1109/CVPR.2013.54","DOIUrl":"https://doi.org/10.1109/CVPR.2013.54","url":null,"abstract":"Tags of image regions are often arranged in a hierarchical taxonomy based on their semantic meanings. In this paper, using the given tag taxonomy, we propose to jointly learn multi-layer hierarchical dictionaries and corresponding linear classifiers for region tagging. Specifically, we generate a node-specific dictionary for each tag node in the taxonomy, and then concatenate the node-specific dictionaries from each level to construct a level-specific dictionary. The hierarchical semantic structure among tags is preserved in the relationship among node-dictionaries. Simultaneously, the sparse codes obtained using the level-specific dictionaries are summed up as the final feature representation to design a linear classifier. Our approach not only makes use of sparse codes obtained from higher levels to help learn the classifiers for lower levels, but also encourages the tag nodes from lower levels that have the same parent tag node to implicitly share sparse codes obtained from higher levels. Experimental results using three benchmark datasets show that the proposed approach yields the best performance over recently proposed methods.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"66 1","pages":"369-376"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73491711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous Super-Resolution of Depth and Images Using a Single Camera","authors":"Hee Seok Lee, Kyoung Mu Lee","doi":"10.1109/CVPR.2013.43","DOIUrl":"https://doi.org/10.1109/CVPR.2013.43","url":null,"abstract":"In this paper, we propose a convex optimization framework for simultaneous estimation of super-resolved depth map and images from a single moving camera. The pixel measurement error in 3D reconstruction is directly related to the resolution of the images at hand. In turn, even a small measurement error can cause significant errors in reconstructing 3D scene structure or camera pose. Therefore, enhancing image resolution can be an effective solution for securing the accuracy as well as the resolution of 3D reconstruction. In the proposed method, depth map estimation and image super-resolution are formulated in a single energy minimization framework with a convex function and solved efficiently by a first-order primal-dual algorithm. Explicit inter-frame pixel correspondences are not required for our super-resolution procedure, thus we can avoid a huge computation time and obtain improved depth map in the accuracy and resolution as well as high-resolution images with reasonable time. The superiority of our algorithm is demonstrated by presenting the improved depth map accuracy, image super-resolution results, and camera pose estimation.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"5 1","pages":"281-288"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75669127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic Label Trees for Efficient Large Scale Image Classification","authors":"Baoyuan Liu, Fereshteh Sadeghi, M. Tappen, O. Shamir, Ce Liu","doi":"10.1109/CVPR.2013.114","DOIUrl":"https://doi.org/10.1109/CVPR.2013.114","url":null,"abstract":"Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows. The label tree model integrates classification with the traversal of the tree so that complexity grows logarithmically. In this paper, we show how the parameters of the label tree can be found using maximum likelihood estimation. This new probabilistic learning technique produces a label tree with significantly improved recognition accuracy.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"62 1","pages":"843-850"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75696811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subspace Interpolation via Dictionary Learning for Unsupervised Domain Adaptation","authors":"Jie Ni, Qiang Qiu, R. Chellappa","doi":"10.1109/CVPR.2013.95","DOIUrl":"https://doi.org/10.1109/CVPR.2013.95","url":null,"abstract":"Domain adaptation addresses the problem where data instances of a source domain have different distributions from that of a target domain, which occurs frequently in many real life scenarios. This work focuses on unsupervised domain adaptation, where labeled data are only available in the source domain. We propose to interpolate subspaces through dictionary learning to link the source and target domains. These subspaces are able to capture the intrinsic domain shift and form a shared feature representation for cross domain recognition. Further, we introduce a quantitative measure to characterize the shift between two domains, which enables us to select the optimal domain to adapt to the given multiple source domains. We present experiments on face recognition across pose, illumination and blur variations, cross dataset object recognition, and report improved performance over the state of the art.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"12 9","pages":"692-699"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72608948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes","authors":"S. Ramalingam, Jaishanker K. Pillai, Arpit Jain, Yuichi Taguchi","doi":"10.1109/CVPR.2013.394","DOIUrl":"https://doi.org/10.1109/CVPR.2013.394","url":null,"abstract":"Junctions are strong cues for understanding the geometry of a scene. In this paper, we consider the problem of detecting junctions and using them for recovering the spatial layout of an indoor scene. Junction detection has always been challenging due to missing and spurious lines. We work in a constrained Manhattan world setting where the junctions are formed by only line segments along the three principal orthogonal directions. Junctions can be classified into several categories based on the number and orientations of the incident line segments. We provide a simple and efficient voting scheme to detect and classify these junctions in real images. Indoor scenes are typically modeled as cuboids and we formulate the problem of the cuboid layout estimation as an inference problem in a conditional random field. Our formulation allows the incorporation of junction features and the training is done using structured prediction techniques. We outperform other single view geometry estimation methods on standard datasets.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"35 1","pages":"3065-3072"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74802054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Multi-resolution Pedestrian Detection in Traffic Scenes","authors":"Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, S. Li","doi":"10.1109/CVPR.2013.390","DOIUrl":"https://doi.org/10.1109/CVPR.2013.390","url":null,"abstract":"The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"3033-3040"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72675360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blur Processing Using Double Discrete Wavelet Transform","authors":"Yi Zhang, Keigo Hirakawa","doi":"10.1109/CVPR.2013.145","DOIUrl":"https://doi.org/10.1109/CVPR.2013.145","url":null,"abstract":"We propose a notion of double discrete wavelet transform (DDWT) that is designed to sparsify the blurred image and the blur kernel simultaneously. DDWT greatly enhances our ability to analyze, detect, and process blur kernels and blurry images-the proposed framework handles both global and spatially varying blur kernels seamlessly, and unifies the treatment of blur caused by object motion, optical defocus, and camera shake. To illustrate the potential of DDWT in computer vision and image processing, we develop example applications in blur kernel estimation, deblurring, and near-blur-invariant image feature extraction.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"933 1","pages":"1091-1098"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72683702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dense Non-rigid Point-Matching Using Random Projections","authors":"Raffay Hamid, D. DeCoste, Chih-Jen Lin","doi":"10.1109/CVPR.2013.375","DOIUrl":"https://doi.org/10.1109/CVPR.2013.375","url":null,"abstract":"We present a robust and efficient technique for matching dense sets of points undergoing non-rigid spatial transformations. Our main intuition is that the subset of points that can be matched with high confidence should be used to guide the matching procedure for the rest. We propose a novel algorithm that incorporates these high-confidence matches as a spatial prior to learn a discriminative subspace that simultaneously encodes both the feature similarity as well as their spatial arrangement. Conventional subspace learning usually requires spectral decomposition of the pair-wise distance matrix across the point-sets, which can become inefficient even for moderately sized problems. To this end, we propose the use of random projections for approximate subspace learning, which can provide significant time improvements at the cost of minimal precision loss. This efficiency gain allows us to iteratively find and remove high-confidence matches from the point sets, resulting in high recall. To show the effectiveness of our approach, we present a systematic set of experiments and results for the problem of dense non-rigid image-feature matching.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"17 1","pages":"2914-2921"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79342251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Axially Symmetric 3D Pots Configuration System Using Axis of Symmetry and Break Curve","authors":"Kilho Son, Eduardo B. Almeida, D. Cooper","doi":"10.1109/CVPR.2013.40","DOIUrl":"https://doi.org/10.1109/CVPR.2013.40","url":null,"abstract":"This paper introduces a novel approach for reassembling pot sherds found at archaeological excavation sites, for the purpose of reconstructing clay pots that had been made on a wheel. These pots and the sherds into which they have broken are axially symmetric. The reassembly process can be viewed as 3D puzzle solving or generalized cylinder learning from broken fragments. The estimation exploits both local and semi-global geometric structure, thus making it a fundamental problem of geometry estimation from noisy fragments in computer vision and pattern recognition. The data used are densely digitized 3D laser scans of each fragment's outer surface. The proposed reassembly system is automatic and functions when the pile of available fragments is from one or multiple pots, and even when pieces are missing from any pot. The geometric structure used are curves on the pot along which the surface had broken and the silhouette of a pot with respect to an axis, called axis-profile curve (APC). For reassembling multiple pots with or without missing pieces, our algorithm estimates the APC from each fragment, then reassembles into configurations the ones having distinctive APC. Further growth of configurations is based on adding remaining fragments such that their APC and break curves are consistent with those of a configuration. The method is novel, more robust and handles the largest numbers of fragments to date.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"32 1","pages":"257-264"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85487967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}