{"title":"Bayesian Optimization with an Empirical Hardness Model for approximate Nearest Neighbour Search","authors":"Julieta Martinez, J. Little, Nando de Freitas","doi":"10.1109/WACV.2014.6836049","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836049","url":null,"abstract":"Nearest Neighbour Search in high-dimensional spaces is a common problem in Computer Vision. Although no algorithm better than linear search is known, approximate algorithms are commonly used to tackle this problem. The drawback of using such algorithms is that their performance depends highly on parameter tuning. While this process can be automated using standard empirical optimization techniques, tuning is still time-consuming. In this paper, we propose to use Empirical Hardness Models to reduce the number of parameter configurations that Bayesian Optimization has to try, speeding up the optimization process. Evaluation on standard benchmarks of SIFT and GIST descriptors shows the viability of our approach.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"77 1","pages":"588-595"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80764441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A lp-norm MTMKL framework for simultaneous detection of multiple facial action units","authors":"Xiao Zhang, M. Mahoor, S. Mavadati, J. Cohn","doi":"10.1109/WACV.2014.6835735","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835735","url":null,"abstract":"Facial action unit (AU) detection is a challenging topic in computer vision and pattern recognition. Most existing approaches design classifiers to detect AUs individually or AU combinations without considering the intrinsic relations among AUs. This paper presents a novel method, lp-norm multi-task multiple kernel learning (MTMKL), that jointly learns the classifiers for detecting the absence and presence of multiple AUs. lp-norm MTMKL is an extension of the regularized multi-task learning, which learns shared kernels from a given set of base kernels among all the tasks within Support Vector Machines (SVM). Our approach has several advantages over existing methods: (1) AU detection work is transformed to a MTL problem, where given a specific frame, multiple AUs are detected simultaneously by exploiting their inter-relations; (2) lp-norm multiple kernel learning is applied to increase the discriminant power of classifiers. Our experimental results on the CK+ and DISFA databases show that the proposed method outperforms the state-of-the-art methods for AU detection.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"17 1","pages":"1104-1111"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73480128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coupling video segmentation and action recognition","authors":"Amir Ghodrati, M. Pedersoli, T. Tuytelaars","doi":"10.1109/WACV.2014.6836045","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836045","url":null,"abstract":"Recently a lot of progress has been made in the field of video segmentation. The question then arises whether and how these results can be exploited for this other video processing challenge, action recognition. In this paper we show that a good segmentation is actually very important for recognition. We propose and evaluate several ways to integrate and combine the two tasks: i) recognition using a standard, bottom-up segmentation, ii) using a top-down segmentation geared towards actions, iii) using a segmentation based on inter-video similarities (co-segmentation), and iv) tight integration of recognition and segmentation via iterative learning. Our results clearly show that, on the one hand, the two tasks are interdependent and therefore an iterative optimization of the two makes sense and gives better results. On the other hand, comparable results can also be obtained with two separate steps but mapping the feature-space with a non-linear kernel.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"35 1","pages":"618-625"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72753658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finger-knuckle-print verification based on vector consistency of corresponding interest points","authors":"Min-Ki Kim, P. Flynn","doi":"10.1109/WACV.2014.6835996","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835996","url":null,"abstract":"This paper proposes a novel finger-knuckle-print (FKP) verification method based on vector consistency among corresponding interest points (CIPs) detected from aligned finger images. We used two different approaches for reliable detection of CIPs; one method employs SIFT features and captures gradient directionality, and the other method employs phase correlation to represent the intensity field surrounding an interest point. The consistency of interframe displacements between pairs of matching CIPs in a match pair is used as a matching score. Such displacements will show consistency in a genuine match but not in an impostor match. Experimental results show that the proposed approach is effective in FKP verification.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"15 1","pages":"992-997"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76430927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear Local Distance coding for classification of HEp-2 staining patterns","authors":"Xiang Xu, F. Lin, Carol Ng, K. Leong","doi":"10.1109/WACV.2014.6836073","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836073","url":null,"abstract":"Indirect Immunofluorescence (IIF) on Human Epithelial-2 (HEp-2) cells is the recommended methodology for detecting some specific autoimmune diseases by searching for antinuclear antibodies (ANAs) within a patient's serum. Due to the limitations of IIF such as subjective evaluation, automated Computer-Aided Diagnosis (CAD) system is required for diagnostic purposes. In particular, staining patterns classification of HEp-2 cells is a challenging task. In this paper, we adopt a feature extraction-coding-pooling framework which has shown impressive performance in image classification tasks, because it can obtain discriminative and effective image representation. However, the information loss is inevitable in the coding process. Therefore, we propose a Linear Local Distance (LLD) coding method to capture more discriminative information. LLD transforms original local feature to local distance vector by searching for local nearest few neighbors of local feature in the class-specific manifolds. The obtained local distance vector is further encoded and pooled together to get salient image representation. We demonstrate the effectiveness of LLD method on a public HEp-2 cells dataset containing six major staining patterns. Experimental results show that our approach has a superior performance to the state-of-the-art coding methods for staining patterns classification of HEp-2 cells.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"8 1","pages":"393-400"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81445168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structure-aware keypoint tracking for partial occlusion handling","authors":"W. Bouachir, Guillaume-Alexandre Bilodeau","doi":"10.1109/WACV.2014.6836011","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836011","url":null,"abstract":"This paper introduces a novel keypoint-based method for visual object tracking. To represent the target, we use a new model combining color distribution with keypoints. The appearance model also incorporates the spatial layout of the keypoints, encoding the object structure learned during tracking. With this multi-feature appearance model, our Structure-Aware Tracker (SAT) estimates accurately the target location using three main steps. First, the search space is reduced to the most likely image regions with a probabilistic approach. Second, the target location is estimated in the reduced search space using deterministic keypoint matching. Finally, the location prediction is corrected by exploiting the keypoint structural model with a voting-based method. By applying our SAT on several tracking problems, we show that location correction based on structural constraints is a key technique to improve prediction in moderately crowded scenes, even if only a small part of the target is visible. We also conduct comparison with a number of state-of-the-art trackers and demonstrate the competitiveness of the proposed method.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"28 1","pages":"877-884"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90531344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view action recognition one camera at a time","authors":"Scott Spurlock, Richard Souvenir","doi":"10.1109/WACV.2014.6836047","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836047","url":null,"abstract":"For human action recognition methods, there is often a trade-off between classification accuracy and computational efficiency. Methods that include 3D information from multiple cameras are often computationally expensive and not suitable for real-time application. 2D, frame-based methods are generally more efficient, but suffer from lower recognition accuracies. In this paper, we present a hybrid keypose-based method that operates in a multi-camera environment, but uses only a single camera at a time. We learn, for each keypose, the relative utility of a particular viewpoint compared with switching to a different available camera in the network for future classification. On a benchmark multi-camera action recognition dataset, our method outperforms approaches that incorporate all available cameras.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"91 1","pages":"604-609"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79517106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactively test driving an object detector: Estimating performance on unlabeled data","authors":"Rushil Anirudh, P. Turaga","doi":"10.1109/WACV.2014.6836104","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836104","url":null,"abstract":"In this paper, we study the problem of `test-driving' a detector, i.e. allowing a human user to get a quick sense of how well the detector generalizes to their specific requirement. To this end, we present the first system that estimates detector performance interactively without extensive ground truthing using a human in the loop. We approach this as a problem of estimating proportions and show that it is possible to make accurate inferences on the proportion of classes or groups within a large data collection by observing only 5 - 10% of samples from the data. In estimating the false detections (for precision), the samples are chosen carefully such that the overall characteristics of the data collection are preserved. Next, inspired by its use in estimating disease propagation we apply pooled testing approaches to estimate missed detections (for recall) from the dataset. The estimates thus obtained are close to the ones obtained using ground truth, thus reducing the need for extensive labeling which is expensive and time consuming.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"91 1","pages":"175-182"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87849863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of face detection and image classification for detecting front seat passengers in vehicles","authors":"Y. Artan, P. Paul, F. Perronnin, A. Burry","doi":"10.1109/WACV.2014.6835994","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835994","url":null,"abstract":"Due to the high volume of traffic on modern roadways, transportation agencies have proposed High Occupancy Vehicle (HOV) lanes and High Occupancy Tolling (HOT) lanes to promote car pooling. However, enforcement of the rules of these lanes is currently performed by roadside enforcement officers using visual observation. Manual roadside enforcement is known to be inefficient, costly, potentially dangerous, and ultimately ineffective. Violation rates up to 50%-80% have been reported, while manual enforcement rates of less than 10% are typical. Therefore, there is a need for automated vehicle occupancy detection to support HOV/HOT lane enforcement. A key component of determining vehicle occupancy is to determine whether or not the vehicle's front passenger seat is occupied. In this paper, we examine two methods of determining vehicle front seat occupancy using a near infrared (NIR) camera system pointed at the vehicle's front windshield. The first method examines a state-of-the-art deformable part model (DPM) based face detection system that is robust to facial pose. The second method examines state-of-the-art local aggregation based image classification using bag-of-visual-words (BOW) and Fisher vectors (FV). A dataset of 3000 images was collected on a public roadway and is used to perform the comparison. From these experiments it is clear that the image classification approach is superior for this problem.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"1 1","pages":"1006-1012"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74341594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scale-Space SIFT flow","authors":"Weichao Qiu, Xinggang Wang, X. Bai, A. Yuille, Z. Tu","doi":"10.1109/WACV.2014.6835734","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835734","url":null,"abstract":"The state-of-the-art SIFT flow has been widely adopted for the general image matching task, especially in dealing with image pairs from similar scenes but with different object configurations. However, the way in which the dense SIFT features are computed at a fixed scale in the SIFT flow method limits its capability of dealing with scenes of large scale changes. In this paper, we propose a simple, intuitive, and very effective approach, Scale-Space SIFT flow, to deal with the large scale differences in different image locations. We introduce a scale field to the SIFT flow function to automatically explore the scale deformations. Our approach achieves similar performance as the SIFT flow method on general natural scenes but obtains significant improvement on the images with large scale differences. Compared with a recent method that addresses the similar problem, our approach shows its clear advantage being more effective, and significantly less demanding in memory and time requirement.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"4 1","pages":"1112-1119"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75073359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}