{"title":"Deep learning the dynamic appearance and shape of facial action units","authors":"S. Jaiswal, M. Valstar","doi":"10.1109/WACV.2016.7477625","DOIUrl":"https://doi.org/10.1109/WACV.2016.7477625","url":null,"abstract":"Spontaneous facial expression recognition under uncontrolled conditions is a hard task. It depends on multiple factors including shape, appearance and dynamics of the facial features, all of which are adversely affected by environmental noise and low intensity signals typical of such conditions. In this work, we present a novel approach to Facial Action Unit detection using a combination of Convolutional and Bi-directional Long Short-Term Memory Neural Networks (CNN-BLSTM), which jointly learns shape, appearance and dynamics in a deep learning manner. In addition, we introduce a novel way to encode shape features using binary image masks computed from the locations of facial landmarks. We show that the combination of dynamic CNN features and Bi-directional Long Short-Term Memory excels at modelling the temporal information. We thoroughly evaluate the contributions of each component in our system and show that it achieves state-of-the-art performance on the FERA-2015 Challenge dataset.","PeriodicalId":124363,"journal":{"name":"2016 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128910832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Region graph based method for multi-object detection and tracking using depth cameras","authors":"Sachin Mehta, B. Prabhakaran","doi":"10.1109/WACV.2016.7477568","DOIUrl":"https://doi.org/10.1109/WACV.2016.7477568","url":null,"abstract":"In this paper, we propose a multi-object detection and tracking method using depth cameras. Depth maps are very noisy, which obscures object detection. We first propose a region-based method to suppress high magnitude noise which cannot be filtered using spatial filters. Second, the proposed method detects Regions of Interest by temporal learning, which are then tracked using a weighted graph-based approach. We demonstrate the performance of the proposed method on standard depth camera datasets with and without object occlusions. Experimental results show that the proposed method is able to suppress high magnitude noise in depth maps and detect/track the objects (with and without occlusion).","PeriodicalId":124363,"journal":{"name":"2016 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129555365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pose tracking by efficiently exploiting global features","authors":"Ratnesh Kumar, Dhruv Batra","doi":"10.1109/WACV.2016.7477563","DOIUrl":"https://doi.org/10.1109/WACV.2016.7477563","url":null,"abstract":"Typical pose tracking algorithms first obtain a set of plausible pose hypotheses in all image frames of a video and subsequently stitch compatible detections across time to form a pose-track. This approach to tracking is commonly termed tracking-by-detections, and has been very successful in other areas such as multiple object tracking and video segmentation using object proposals. Often models in this category can only incorporate local spatio-temporal evidence due to exponentially increased cost when using global information. Local spatio-temporal evidence can be ambiguous, thus leading to an inferior objective modeling. To deal with ambiguities in local information it is necessary to incorporate global information over multiple frames into a model. Based on the recent advances in generating multiple solutions from a probabilistic model, we first generate multiple plausible pose-track hypotheses, and subsequently employ a mixture of local and global features to express the quality of these solutions with high fidelity. We perform extensive experiments, and competitive results across varied datasets demonstrate the robustness of our approach.","PeriodicalId":124363,"journal":{"name":"2016 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123065027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variational multi-phase segmentation using high-dimensional local features","authors":"N. Mevenkamp, B. Berkels","doi":"10.1109/WACV.2016.7477729","DOIUrl":"https://doi.org/10.1109/WACV.2016.7477729","url":null,"abstract":"We propose a novel method for multi-phase segmentation of images based on high-dimensional local feature vectors. While the method was developed for the segmentation of extremely noisy crystal images based on localized Fourier transforms, the resulting framework is not tied to specific feature descriptors. For instance, using local spectral histograms as features, it allows for robust texture segmentation. The segmentation itself is based on the multi-phase Mumford-Shah model. Initializing the high-dimensional mean features directly is computationally too demanding and ill-posed in practice. This is resolved by projecting the features onto a low-dimensional space using principal component analysis. The resulting objective functional is minimized using a convexification and the Chambolle-Pock algorithm. Numerical results are presented, illustrating that the algorithm is very competitive in texture segmentation with state-of-the-art performance on the Prague benchmark and provides new possibilities in crystal segmentation, being robust to extreme noise and requiring no prior knowledge of the crystal structure.","PeriodicalId":124363,"journal":{"name":"2016 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"10 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123174345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hide and seek: Uncovering facial occlusion with variable-threshold robust PCA","authors":"W. Leow, Guodong Li, J. Lai, T. Sim, Vaishali Sharma","doi":"10.1109/WACV.2016.7477579","DOIUrl":"https://doi.org/10.1109/WACV.2016.7477579","url":null,"abstract":"Face images are very important in human social activities, which can be severely hampered when they are corrupted by occluders such as eyeglasses, face marks, and scarfs. Existing methods for removing occlusions in face images can be grouped into three broad categories, namely PCA, robust PCA (RPCA), and sparse coding. The major weaknesses of these methods are inconsistent performance across test conditions and possible corruption of the unoccluded parts of the recovered target image. This paper presents a variable-threshold RPCA (VRPCA) method based on RPCA with variable thresholding. Comprehensive tests show that VRPCA is able to preserve the unoccluded parts of the target image with practically zero error. Compared to existing methods, it is more accurate, reliable, and consistent across various test conditions.","PeriodicalId":124363,"journal":{"name":"2016 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125928724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate and efficient pulse measurement from facial videos on smartphones","authors":"Chong Huang, Xin Yang, K. Cheng","doi":"10.1109/WACV.2016.7477669","DOIUrl":"https://doi.org/10.1109/WACV.2016.7477669","url":null,"abstract":"Non-contact measurement of cardiac pulse signals has attracted great interest due to its convenience and cost effectiveness. However, extracting pulse signals on mobile handheld devices (e.g. smartphones) based on face videos captured by mobile cameras usually suffers from low measurement accuracy due to misalignment errors in face tracking and inevitable illumination changes in a mobile scenario, and low efficiency due to a handheld's limited computing power. We propose two techniques to address these limitations: 1) an accurate and efficient face tracking method based on an Active Shape Model (ASM) and the LDB (Local Difference Binary) feature description; 2) an adaptive temporal filtering method which can detect, and in turn denoise, sharp intensity changes in the source trace. Experimental results demonstrate that the proposed solution can achieve a speedup of 6.2X and is robust to noise in common mobile scenarios.","PeriodicalId":124363,"journal":{"name":"2016 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125970723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monocular obstacle avoidance for blind people using probabilistic focus of expansion estimation","authors":"Sebastian Stabinger, A. Rodríguez-Sánchez, J. Piater","doi":"10.1109/WACV.2016.7477608","DOIUrl":"https://doi.org/10.1109/WACV.2016.7477608","url":null,"abstract":"Visually impaired people have a much higher chance of head injuries in daily life because of obstacles that cannot be reliably detected using conventional aids. We present part of a solution to this problem, using only one head mounted camera and optical flow techniques. As part of the system, a novel method to estimate the focus of expansion is presented, which also provides a metric for the quality of the estimate. The final result is a real time capable software system, which can detect obstacles at eye level.","PeriodicalId":124363,"journal":{"name":"2016 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"426 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123272520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint object recognition and pose estimation using a nonlinear view-invariant latent generative model","authors":"A. Bakry, Tarek El-Gaaly, Mohamed Elhoseiny, A. Elgammal","doi":"10.1109/WACV.2016.7477655","DOIUrl":"https://doi.org/10.1109/WACV.2016.7477655","url":null,"abstract":"Object recognition and pose estimation are two fundamental problems in the field of computer vision. Recognizing objects and their poses/viewpoints is a critical component of many vision and robotic systems. Multiple viewpoints of an object lie on an intrinsic low-dimensional manifold in the input space (i.e. descriptor space). Different objects captured from the same set of viewpoints have manifolds with a common topology. In this paper we utilize this common topology between object manifolds by learning a low-dimensional latent space which non-linearly maps between a common unified manifold and the object manifold in the input space. Using a supervised embedding approach, the latent space is computed and used to jointly infer the category and pose of objects. We empirically validate our model by using multiple inference approaches and testing on multiple challenging datasets. We compare our results with the state-of-the-art and present our increased category recognition and pose estimation accuracy.","PeriodicalId":124363,"journal":{"name":"2016 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122108208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is alice chasing or being chased?: Determining subject and object of activities in videos","authors":"Teng Zhang, Liangchen Liu, A. Wiliem, B. Lovell","doi":"10.1109/WACV.2016.7477710","DOIUrl":"https://doi.org/10.1109/WACV.2016.7477710","url":null,"abstract":"Recent progress in video description has shown promising results by combining object/action recognition and natural language processing techniques. However, even the simplest form of the generated sentence, the SVO triplet (Subject/Verb/Object), can be misleading for its lack of role relationship analysis. When the system detects keywords \"person\", \"baby\" and \"feed\", we do not want the system to generate \"a person feeding a baby\" when the actual scene is one where the baby is trying to share the food. In this paper, we explore role relationships between objects/persons and their usage in generating a more meaningful video description. More specifically, we confine ourselves to the following problem: identifying subject and object roles in two-person activities. We argue that the subject and object roles have consistent properties across different activities. To that end, we cast this problem as a domain adaptation problem. A novel Youtube SVO dataset is proposed for evaluating methods developed for this problem. The performance of the proposed method is compared against several baseline methods.","PeriodicalId":124363,"journal":{"name":"2016 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121395469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiscale fully convolutional network with application to industrial inspection","authors":"Xiao Bian, Ser-Nam Lim, Ning Zhou","doi":"10.1109/WACV.2016.7477595","DOIUrl":"https://doi.org/10.1109/WACV.2016.7477595","url":null,"abstract":"In recent years, deep learning, particularly Convolutional Neural Network (CNN), has shown great efficacy for solving various vision tasks. In image segmentation, it has been demonstrated that a CNN can greatly outperform other approaches. However, special attention has to be paid towards setting various parameters in the CNN that affect the scale of the feature map generated at the last convolutional layer, where scale here refers to the ratio of the number of pixels in the original input image that correspond to each pixel in the feature map. Quite often, the optimal settings are tied to the specific problem at hand and can be fairly challenging to determine. To overcome such an issue, this paper proposes a multiscale Fully Convolutional Network (FCN) that combines networks trained at various scales, thereby allowing for conducting segmentation more generically. Moreover, such a multiscale architecture allows for incremental fine-tuning as more training images become available later on and new networks can be trained and added to the combined network. Such flexibility has great utility in applications such as industrial inspection, where training images may not be readily available initially, yet a high level of accuracy is required. This paper will validate our findings by reporting the results that we have obtained by applying multiscale FCN to the inspection of aircraft engine parts.","PeriodicalId":124363,"journal":{"name":"2016 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128106236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}