{"title":"Facial Expression Neutralization With StoicNet","authors":"W. Carver, Ifeoma Nwogu","doi":"10.1109/WACVW52041.2021.00026","DOIUrl":"https://doi.org/10.1109/WACVW52041.2021.00026","url":null,"abstract":"Expression neutralization is the process of synthetically altering an image of a face so as to remove any facial expression from it without changing the face’s identity. Facial expression neutralization could have a variety of applications, particularly in the realms of facial recognition, in action unit analysis, or even improving the quality of identification pictures for various types of documents. Our proposed model, StoicNet, combines the robust encoding capacity of variational autoencoders, the generative power of generative adversarial networks, and the enhancing capabilities of super resolution networks with a learned encoding transformation to achieve compelling expression neutralization, while preserving the identity of the input face. Objective experiments demonstrate that StoicNet successfully generates realistic, identity-preserved faces with neutral expressions, regardless of the emotion or expression intensity of the input face.","PeriodicalId":313062,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116977076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Semantic Information to Improve Generalization of Reinforcement Learning Policies for Autonomous Driving","authors":"Florence Carton, David Filliat, Jaonary Rabarisoa, Q. Pham","doi":"10.1109/WACVW52041.2021.00020","DOIUrl":"https://doi.org/10.1109/WACVW52041.2021.00020","url":null,"abstract":"The problem of generalization of reinforcement learning policies to new environments is seldom addressed but essential in practical applications. We focus on this problem in an autonomous driving context using the CARLA simulator and first show that semantic information is the key to a good generalization for this task. We then explore and compare different ways to exploit semantic information at training time in order to improve generalization in an unseen environment without fine-tuning, showing that using semantic segmentation as an auxiliary task is the most efficient approach.","PeriodicalId":313062,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134427620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliability of GAN Generated Data to Train and Validate Perception Systems for Autonomous Vehicles","authors":"Weihuang Xu, Nasim Souly, P. Brahma","doi":"10.1109/WACVW52041.2021.00023","DOIUrl":"https://doi.org/10.1109/WACVW52041.2021.00023","url":null,"abstract":"Autonomous systems deployed in the real world have to deal with potentially problem-causing situations that they have never seen during their training phases. Due to the long-tail nature of events, collecting a large amount of data for such corner cases is a difficult task. While simulation is one plausible solution, recent developments in the field of Generative Adversarial Networks (GANs) make them a promising tool to generate and augment realistic data without exhibiting a domain shift from actual real data. In this manuscript, we empirically analyze and propose novel solutions for the trust that we can place in GAN-generated data for training and validation of vision-based perception modules like object detection and scenario classification.","PeriodicalId":313062,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129731050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DriveGuard: Robustification of Automated Driving Systems with Deep Spatio-Temporal Convolutional Autoencoder","authors":"A. Papachristodoulou, C. Kyrkou, T. Theocharides","doi":"10.1109/WACVW52041.2021.00016","DOIUrl":"https://doi.org/10.1109/WACVW52041.2021.00016","url":null,"abstract":"Autonomous vehicles increasingly rely on cameras to provide the input for perception and scene understanding, and the ability of these models to classify their environment and objects under adverse conditions and image noise is crucial. When the input is deteriorated, either unintentionally or through targeted attacks, the reliability of the autonomous vehicle is compromised. In order to mitigate such phenomena, we propose DriveGuard, a lightweight spatio-temporal autoencoder, as a solution to robustify the image segmentation process for autonomous vehicles. By first processing camera images with DriveGuard, we offer a more universal solution than having to re-train each perception model with noisy input. We explore the space of different autoencoder architectures and evaluate them on a diverse dataset created with real and synthetic images, demonstrating that by exploiting spatio-temporal information combined with a multi-component loss we significantly increase robustness against adverse image effects, reaching within 5-6% of that of the original model on clean images.","PeriodicalId":313062,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134394796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable Fingerprint ROI Segmentation Using Monte Carlo Dropout","authors":"Indu Joshi, R. Kothari, Ayush Utkarsh, V. Kurmi, A. Dantcheva, Sumantra Dutta Roy, P. Kalra","doi":"10.1109/WACVW52041.2021.00011","DOIUrl":"https://doi.org/10.1109/WACVW52041.2021.00011","url":null,"abstract":"A fingerprint Region of Interest (ROI) segmentation module is one of the most crucial components in the fingerprint pre-processing pipeline. It separates the foreground fingerprint from the background region, so that feature extraction and matching are restricted to the ROI instead of the entire fingerprint image. However, state-of-the-art segmentation algorithms act like a black box and do not indicate model confidence. In this direction, we propose an explainable fingerprint ROI segmentation model which indicates the pixels on which the model is uncertain. Towards this, we benchmark four state-of-the-art models for semantic segmentation on fingerprint ROI segmentation. Furthermore, we demonstrate the effectiveness of model uncertainty as an attention mechanism to improve the segmentation performance of the best performing model. Experiments on publicly available Fingerprint Verification Challenge (FVC) databases showcase the effectiveness of the proposed model.","PeriodicalId":313062,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125167139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2020 Sequestered Data Evaluation for Known Activities in Extended Video: Summary and Results","authors":"A. Godil, Yooyoung Lee, J. Fiscus, Andrew Delgado, Eliot Godard, Baptiste Chocot, Lukas L. Diduch, Jim Golden, Jesse Zhang","doi":"10.1109/WACVW52041.2021.00010","DOIUrl":"https://doi.org/10.1109/WACVW52041.2021.00010","url":null,"abstract":"This paper presents a summary and results for the ActEV’20 SDL (Activities in Extended Video Sequestered Data Leaderboard) challenge that was held under the CVPR’20 ActivityNet workshop [38]. The primary goal of the challenge was to provide an impetus for advancing research and capabilities in the field of human activity detection in untrimmed multi-camera videos. Advancements in activity detection will help with a wide range of public safety applications. The challenge was administered by the National Institute of Standards and Technology (NIST), where anyone could submit their system, which was run on sequestered data, with the resulting score posted to a public leaderboard. Ten teams submitted their systems for the ActEV’20 SDL competition on the Multiview Extended Video with Activities (MEVA) test set with 37 target activities. The performance metric for the leaderboard ranking is the partial, normalized Area Under the Detection Error Tradeoff (DET) curve (nAUDC). The top rank on activity detection was by UCF at 37%, followed by CMU at 39% and OPPO at 41%.","PeriodicalId":313062,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"7 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130850815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geeks and guests: Estimating player’s level of experience from board game behaviors","authors":"Feyisayo Olalere, Metehan Doyran, R. Poppe, A. A. Salah","doi":"10.1109/WACVW52041.2021.00007","DOIUrl":"https://doi.org/10.1109/WACVW52041.2021.00007","url":null,"abstract":"Board games have become promising tools for observing and studying social behaviors in multi-person settings. While traditional methods such as self-report questionnaires are used to analyze game-induced behaviors, there is a growing need to automate such analyses. In this paper, we focus on estimating the levels of board game experience by analyzing a player’s confidence and anxiety from visual cues. We use a board game setting to induce relevant interactions, and investigate facial expressions during critical game events. For our analysis, we annotated the critical game events in a multiplayer cooperative board game, using the publicly available MUMBAI board game corpus. Using off-the-shelf tools, we encoded facial behavior in dyadic interactions and built classifiers to predict each player’s level of experience. Our results show that considering the experience level of both parties involved in the interaction simultaneously improves the prediction results.","PeriodicalId":313062,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"105 19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127457213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly Supervised Multi-Object Tracking and Segmentation","authors":"Idoia Ruiz, L. Porzi, S. R. Bulò, P. Kontschieder, J. Serrat","doi":"10.1109/WACVW52041.2021.00018","DOIUrl":"https://doi.org/10.1109/WACVW52041.2021.00018","url":null,"abstract":"We introduce the problem of weakly supervised Multi-Object Tracking and Segmentation, i.e. joint weakly supervised instance segmentation and multi-object tracking, in which we do not provide any kind of mask annotation. To address it, we design a novel synergistic training strategy by taking advantage of multi-task learning, i.e. classification and tracking tasks guide the training of the unsupervised instance segmentation. For that purpose, we extract weak foreground localization information, provided by Grad-CAM heatmaps, to generate a partial ground truth to learn from. Additionally, RGB image level information is employed to refine the mask prediction at the edges of the objects. We evaluate our method on KITTI MOTS, the most representative benchmark for this task, reducing the performance gap on the MOTSP metric between the fully supervised and weakly supervised approach to just 12% and 12.7% for cars and pedestrians, respectively.","PeriodicalId":313062,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131104364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Explainable Attention-Guided Iris Presentation Attack Detector","authors":"Cunjian Chen, A. Ross","doi":"10.1109/WACVW52041.2021.00015","DOIUrl":"https://doi.org/10.1109/WACVW52041.2021.00015","url":null,"abstract":"Convolutional Neural Networks (CNNs) are being increasingly used to address the problem of iris presentation attack detection. In this work, we propose an explainable attention-guided iris presentation attack detector (AG-PAD) to augment CNNs with attention mechanisms and to provide visual explanations of model predictions. Two types of attention modules are independently placed on top of the last convolutional layer of the backbone network. Specifically, the channel attention module is used to model the inter-channel relationship between features, while the position attention module is used to model inter-spatial relationship between features. An element-wise sum is employed to fuse these two attention modules. Further, a novel hierarchical attention mechanism is introduced. Experiments involving both a JHU-APL proprietary dataset and the benchmark LivDet-Iris-2017 dataset suggest that the proposed method achieves promising detection results while explaining occurrences of salient regions for discriminative feature learning. To the best of our knowledge, this is the first work that exploits the use of attention mechanisms in iris presentation attack detection.","PeriodicalId":313062,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127119777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Scale Voxel Class Balanced ASPP for LIDAR Pointcloud Semantic Segmentation","authors":"K. Kumar, S. Al-Stouhi","doi":"10.1109/WACVW52041.2021.00017","DOIUrl":"https://doi.org/10.1109/WACVW52041.2021.00017","url":null,"abstract":"This paper explores efficient techniques to improve PolarNet model performance to address the real-time semantic segmentation of LiDAR point clouds. The core framework consists of an encoder network, Atrous spatial pyramid pooling (ASPP)/Dense Atrous spatial pyramid pooling (DenseASPP) followed by a decoder network. The encoder extracts multi-scale voxel information in a top-down manner while the decoder fuses multiple feature maps from various scales in a bottom-up manner. Between the encoder and decoder blocks, an ASPP/DenseASPP block is inserted to enlarge receptive fields in a very dense manner. In contrast to the PolarNet model, we use weighted cross entropy in conjunction with Lovasz-softmax loss to improve segmentation accuracy. This paper also accelerates the training of the PolarNet model by incorporating learning-rate schedulers in conjunction with the Adam optimizer for faster convergence with fewer epochs without degrading accuracy. Extensive experiments conducted on the challenging SemanticKITTI dataset show that our high-resolution-grid model obtains a competitive state-of-the-art result of 60.6 mIOU @21fps, whereas our low-resolution-grid model obtains 54.01 mIOU @35fps, thereby balancing the accuracy/speed trade-off.","PeriodicalId":313062,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129103521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}