{"title":"Trading-off Information Modalities in Zero-shot Classification","authors":"Jorge Sánchez, Matías Molina","doi":"10.1109/WACV51458.2022.00174","DOIUrl":"https://doi.org/10.1109/WACV51458.2022.00174","url":null,"abstract":"Zero-shot classification is the task of learning predictors for classes not seen during training. A practical way to deal with the lack of annotations for the target categories is to encode not only the inputs (images) but also the outputs (object classes) into a suitable representation space. We can use these representations to measure the degree at which images and categories agree by fitting a compatibility measure using the information available during training. One way to define such a measure is by a two step process in which we first project the elements of either space (visual or semantic) onto the other and then compute a similarity score in the target space. Although projections onto the visual space has shown better general performance, little attention has been paid to the degree at which the visual and semantic information contribute to the final predictions. In this paper, we build on this observation and propose two different formulations that allow us to explicitly trade-off the relative importance of the visual and semantic spaces for classification in a zero-shot setting. Our formulations are based on redefinition of the similarity scoring and loss function used to learn the projections. Experiments on six different datasets show that our approach lead to improve performance compared to similar methods. Moreover, combined with synthetic features, our approach competes favorably with the state of the art on both the standard and generalized settings.","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"483 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127565300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualizing Paired Image Similarity in Transformer Networks","authors":"Samuel Black, Abby Stylianou, Robert Pless, Richard Souvenir","doi":"10.1109/WACV51458.2022.00160","DOIUrl":"https://doi.org/10.1109/WACV51458.2022.00160","url":null,"abstract":"Transformer architectures have shown promise for a wide range of computer vision tasks, including image embedding. As was the case with convolutional neural networks and other models, explainability of the predictions is a key concern, but visualization approaches tend to be architecture-specific. In this paper, we introduce a new method for producing interpretable visualizations that, given a pair of images encoded with a Transformer, show which regions contributed to their similarity. Additionally, for the task of image retrieval, we compare the performance of Transformer and ResNet models of similar capacity and show that while they have similar performance in aggregate, the retrieved results and the visual explanations for those results are quite different. Code is available at https://github.com/vidarlab/xformer-paired-viz.","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127212174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How and What to Learn: Taxonomizing Self-Supervised Learning for 3D Action Recognition","authors":"Amor Ben Tanfous, Aimen Zerroug, Drew A. Linsley, Thomas Serre","doi":"10.1109/WACV51458.2022.00294","DOIUrl":"https://doi.org/10.1109/WACV51458.2022.00294","url":null,"abstract":"There are two competing standards for self-supervised learning in action recognition from 3D skeletons. Su et al., 2020 [31] used an auto-encoder architecture and an image reconstruction objective function to achieve state-of-the-art performance on the NTU60 C-View benchmark. Rao et al., 2020 [23] used Contrastive learning in the latent space to achieve state-of-the-art performance on the NTU60 C-Sub benchmark. Here, we reconcile these disparate approaches by developing a taxonomy of self-supervised learning for action recognition. We observe that leading approaches generally use one of two types of objective functions: those that seek to reconstruct the input from a latent representation (\"Attractive\" learning) versus those that also try to maximize the representations distinctiveness (\"Contrastive\" learning). Independently, leading approaches also differ in how they implement these objective functions: there are those that optimize representations in the decoder output space and those which optimize representations in the network’s latent space (encoder output). We find that combining these approaches leads to larger gains in performance and tolerance to transformation than is achievable by any individual method, leading to state-of-the-art performance on three standard action recognition datasets. We include links to our code and data.","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124034734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coupled Training for Multi-Source Domain Adaptation","authors":"Ohad Amosy, Gal Chechik","doi":"10.1109/WACV51458.2022.00114","DOIUrl":"https://doi.org/10.1109/WACV51458.2022.00114","url":null,"abstract":"Unsupervised domain adaptation is often addressed by learning a joint representation of labeled samples from a source domain and unlabeled samples from a target domain. Unfortunately, hard sharing of representation may hurt adaptation because of negative transfer, where features that are useful for source domains are learned even if they hurt inference on the target domain. Here, we propose an alternative, soft sharing scheme. We train separate but weakly-coupled models for the source and the target data, while encouraging their predictions to agree. Training the two coupled models jointly effectively exploits the distribution over unlabeled target data and achieves high accuracy on the target. Specifically, we show analytically and empirically that the decision boundaries of the target model converge to low-density \"valleys\" of the target distribution. We evaluate our approach on four multi-source domain adaptation (MSDA) benchmarks, digits, amazon text reviews, Office-Caltech and images (DomainNet). We find that it consistently outperforms current MSDA SoTA, sometimes by a very large margin.","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126722370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FASSST: Fast Attention Based Single-Stage Segmentation Net for Real-Time Instance Segmentation","authors":"Yuan Cheng, Rui Lin, Peining Zhen, Tianshu Hou, C. Ng, Hai-Bao Chen, Hao Yu, Ngai Wong","doi":"10.1109/WACV51458.2022.00277","DOIUrl":"https://doi.org/10.1109/WACV51458.2022.00277","url":null,"abstract":"Real-time instance segmentation is crucial in various AI applications. This work designs a network named Fast Attention based Single-Stage Segmentation NeT (FASSST) that performs instance segmentation with video-grade speed. Using an instance attention module (IAM), FASSST quickly locates target instances and segments with region of interest (ROI) feature fusion (RFF) aggregating ROI features from pyramid mask layers. The module employs an efficient single-stage feature regression, straight from features to instance coordinates and class probabilities. Experiments on COCO and CityScapes datasets show that FASSST achieves state-of-the-art performance under competitive accuracy: real-time inference of 47.5FPS on a GTX1080Ti GPU and 5.3FPS on a Jetson Xavier NX board with only 71.6 GFLOPs.","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117323515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matching and Recovering 3D People from Multiple Views","authors":"Alejandro Pérez-Yus, Antonio Agudo","doi":"10.1109/WACV51458.2022.00125","DOIUrl":"https://doi.org/10.1109/WACV51458.2022.00125","url":null,"abstract":"This paper introduces an approach to simultaneously match and recover 3D people from multiple calibrated cameras. To this end, we present an affinity measure between 2D detections across different views that enforces an uncertainty geometric consistency. This similarity is then exploited by a novel multi-view matching algorithm to cluster the detections, being robust against partial observations as well as bad detections and without assuming any prior about the number of people in the scene. After that, the multi-view correspondences are used in order to efficiently infer the 3D pose of each body by means of a 3D pictorial structure model in combination with physico-geometric constraints. Our algorithm is thoroughly evaluated on challenging scenarios where several human bodies are performing different activities which involve complex motions, producing large occlusions in some views and noisy observations. We outperform state-of-the-art results in terms of matching and 3D reconstruction.","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127900637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity","authors":"Tejan Karmali, Abhinav Atrishi, Sai Sree Harsha, Susmit Agrawal, Varun Jampani, R. Venkatesh Babu","doi":"10.1109/WACV51458.2022.00310","DOIUrl":"https://doi.org/10.1109/WACV51458.2022.00310","url":null,"abstract":"In this work, we introduce LEAD, an approach to dis-cover landmarks from an unannotated collection of category-specific images. Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image, which are further used to learn landmarks in a semi-supervised manner. While there have been advances in self-supervised learning of image features for instance-level tasks like classification, these methods do not ensure dense equivariant representations. The property of equivariance is of interest for dense prediction tasks like landmark estimation. In this work, we introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion. We follow a two-stage training approach: first, we train a network using the BYOL [13] objective which operates at an instance level. The correspondences obtained through this network are further used to train a dense and compact representation of the image using a lightweight network. We show that having such a prior in the feature extractor helps in landmark detection, even under drastically limited number of annotations while also improving generalization across scale variations.","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133319495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative Adversarial Attack on Ensemble Clustering","authors":"Chetan Kumar, Deepak Kumar, Ming Shao","doi":"10.1109/WACV51458.2022.00389","DOIUrl":"https://doi.org/10.1109/WACV51458.2022.00389","url":null,"abstract":"Adversarial attack on learning tasks has attracted substantial attention in recent years; however, most existing works focus on supervised learning. Recently, research has shown that unsupervised learning, such as clustering, tends to be vulnerable due to adversarial attack. In this paper, we focus on a clustering algorithm widely used in the real-world environment, namely, ensemble clustering (EC). EC algorithms usually leverage basic partition (BP) and ensemble techniques to improve the clustering performance collaboratively. Each BP may stem from one trial of clustering, feature segment, or part of data stored on the cloud. We have observed that the attack tends to be less perceivable when only a few BPs are compromised. To explore plausible attack strategies, we propose a novel generative adversarial attack (GA2) model for EC, titled GA2EC. First, we show that not all BPs are equally important, and some of them are more vulnerable under adversarial attack. Second, we develop a generative adversarial model to mimic the attack on EC. In particular, the generative model will simulate behaviors of both clean BPs and perturbed key BPs, and their derived graphs, and thus can launch effective attacks with less attention. We have conducted extensive experiments on eleven clustering benchmarks and have demonstrated that our approach is effective in attacking EC under both transductive and inductive settings.","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134153368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Durability Estimation of Bioprosthetic Heart Valves Via Motion Symmetry Analysis","authors":"M. Alizadeh, Melissa Cote, A. Albu","doi":"10.1109/WACV51458.2022.00176","DOIUrl":"https://doi.org/10.1109/WACV51458.2022.00176","url":null,"abstract":"This paper addresses bioprosthetic heart valve (BHV) durability estimation via computer vision (CV)-based analyses of the visual symmetry of valve leaflet motion. BHVs are routinely implanted in patients suffering from valvular heart diseases. Valve designs are rigorously tested using cardiovascular equipment, but once implanted, more than 50% of BHVs encounter a structural failure within 15 years. We investigate the correlation between the visual dynamic symmetry of BHV leaflets and the functional symmetry of the valves. We hypothesize that an asymmetry in the valve leaflet motion will generate an asymmetry in the flow patterns, resulting in added local stress and forces on some of the leaflets, which can accelerate the failure of the valve. We propose two different pair-wise leaflet symmetry scores based on the diagonals of orthogonal projection matrices (DOPM) and on dynamic time warping (DTW), computed from videos recorded during pulsatile flow tests. We compare the symmetry score profiles with those of fluid dynamic parameters (velocity and vorticity values) at the leaflet borders, obtained from valve-specific numerical simulations. Experiments on four cases that include three different tricuspid BHVs yielded promising results, with the DTW scores showing a good coherence with respect to the simulations. With a link between visual and functional symmetries established, this approach paves the way towards BHV durability estimation using CV techniques.","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134379015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel-View Synthesis of Human Tourist Photos","authors":"Jonathan Freer, K. M. Yi, Wei Jiang, Jongwon Choi, H. Chang","doi":"10.1109/WACV51458.2022.00093","DOIUrl":"https://doi.org/10.1109/WACV51458.2022.00093","url":null,"abstract":"We present a novel framework for performing novel-view synthesis on human tourist photos. Given a tourist photo from a known scene, we reconstruct the photo in 3D space through modeling the human and the background independently. We generate a deep buffer from a novel viewpoint of the reconstruction and utilize a deep network to translate the buffer into a photo-realistic rendering of the novel view. We additionally present a method to relight the renderings, allowing for relighting of both human and background to match either the provided input image or any other. The key contributions of our paper are: 1) a framework for performing novel view synthesis on human tourist photos, 2) an appearance transfer method for relighting of humans to match synthesized backgrounds, and 3) a method for estimating lighting properties from a single human photo. We demonstrate the proposed framework on photos from two different scenes of various tourists.","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134452992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}