Latest publications from the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Trading-off Information Modalities in Zero-shot Classification
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00174
Jorge Sánchez, Matías Molina
Abstract: Zero-shot classification is the task of learning predictors for classes not seen during training. A practical way to deal with the lack of annotations for the target categories is to encode not only the inputs (images) but also the outputs (object classes) into a suitable representation space. We can use these representations to measure the degree to which images and categories agree by fitting a compatibility measure using the information available during training. One way to define such a measure is a two-step process in which we first project the elements of either space (visual or semantic) onto the other and then compute a similarity score in the target space. Although projection onto the visual space has shown better general performance, little attention has been paid to the degree to which the visual and semantic information contribute to the final predictions. In this paper, we build on this observation and propose two different formulations that allow us to explicitly trade off the relative importance of the visual and semantic spaces for classification in a zero-shot setting. Our formulations are based on a redefinition of the similarity scoring and the loss function used to learn the projections. Experiments on six different datasets show that our approach leads to improved performance compared to similar methods. Moreover, combined with synthetic features, our approach competes favorably with the state of the art in both the standard and generalized settings.
Citations: 1
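The two-step scheme the abstract describes (project one space onto the other, then score similarity) can be made concrete with a small sketch. This is a hypothetical illustration, not the paper's actual formulation: the projection matrices `W_vs`/`W_sv` and the mixing weight `lam` are assumed names, and the trade-off is shown as a simple convex combination of the two projection directions.

```python
import numpy as np

def tradeoff_score(v, s, W_vs, W_sv, lam=0.5):
    """Hypothetical compatibility score mixing both projection directions.

    v    : (dv,) image feature
    s    : (ds,) class embedding
    W_vs : (dv, ds) projection from visual space to semantic space
    W_sv : (ds, dv) projection from semantic space to visual space
    lam  : trade-off weight; lam=1 scores only in the visual space.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    sim_visual = cos(v, s @ W_sv)      # class projected onto the visual space
    sim_semantic = cos(v @ W_vs, s)    # image projected onto the semantic space
    return lam * sim_visual + (1 - lam) * sim_semantic

def predict(v, class_embs, W_vs, W_sv, lam=0.5):
    """Zero-shot prediction: pick the class whose embedding scores highest."""
    scores = [tradeoff_score(v, s, W_vs, W_sv, lam) for s in class_embs]
    return int(np.argmax(scores))
```

Sweeping `lam` between 0 and 1 makes the relative importance of the two spaces explicit, which is the knob the paper's formulations expose through the scoring and loss.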
Visualizing Paired Image Similarity in Transformer Networks
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00160
Samuel Black, Abby Stylianou, Robert Pless, Richard Souvenir
Abstract: Transformer architectures have shown promise for a wide range of computer vision tasks, including image embedding. As was the case with convolutional neural networks and other models, explainability of the predictions is a key concern, but visualization approaches tend to be architecture-specific. In this paper, we introduce a new method for producing interpretable visualizations that, given a pair of images encoded with a Transformer, show which regions contributed to their similarity. Additionally, for the task of image retrieval, we compare the performance of Transformer and ResNet models of similar capacity and show that while they have similar performance in aggregate, the retrieved results and the visual explanations for those results are quite different. Code is available at https://github.com/vidarlab/xformer-paired-viz.
Citations: 2
How and What to Learn: Taxonomizing Self-Supervised Learning for 3D Action Recognition
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00294
Amor Ben Tanfous, Aimen Zerroug, Drew A. Linsley, Thomas Serre
Abstract: There are two competing standards for self-supervised learning in action recognition from 3D skeletons. Su et al., 2020 [31] used an auto-encoder architecture and an image reconstruction objective function to achieve state-of-the-art performance on the NTU60 C-View benchmark. Rao et al., 2020 [23] used contrastive learning in the latent space to achieve state-of-the-art performance on the NTU60 C-Sub benchmark. Here, we reconcile these disparate approaches by developing a taxonomy of self-supervised learning for action recognition. We observe that leading approaches generally use one of two types of objective functions: those that seek to reconstruct the input from a latent representation ("Attractive" learning) versus those that also try to maximize the representation's distinctiveness ("Contrastive" learning). Independently, leading approaches also differ in how they implement these objective functions: there are those that optimize representations in the decoder output space and those that optimize representations in the network's latent space (encoder output). We find that combining these approaches leads to larger gains in performance and tolerance to transformation than is achievable by any individual method, leading to state-of-the-art performance on three standard action recognition datasets. We include links to our code and data.
Citations: 9
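The two objective families named in the taxonomy can be sketched in a few lines. This is a minimal illustration under common formulations (MSE reconstruction for the "Attractive" case, an InfoNCE-style loss for the "Contrastive" case), not the specific losses used by the cited papers.

```python
import numpy as np

def attractive_loss(decoded, target):
    """'Attractive' objective: reconstruct the input from the latent code,
    scored as mean squared error in the decoder output space."""
    return float(np.mean((decoded - target) ** 2))

def contrastive_loss(z, z_pos, z_negs, tau=0.1):
    """'Contrastive' (InfoNCE-style) objective in the latent space: pull the
    anchor toward its positive view, push it away from negatives."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    logits = np.array([cos(z, z_pos)] + [cos(z, n) for n in z_negs]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))              # positive sits at index 0
```

The taxonomy's second axis (decoder output space vs. encoder latent space) is then a question of which representation these losses are applied to.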
Coupled Training for Multi-Source Domain Adaptation
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00114
Ohad Amosy, Gal Chechik
Abstract: Unsupervised domain adaptation is often addressed by learning a joint representation of labeled samples from a source domain and unlabeled samples from a target domain. Unfortunately, hard sharing of the representation may hurt adaptation because of negative transfer, where features that are useful for the source domains are learned even if they hurt inference on the target domain. Here, we propose an alternative, soft sharing scheme. We train separate but weakly-coupled models for the source and the target data, while encouraging their predictions to agree. Training the two coupled models jointly effectively exploits the distribution over unlabeled target data and achieves high accuracy on the target. Specifically, we show analytically and empirically that the decision boundaries of the target model converge to low-density "valleys" of the target distribution. We evaluate our approach on four multi-source domain adaptation (MSDA) benchmarks: digits, Amazon text reviews, Office-Caltech, and images (DomainNet). We find that it consistently outperforms the current MSDA state of the art, sometimes by a very large margin.
Citations: 2
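The coupling term ("encouraging their predictions to agree") can be sketched as an agreement loss on unlabeled target batches. This is an assumed formulation for illustration, using a symmetric KL divergence between the two models' class distributions; the paper's actual coupling may differ.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def agreement_loss(logits_src_model, logits_tgt_model):
    """Symmetric KL between the two models' predictions on the same
    *unlabeled* target samples; minimizing it softly couples the
    otherwise separate source and target models."""
    p = softmax(logits_src_model)
    q = softmax(logits_tgt_model)
    kl_pq = np.sum(p * (np.log(p + 1e-8) - np.log(q + 1e-8)), axis=-1)
    kl_qp = np.sum(q * (np.log(q + 1e-8) - np.log(p + 1e-8)), axis=-1)
    return float(np.mean(kl_pq + kl_qp) / 2)
```

Because the gradient of this term pushes both models toward confident, matching predictions on dense regions of target data, decision boundaries are driven into low-density "valleys", matching the behavior the abstract describes.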
FASSST: Fast Attention Based Single-Stage Segmentation Net for Real-Time Instance Segmentation
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00277
Yuan Cheng, Rui Lin, Peining Zhen, Tianshu Hou, C. Ng, Hai-Bao Chen, Hao Yu, Ngai Wong
Abstract: Real-time instance segmentation is crucial in various AI applications. This work designs a network named Fast Attention based Single-Stage Segmentation NeT (FASSST) that performs instance segmentation at video-grade speed. Using an instance attention module (IAM), FASSST quickly locates target instances and segments them with region-of-interest (ROI) feature fusion (RFF), aggregating ROI features from pyramid mask layers. The module employs an efficient single-stage feature regression, straight from features to instance coordinates and class probabilities. Experiments on the COCO and CityScapes datasets show that FASSST achieves state-of-the-art real-time performance at competitive accuracy: inference at 47.5 FPS on a GTX1080Ti GPU and 5.3 FPS on a Jetson Xavier NX board with only 71.6 GFLOPs.
Citations: 1
Matching and Recovering 3D People from Multiple Views
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00125
Alejandro Pérez-Yus, Antonio Agudo
Abstract: This paper introduces an approach to simultaneously match and recover 3D people from multiple calibrated cameras. To this end, we present an affinity measure between 2D detections across different views that enforces an uncertainty geometric consistency. This similarity is then exploited by a novel multi-view matching algorithm to cluster the detections, being robust against partial observations as well as bad detections and without assuming any prior about the number of people in the scene. After that, the multi-view correspondences are used in order to efficiently infer the 3D pose of each body by means of a 3D pictorial structure model in combination with physico-geometric constraints. Our algorithm is thoroughly evaluated on challenging scenarios where several human bodies are performing different activities which involve complex motions, producing large occlusions in some views and noisy observations. We outperform state-of-the-art results in terms of matching and 3D reconstruction.
Citations: 3
LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00310
Tejan Karmali, Abhinav Atrishi, Sai Sree Harsha, Susmit Agrawal, Varun Jampani, R. Venkatesh Babu
Abstract: In this work, we introduce LEAD, an approach to discover landmarks from an unannotated collection of category-specific images. Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image, which are further used to learn landmarks in a semi-supervised manner. While there have been advances in self-supervised learning of image features for instance-level tasks like classification, these methods do not ensure dense equivariant representations. The property of equivariance is of interest for dense prediction tasks like landmark estimation. In this work, we introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion. We follow a two-stage training approach: first, we train a network using the BYOL [13] objective, which operates at an instance level. The correspondences obtained through this network are further used to train a dense and compact representation of the image using a lightweight network. We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations, while also improving generalization across scale variations.
Citations: 4
Generative Adversarial Attack on Ensemble Clustering
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00389
Chetan Kumar, Deepak Kumar, Ming Shao
Abstract: Adversarial attacks on learning tasks have attracted substantial attention in recent years; however, most existing works focus on supervised learning. Recently, research has shown that unsupervised learning, such as clustering, tends to be vulnerable to adversarial attacks. In this paper, we focus on a clustering algorithm widely used in real-world environments, namely, ensemble clustering (EC). EC algorithms usually leverage basic partitions (BPs) and ensemble techniques to improve the clustering performance collaboratively. Each BP may stem from one trial of clustering, a feature segment, or part of data stored on the cloud. We have observed that the attack tends to be less perceivable when only a few BPs are compromised. To explore plausible attack strategies, we propose a novel generative adversarial attack (GA2) model for EC, titled GA2EC. First, we show that not all BPs are equally important, and some of them are more vulnerable under adversarial attack. Second, we develop a generative adversarial model to mimic the attack on EC. In particular, the generative model simulates the behaviors of both clean BPs and perturbed key BPs, and their derived graphs, and can thus launch effective attacks while attracting less attention. We have conducted extensive experiments on eleven clustering benchmarks and have demonstrated that our approach is effective in attacking EC under both transductive and inductive settings.
Citations: 0
Towards Durability Estimation of Bioprosthetic Heart Valves Via Motion Symmetry Analysis
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00176
M. Alizadeh, Melissa Cote, A. Albu
Abstract: This paper addresses bioprosthetic heart valve (BHV) durability estimation via computer vision (CV)-based analyses of the visual symmetry of valve leaflet motion. BHVs are routinely implanted in patients suffering from valvular heart diseases. Valve designs are rigorously tested using cardiovascular equipment, but once implanted, more than 50% of BHVs encounter a structural failure within 15 years. We investigate the correlation between the visual dynamic symmetry of BHV leaflets and the functional symmetry of the valves. We hypothesize that an asymmetry in the valve leaflet motion will generate an asymmetry in the flow patterns, resulting in added local stress and forces on some of the leaflets, which can accelerate the failure of the valve. We propose two different pair-wise leaflet symmetry scores based on the diagonals of orthogonal projection matrices (DOPM) and on dynamic time warping (DTW), computed from videos recorded during pulsatile flow tests. We compare the symmetry score profiles with those of fluid dynamic parameters (velocity and vorticity values) at the leaflet borders, obtained from valve-specific numerical simulations. Experiments on four cases that include three different tricuspid BHVs yielded promising results, with the DTW scores showing a good coherence with respect to the simulations. With a link between visual and functional symmetries established, this approach paves the way towards BHV durability estimation using CV techniques.
Citations: 0
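A DTW-based pair-wise symmetry score of the kind the abstract mentions can be sketched briefly. This is an illustrative version under assumed inputs (one 1-D motion signal per leaflet, e.g. per-frame opening extent extracted from video), not the paper's exact DOPM/DTW formulation.

```python
import numpy as np

def dtw_distance(x, y):
    """Plain O(len(x)*len(y)) dynamic-time-warping distance between two
    1-D motion signals, using absolute difference as the local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def symmetry_score(leaflets):
    """Average pair-wise DTW distance across leaflet motion signals:
    0 means perfectly symmetric motion; larger means more asymmetry."""
    pairs = [(i, j) for i in range(len(leaflets))
             for j in range(i + 1, len(leaflets))]
    return float(np.mean([dtw_distance(leaflets[i], leaflets[j])
                          for i, j in pairs]))
```

For a tricuspid valve this reduces to three pair-wise comparisons per cardiac cycle; tracking the score over many cycles gives the symmetry profile that the paper compares against simulated flow parameters.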
Novel-View Synthesis of Human Tourist Photos
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00093
Jonathan Freer, K. M. Yi, Wei Jiang, Jongwon Choi, H. Chang
Abstract: We present a novel framework for performing novel-view synthesis on human tourist photos. Given a tourist photo from a known scene, we reconstruct the photo in 3D space through modeling the human and the background independently. We generate a deep buffer from a novel viewpoint of the reconstruction and utilize a deep network to translate the buffer into a photo-realistic rendering of the novel view. We additionally present a method to relight the renderings, allowing for relighting of both human and background to match either the provided input image or any other. The key contributions of our paper are: 1) a framework for performing novel view synthesis on human tourist photos, 2) an appearance transfer method for relighting of humans to match synthesized backgrounds, and 3) a method for estimating lighting properties from a single human photo. We demonstrate the proposed framework on photos from two different scenes of various tourists.
Citations: 3