{"title":"DB-GAN: Boosting Object Recognition Under Strong Lighting Conditions","authors":"Luca Minciullo, Fabian Manhardt, Federico Tombari","doi":"10.1109/WACV48630.2021.00298","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00298","url":null,"abstract":"Driven by deep learning, object recognition has recently made a tremendous leap forward. Nonetheless, its accuracy often still suffers from several sources of variation that can be found in real-world images. Some of the most challenging variations are induced by changing lighting conditions. This paper presents a novel approach for tackling brightness variation in the domain of 2D object detection and 6D object pose estimation. Existing works aiming at improving robustness towards different lighting conditions are often grounded on classical computer vision contrast normalisation techniques or the acquisition of large amounts of annotated data in order to achieve invariance during training. While the former cannot generalise well to a wide range of illumination conditions, the latter is neither practical nor scalable. Hence, We propose the usage of Generative Adversarial Networks in order to learn how to normalise the illumination of an input image. Thereby, the generator is explicitly designed to normalise illumination in images so to enhance the object recognition performance. Extensive evaluations demonstrate that leveraging the generated data can significantly enhance the detection performance, outperforming all other state-of-the-art methods. We further constitute a natural extension focusing on white balance variations and introduce a new dataset for evaluation.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125196373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedded Dense Camera Trajectories in Multi-Video Image Mosaics by Geodesic Interpolation-based Reintegration","authors":"Lars Haalck, B. Risse","doi":"10.1109/WACV48630.2021.00189","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00189","url":null,"abstract":"Dense registrations of huge image sets are still challenging due to exhaustive matchings and computationally expensive optimisations. Moreover, the resultant image mosaics often suffer from structural errors such as drift. Here, we propose a novel algorithm to generate global large-scale registrations from thousands of images extracted from multiple videos to derive high-resolution image mosaics which include full frame rate camera trajectories. Our algorithm does not require any initialisations and ensures the effective integration of all available image data by combining efficient and highly parallelised key-frame and loop-closure mechanisms with a novel geodesic interpolation-based reintegration strategy. As a consequence, global refinement can be done in a fraction of iterations compared to traditional optimisation strategies, while effectively avoiding drift and convergence towards inappropriate solutions. We compared our registration strategy with state-of-the-art algorithms and quantitative evaluations revealed millimetre spatial and high angular accuracy. Applicability is demonstrated by registering more than 110,000 frames from multiple scan recordings and provide dense camera trajectories in a globally referenced coordinate system as used for drone-based mappings, ecological studies, object tracking and land surveys.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125472564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly Supervised Deep Reinforcement Learning for Video Summarization With Semantically Meaningful Reward","authors":"Zu-Hua Li, Lei Yang","doi":"10.1109/WACV48630.2021.00328","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00328","url":null,"abstract":"Conventional unsupervised video summarization algorithms are usually developed in a frame level clustering manner For example, frame level diversity and representativeness are two typical clustering criteria used for unsupervised reinforcement learning-based video summarization. Inspired by recent progress in video representation techniques, we further introduce the similarity of video representations to construct a semantically meaningful reward for this task. We consider that a good summarization should also be semantically identical to its original source, which means that the semantic similarity can be regarded as an additional criterion for summarization. Through combining a novel video semantic reward with other unsupervised rewards for training, we can easily upgrade an unsupervised reinforcement learning-based video summarization method to its weakly supervised version. In practice, we first train a video classification sub-network (VCSN) to extract video semantic representations based on a category-labeled video dataset. Then we fix this VCSN and train a summary generation sub-network (SGSN) using unlabeled video data in a reinforcement learning way. Experimental results demonstrate that our work significantly surpasses other unsupervised and even supervised methods. To the best of our knowledge, our method achieves state-of-the-art performance in terms of the correlation coefficients, Kendall’s and Spearman’s .","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125549892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Generative Adversarial Networks for Single Image Super-Resolution","authors":"Weimin Chen, Yuqing Ma, Xianglong Liu, Yijia Yuan","doi":"10.1109/WACV48630.2021.00040","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00040","url":null,"abstract":"Recently, deep convolutional neural network (CNN) have achieved promising performance for single image super-resolution (SISR). However, they usually extract features on a single scale and lack sufficient supervision information, leading to undesired artifacts and unpleasant noise in super-resolution (SR) images. To address this problem, we first propose a hierarchical feature extraction module (HFEM) to extract the features in multiple scales, which helps concentrate on both local textures and global semantics. Then, a hierarchical guided reconstruction module (HGRM) is introduced to reconstruct more natural structural textures in SR images via intermediate supervisions in a progressive manner. Finally, we integrate HFEM and HGRM in a simple yet efficient end-to-end framework named hierarchical generative adversarial networks (HSR-GAN) to recover consistent details, and thus obtain the semantically reasonable and visually realistic results. Extensive experiments on five common datasets demonstrate that our method shows favorable visual quality and superior quantitative performance compared to state-of-the-art methods for SISR.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127167699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Breaking Shortcuts by Masking for Robust Visual Reasoning","authors":"Keren Ye, Mingda Zhang, Adriana Kovashka","doi":"10.1109/WACV48630.2021.00356","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00356","url":null,"abstract":"Visual reasoning is a challenging but important task that is gaining momentum. Examples include reasoning about what will happen next in film, or interpreting what actions an image advertisement prompts. Both tasks are \"puzzles\" which invite the viewer to combine knowledge from prior experience, to find the answer. Intuitively, providing external knowledge to a model should be helpful, but it does not necessarily result in improved reasoning ability. An algorithm can learn to find answers to the prediction task yet not perform generalizable reasoning. In other words, models can leverage \"shortcuts\" between inputs and desired outputs, to bypass the need for reasoning. We develop a technique to effectively incorporate external knowledge, in a way that is both interpretable, and boosts the contribution of external knowledge for multiple complementary metrics. In particular, we mask evidence in the image and in retrieved external knowledge. We show this masking successfully focuses the method’s attention on patterns that generalize. To properly understand how our method utilizes external knowledge, we propose a novel side evaluation task. We find that with our masking technique, the model can learn to select useful knowledge pieces to rely on.1","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129569428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2D to 3D Medical Image Colorization","authors":"Aradhya Neeraj Mathur, Apoorv Khattar, Ojaswa Sharma","doi":"10.1109/WACV48630.2021.00289","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00289","url":null,"abstract":"Colorization involves the synthesis of colors while preserving structural content as well as the semantics of the target image. This problem has been well studied for 2D photographs with many state-of-the-art solutions. We explore a new challenge in the field of colorization where we aim at colorizing multi-modal 3D medical data using 2D style exemplars. To the best of our knowledge, this work is the first of its kind and poses challenges related to the modality (medical MRI) and dimensionality (3D volumetric images) of the data. Our approach to colorization is motivated by modality conversion that highlights its robustness in handling multi-modal data.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127651593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vid2Int: Detecting Implicit Intention from Long Dialog Videos","authors":"Xiaoli Xu, Yao Lu, Zhiwu Lu, T. Xiang","doi":"10.1109/WACV48630.2021.00334","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00334","url":null,"abstract":"Detecting subtle intention such as deception and subtext of a person in a long dialog video, or implicit intention detection (IID), is a challenging problem. The transcript (textual cues) often reveals little, so audio-visual cues including voice tone as well as facial and body behaviour are the main focuses for automated IID. Contextual cues are also crucial, since a person’s implicit intentions are often correlated and context-dependent when the person moves from one question-answer pair to the next. However, no such dataset exists which contains fine-grained questionanswer pair (video segment) level annotation. The first contribution of this work is thus a new benchmark dataset, called Vid2Int-Deception to fill this gap. A novel multigrain representation model is also proposed to capture the subtle movement changes of eyes, face, and body (relevant for inferring intention) from a long dialog video. Moreover, to model the temporal correlation between the implicit intentions across video segments, we propose a Videoto-Intention network (Vid2Int) based on attentive recurrent neural network (RNN). Extensive experiments show that our model achieves state-of-the-art results.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120989701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transductive Zero-Shot Learning by Decoupled Feature Generation","authors":"Federico Marmoreo, Jacopo Cavazza, Vittorio Murino","doi":"10.1109/WACV48630.2021.00315","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00315","url":null,"abstract":"In this paper, we address zero-shot learning (ZSL), the problem of recognizing categories for which no labeled visual data are available during training. We focus on the transductive setting, in which unlabelled visual data from unseen classes is available. State-of-the-art paradigms in ZSL typically exploit generative adversarial networks to synthesize visual features from semantic attributes. We posit that the main limitation of these approaches is to adopt a single model to face two problems: 1) generating realistic visual features, and 2) translating semantic attributes into visual cues. Differently, we propose to decouple such tasks, solving them separately. In particular, we train an unconditional generator to solely capture the complexity of the distribution of visual data and we subsequently pair it with a conditional generator devoted to enrich the prior knowledge of the data distribution with the semantic content of the class embeddings. We present a detailed ablation study to dissect the effect of our proposed decoupling approach, while demonstrating its superiority over the related state-of-the-art.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"633 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115113677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Focus and retain: Complement the Broken Pose in Human Image Synthesis","authors":"Pu Ge, Qiushi Huang, Wei Xiang, Xue Jing, Yule Li, Yiyong Li, Zhun Sun","doi":"10.1109/WACV48630.2021.00341","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00341","url":null,"abstract":"Given a target pose, how to generate an image of a specific style with that target pose remains an ill-posed and thus complicated problem. Most recent works treat the human pose synthesis tasks as an image spatial transformation problem using flow warping techniques. However, we observe that, due to the inherent ill-posed nature of many complicated human poses, former methods fail to generate body parts. To tackle this problem, we propose a feature-level flow attention module and an Enhancer Network. The flow attention module produces a flow attention mask to guide the combination of the flow-warped features and the structural pose features. Then, we apply the Enhancer Network to re-fine the coarse image by injecting the pose information. We present our experimental evaluation both qualitatively and quantitatively on DeepFashion, Market-1501, and Youtube dance datasets. Quantitative results show that our method has 12.995 FID at DeepFashion, 25.459 FID at Market-1501, 14.516 FID at Youtube dance datasets, which outperforms some state-of-the-arts including Guide-Pixe2Pixe, Global-Flow-Local-Attn, and CocosNet.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122646443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel View Synthesis via Depth-guided Skip Connections","authors":"Yuxin Hou, A. Solin, Juho Kannala","doi":"10.1109/WACV48630.2021.00316","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00316","url":null,"abstract":"We introduce a principled approach for synthesizing new views of a scene given a single source image. Previous methods for novel view synthesis can be divided into image-based rendering methods (e.g., flow prediction) or pixel generation methods. Flow predictions enable the target view to re-use pixels directly, but can easily lead to distorted results. Directly regressing pixels can produce structurally consistent results but generally suffer from the lack of low-level details. In this paper, we utilize an encoder–decoder architecture to regress pixels of a target view. In order to maintain details, we couple the decoder aligned feature maps with skip connections, where the alignment is guided by predicted depth map of the target view. Our experimental results show that our method does not suffer from distortions and successfully preserves texture details with aligned skip connections.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132328599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}