2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW): Latest Publications

Synthetic Data Generation using Imitation Training
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00342
Aman Kishore, T. Choe, J. Kwon, M. Park, Pengfei Hao, Akshita Mittel
Abstract: We propose a strategic approach to generate synthetic data in order to improve machine learning algorithms such as Deep Neural Networks (DNN). Utilization of synthetic data has shown promising results, yet there are no specific rules or recipes on how to generate and cook synthetic data. We propose imitation training as a guideline for synthetic data generation to add more underrepresented entities and balance the data distribution for DNN to handle corner cases and resolve long tail problems. The proposed imitation training is a circular process with three main steps: first, the existing system is evaluated and failure cases such as false positive and false negative detections are sorted out; second, synthetic data imitating such failure cases is created with domain randomization; third, we train a network with the existing data and the newly added synthetic data. We repeat these three steps until the evaluation metric converges. We validated the approach by experimenting on object detection in autonomous driving.
Citations: 9
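The circular three-step procedure described in this abstract can be pictured as a simple training loop. The sketch below is a minimal illustration under assumed interfaces; `evaluate`, `mine_failures`, `synthesize_like`, and `train` are hypothetical placeholders, not code from the paper:

```python
def imitation_training(model, real_data, evaluate, mine_failures,
                       synthesize_like, train, tol=1e-3, max_rounds=10):
    """Hedged sketch of the circular imitation-training loop:
    evaluate -> collect failure cases -> synthesize imitating data
    (e.g. via domain randomization) -> retrain -> repeat until the
    evaluation metric converges."""
    data = list(real_data)
    prev_metric = evaluate(model)                 # step 1: evaluate the existing system
    for _ in range(max_rounds):
        failures = mine_failures(model)           # false positives / false negatives
        synthetic = synthesize_like(failures)     # step 2: domain-randomized imitations
        data.extend(synthetic)
        model = train(model, data)                # step 3: retrain on real + synthetic data
        metric = evaluate(model)
        if abs(metric - prev_metric) < tol:       # stop once the metric has converged
            break
        prev_metric = metric
    return model
```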
A Framework for Semi-automatic Collection of Temporal Satellite Imagery for Analysis of Dynamic Regions
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00084
Nicholas Kashani Motlagh, Aswathnarayan Radhakrishnan, Jim Davis, R. Ilin
Abstract: Analyzing natural and anthropogenic activities using remote sensing data has become a problem of increasing interest. However, this generally involves tediously labeling extensive imagery, perhaps on a global scale. The lack of a streamlined method to collect and label imagery over time makes it challenging to tackle these problems using popular, supervised deep learning approaches. We address this need by presenting a framework to semi-automatically collect and label dynamic regions in satellite imagery using crowd-sourced OpenStreetMap data and available satellite imagery resources. The generated labels can be quickly verified to ease the burden of full manual labeling. We leverage this framework for the ability to gather image sequences of areas that have label reclassification over time. One possible application of our framework is demonstrated to collect and classify construction vs. non-construction sites. Overall, the proposed framework can be adapted for similar change detection or classification tasks in various remote sensing applications.
Citations: 1
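As a rough illustration of the collection-and-labeling workflow the abstract outlines, the sketch below pairs crowd-sourced map tags with imagery timestamps and keeps regions whose label changes over time for quick human verification. All helper callables (`fetch_osm_tags`, `fetch_image`, `tags_to_label`) are hypothetical stand-ins, not the authors' framework:

```python
def collect_labeled_sequence(region, dates, fetch_osm_tags, fetch_image, tags_to_label):
    """Hedged sketch: build a weakly labeled image time series for one region.
    A provisional label is derived from OpenStreetMap tags (e.g. construction
    vs. non-construction) and attached to the satellite image nearest each date;
    the resulting pairs are meant to be verified quickly by a human annotator."""
    sequence = []
    for date in dates:
        tags = fetch_osm_tags(region, date)        # crowd-sourced map metadata
        image = fetch_image(region, date)          # satellite tile for that date
        label = tags_to_label(tags)                # provisional, to be verified
        sequence.append({"date": date, "image": image,
                         "label": label, "verified": False})
    # Keep only regions whose label is reclassified over time, i.e. the
    # dynamic-region case the framework targets.
    labels = {item["label"] for item in sequence}
    return sequence if len(labels) > 1 else []
```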
Causal affect prediction model using a past facial image sequence
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00395
Geesung Oh, Euiseok Jeong, Sejoon Lim
Abstract: Among human affective behavior research, facial expression recognition research is improving in performance along with the development of deep learning. For improved performance, not only past images but also future images should be used along with corresponding facial images, but there are obstacles to the application of this technique to real-time environments. In this paper, we propose the causal affect prediction network (CAPNet), which uses only past facial images to predict corresponding affective valence and arousal. We train CAPNet to learn causal inference between past images and corresponding affective valence and arousal through supervised learning by pairing the sequence of past images with the current label using the Aff-Wild2 dataset. We show through experiments that the well-trained CAPNet outperforms the baseline of the second challenge of the Affective Behavior Analysis in-the-wild (ABAW2) Competition by predicting affective valence and arousal only with past facial images one-third of a second earlier. Therefore, in real-time application, CAPNet can reliably predict affective valence and arousal only with past data. The code is publicly available.
Citations: 11
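The central idea, predicting current valence and arousal from only past frames, can be sketched as a per-frame encoder followed by a causal recurrent head. This is a generic, assumed architecture for illustration and not the published CAPNet; all layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

class PastSequenceAffectPredictor(nn.Module):
    """Hedged sketch of causal affect prediction: encode each past frame,
    aggregate causally with a GRU, and regress valence/arousal in [-1, 1]."""
    def __init__(self, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(             # per-frame feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gru = nn.GRU(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)           # (valence, arousal)

    def forward(self, past_frames):                # (B, T, 3, H, W), past frames only
        b, t = past_frames.shape[:2]
        feats = self.encoder(past_frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.gru(feats)                     # causal aggregation over time
        return torch.tanh(self.head(h[-1]))        # bounded valence/arousal

# Example: predict current affect from 8 past frames (roughly 1/3 s of video).
pred = PastSequenceAffectPredictor()(torch.randn(2, 8, 3, 64, 64))
print(pred.shape)  # torch.Size([2, 2])
```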
A Manifold Learning based Video Prediction approach for Deep Motion Transfer
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00470
Yuliang Cai, S. Mohan, Adithya Niranjan, Nilesh Jain, A. Cloninger, Srinjoy Das
Abstract: We propose a novel manifold learning based end-to-end prediction and video synthesis framework for bandwidth reduction in motion transfer enabled applications such as video conferencing. In our workflow we use keypoint based representations of video frames where image and motion specific information are encoded in a completely unsupervised manner. Prediction of future keypoints is then performed using the manifold of a variational recurrent neural network (VRNN) following which output video frames are synthesized using an optical flow estimator and a conditional image generator in the motion transfer pipeline. The proposed architecture which combines keypoint based representation of video frames with manifold learning based prediction enables significant additional bandwidth savings over motion transfer based video conferencing systems which are implemented solely using keypoint detection. We demonstrate the superiority of our technique using two representative datasets for both video reconstruction and transfer and show that prediction using VRNN has superior performance as compared to a non manifold based technique such as RNN.
Citations: 0
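The bandwidth argument behind keypoint-based motion transfer with prediction can be made concrete with a back-of-the-envelope calculation; the frame size, keypoint count, and prediction ratio below are illustrative assumptions, not figures from the paper:

```python
# Rough, assumed numbers for illustration only.
width, height, channels = 256, 256, 3
bytes_per_frame = width * height * channels          # raw RGB frame
num_keypoints = 10
bytes_per_keypoint_frame = num_keypoints * 2 * 4     # (x, y) per keypoint as float32

per_frame_ratio = bytes_per_frame / bytes_per_keypoint_frame
print(f"raw frame: {bytes_per_frame} B, keypoints: {bytes_per_keypoint_frame} B "
      f"(~{per_frame_ratio:.0f}x smaller)")

# If a predictor at the receiver synthesizes k of every k+1 frames from the
# manifold of past keypoints, only 1 in (k+1) keypoint packets is transmitted,
# giving additional savings on top of the keypoint representation itself.
k = 3
effective = bytes_per_keypoint_frame / (k + 1)
print(f"with prediction, ~{effective:.0f} B transmitted per displayed frame on average")
```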
JanusNet: Detection of Moving Objects from UAV Platforms
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00436
Yuxiang Zhao, K. Shafique, Z. Rasheed, Maoxu Li
Abstract: In this paper, we present JanusNet, an efficient CNN model that can perform online background subtraction and robustly detect moving targets using resource-constrained computational hardware on-board unmanned aerial vehicles (UAVs). Most of the existing work on background subtraction either assumes that the camera is stationary or makes limiting assumptions about the motion of the camera, the structure of the scene under observation, or the apparent motion of the background in video. JanusNet does not have these limitations and, therefore, is applicable to a variety of UAV applications. JanusNet learns to extract and combine motion and appearance features to separate background and foreground to generate accurate pixel-wise masks of the moving objects. The network is trained using a simulated video dataset (generated using Unreal Engine 4) with ground-truth labels. Results on the UCF Aerial and Kaggle Drone videos datasets show that the learned model transfers well to real UAV videos and can robustly detect moving targets in a wide variety of scenarios. Moreover, experiments on the CDNet dataset demonstrate that even without explicitly assuming that the camera is stationary, the performance of JanusNet is comparable to traditional background subtraction methods.
Citations: 1
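The abstract's core design, extracting motion and appearance features and fusing them into a pixel-wise foreground mask, can be illustrated with a small two-branch network. This is a generic sketch under assumed layer choices, not the actual JanusNet:

```python
import torch
import torch.nn as nn

class TwoBranchMovingObjectNet(nn.Module):
    """Hedged sketch: an appearance branch looks at the current frame, a motion
    branch looks at stacked frame differences, and the fused features are decoded
    into a per-pixel foreground probability."""
    def __init__(self):
        super().__init__()
        self.appearance = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.motion = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1))                       # foreground logit per pixel

    def forward(self, frame, prev_frames):             # frame: (B,3,H,W), prev: (B,2,3,H,W)
        diffs = (frame.unsqueeze(1) - prev_frames).abs().mean(2)   # (B,2,H,W) motion cue
        fused = torch.cat([self.appearance(frame), self.motion(diffs)], dim=1)
        return torch.sigmoid(self.decoder(fused))      # pixel-wise moving-object mask

mask = TwoBranchMovingObjectNet()(torch.randn(1, 3, 64, 64), torch.randn(1, 2, 3, 64, 64))
print(mask.shape)  # torch.Size([1, 1, 64, 64])
```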
Multi-Perspective Features Learning for Face Anti-Spoofing
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00457
Zhuming Wang, Yaowen Xu, Lifang Wu, Hu Han, Yukun Ma, Guozhang Ma
Abstract: Face anti-spoofing (FAS) is important to securing face recognition. Most of the existing methods regard FAS as a binary classification problem between bona fide (real) and spoof images, training their models from only the perspective of Real vs. Spoof. It is not beneficial for a comprehensive description of real samples and leads to degraded performance after extending attack types. In fact, the spoofing clues in various attacks can be significantly different. Furthermore, some attacks have characteristics similar to the real faces but different from other attacks. For example, both real faces and video attacks have dynamic features, and both mask attacks and real faces have depth features. In this paper, a Multi-Perspective Feature Learning Network (MPFLN) is proposed to extract representative features from the perspectives of Real + Mask vs. Photo + Video and Real + Video vs. Photo + Mask. And using these features, a binary classification network is designed to perform FAS. Experimental results show that the proposed method can effectively alleviate the above issue of the decline in the discrimination of extracted features and achieve comparable performance with state-of-the-art methods.
Citations: 5
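The multi-perspective idea amounts to relabeling the same samples under two different binary groupings before learning perspective-specific features that are later fused for the final real-vs-spoof decision. A minimal sketch of that relabeling (class names are assumed) is:

```python
# Hedged sketch: map fine-grained face anti-spoofing labels onto the two
# binary "perspectives" described in the abstract. Class names are assumptions.
PERSPECTIVE_A = {"real": 1, "mask": 1, "photo": 0, "video": 0}   # Real+Mask vs Photo+Video
PERSPECTIVE_B = {"real": 1, "video": 1, "photo": 0, "mask": 0}   # Real+Video vs Photo+Mask
REAL_VS_SPOOF = {"real": 1, "photo": 0, "video": 0, "mask": 0}   # final binary FAS task

samples = ["real", "photo", "video", "mask"]
for s in samples:
    # Each sample contributes one label per perspective; features learned under
    # PERSPECTIVE_A and PERSPECTIVE_B would be fused for the REAL_VS_SPOOF classifier.
    print(s, PERSPECTIVE_A[s], PERSPECTIVE_B[s], REAL_VS_SPOOF[s])
```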
Leveraging Batch Normalization for Vision Transformers
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00050
Zhuliang Yao, Yue Cao, Yutong Lin, Ze Liu, Zheng Zhang, Han Hu
Abstract: Transformer-based vision architectures have attracted great attention because of the strong performance over the convolutional neural networks (CNNs). Inherited from the NLP tasks, the architectures take Layer Normalization (LN) as a default normalization technique. On the other side, previous vision models, i.e., CNNs, treat Batch Normalization (BN) as a de facto standard, with the merits of faster inference than other normalization layers due to an avoidance of calculating the mean and variance statistics during inference, as well as better regularization effects during training. In this paper, we aim to introduce Batch Normalization to Transformer-based vision architectures. Our initial exploration reveals frequent crashes in model training when directly replacing all LN layers with BN, contributing to the un-normalized feed forward network (FFN) blocks. We therefore propose to add a BN layer in-between the two linear layers in the FFN block where stabilized training statistics are observed, resulting in a pure BN-based architecture. Our experiments proved that our resulting approach is as effective as the LN-based counterpart and is about 20% faster.
Citations: 20
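The paper's concrete architectural change, a BN layer between the two linear layers of the transformer FFN block, is easy to sketch. Token tensors are (batch, tokens, channels), so BatchNorm1d has to be applied over the channel dimension; the block below is an assumed minimal version with illustrative dimensions, not the authors' code:

```python
import torch
import torch.nn as nn

class FFNWithBatchNorm(nn.Module):
    """Hedged sketch of an FFN block with BatchNorm inserted between the two
    linear layers, as the abstract describes (sizes and placement relative to
    the activation are illustrative)."""
    def __init__(self, dim=384, hidden=1536):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.bn = nn.BatchNorm1d(hidden)   # normalizes over the channel dimension
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):                  # x: (batch, tokens, dim)
        x = self.fc1(x)
        # BatchNorm1d expects (N, C, L); fold the token axis into the length axis.
        x = self.bn(x.transpose(1, 2)).transpose(1, 2)
        return self.fc2(self.act(x))

out = FFNWithBatchNorm()(torch.randn(2, 197, 384))
print(out.shape)  # torch.Size([2, 197, 384])
```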
Emotional Features of Interactions with Empathic Agents
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00246
C. Greco, Carmela Buono, G. Cordasco, Sergio Escalera, A. Esposito, Daria Kyslitska, M. Stylianou, Cristina Palmero, Jofre Tenorio Laranga, Anna Torp Johansen, M. I. Torres
Abstract: The current study is part of the EMPATHIC project, whose aim is to develop an Empathic Virtual Coach (VC) capable of promoting healthy and independent aging. To this end, the VC needs to be capable of perceiving the emotional states of users and adjusting its behaviour during the interactions according to what the users are experiencing in terms of emotions and comfort. Thus, the present work focuses on some sessions where elderly users from three different countries interact with a simulated system. Audio and video information extracted from these sessions were examined by external observers to assess participants’ emotional experience with the EMPATHIC-VC in terms of categorical and dimensional assessment of emotions. Analyses were conducted on the emotional labels assigned by the external observers while participants were engaged in two different scenarios: a generic one, where the interaction was carried out with no intention to discuss a specific topic, and a nutrition one, aimed to accomplish a conversation on users’ nutritional habits. Results of analyses performed on both audio and video data revealed that the EMPATHIC coach did not elicit negative feelings in the users. Indeed, users from all countries showed relaxed and positive behavior when interacting with the simulated VC during both scenarios. Overall, the EMPATHIC-VC was capable of offering an enjoyable experience without eliciting negative feelings in the users. This supports the hypothesis that an Empathic Virtual Coach capable of considering users’ expectations and emotional states could support elderly people in daily life activities and help them to remain independent.
Citations: 3
Object Detection in Cluttered Environments with Sparse Keypoint Selection
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00282
Viktor Seib, D. Paulus
Abstract: In cases such as mobile robotic applications with limited computational resources, traditional approaches might be preferred over neural networks. However, open source solutions using traditional computer vision are harder to find than neural network implementations. In this work we address the task of object detection in cluttered environments in point clouds from RGB-D cameras. We compare several open source implementations available in the Point Cloud Library and present a novel and superior solution for this task. We further propose a novel sparse keypoint selection approach that combines the advantages of uniform sampling and a dedicated keypoint detection algorithm. Our extensive evaluation shows the validity of our approach, which also improves the results of the compared methods. All code is available on our project repository: https://github.com/vseib/point-cloud-donkey.
Citations: 0
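One simple way to picture "combining uniform sampling with a dedicated keypoint detector" is to take the union of a voxel-grid subsample (for coverage) and detector responses (for distinctive structure). The sketch below uses NumPy and a hypothetical `detect_keypoints` callable; it is only an assumed reading of the idea, not the implementation from the linked repository:

```python
import numpy as np

def uniform_voxel_sample(points, voxel_size):
    """Keep one point index per occupied voxel (a simple uniform subsampling)."""
    voxels = np.floor(points / voxel_size).astype(np.int64)
    _, keep = np.unique(voxels, axis=0, return_index=True)
    return np.sort(keep)

def sparse_keypoints(points, detect_keypoints, voxel_size=0.05):
    """Hedged sketch: union of uniformly sampled indices (scene coverage) and
    detector indices (distinctive structure), deduplicated."""
    uniform_idx = uniform_voxel_sample(points, voxel_size)
    detector_idx = detect_keypoints(points)          # hypothetical detector callable
    return np.unique(np.concatenate([uniform_idx, detector_idx]))

# Toy usage with a random cloud and a dummy "detector" that picks high-z points.
cloud = np.random.rand(1000, 3)
idx = sparse_keypoints(cloud, lambda p: np.where(p[:, 2] > 0.95)[0])
print(len(idx), "keypoints selected")
```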
Simple baselines can fool 360° saliency metrics
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00418
Yasser Abdelaziz Dahou Djilali, Kevin McGuinness, N. O’Connor
Abstract: Evaluating a model’s capacity to predict human fixations in 360° scenes is a challenging task. 360° saliency requires different assumptions compared to 2D as a result of the way the saliency maps are collected and pre-processed to account for the difference in statistical bias (Equator vs Center bias). However, the same classical metrics from the 2D saliency literature are typically used to evaluate 360° models. In this paper, we show that a simple constant predictor, i.e. the average map across the Salient360 and Sitzman training sets, can fool existing metrics and achieve results on par with specialized models. Thus, we propose a new probabilistic metric based on the independent Bernoullis assumption that is more suited to the 360° saliency task.
Citations: 4
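Both ingredients of this abstract, the constant average-map baseline and a metric built on an independent-Bernoulli assumption, are simple to write down. The sketch below is a generic illustration (binary fixation maps and per-pixel Bernoulli likelihoods are assumptions), not the authors' exact metric definition:

```python
import numpy as np

def average_map_baseline(train_saliency_maps):
    """The 'simple baseline': one constant prediction, the per-pixel mean of
    all training saliency maps."""
    return np.mean(train_saliency_maps, axis=0)

def bernoulli_log_likelihood(pred, fixations, eps=1e-6):
    """Hedged sketch of a probabilistic score: treat each pixel as an independent
    Bernoulli with parameter pred[i, j] and score a binary fixation map by its
    mean log-likelihood (higher is better)."""
    p = np.clip(pred, eps, 1.0 - eps)
    return np.mean(fixations * np.log(p) + (1 - fixations) * np.log(1 - p))

# Toy usage: the constant predictor scored against one binary fixation map.
train = np.random.rand(20, 64, 128)          # stand-in for training saliency maps
baseline = average_map_baseline(train)
fix = (np.random.rand(64, 128) > 0.98).astype(float)
print(bernoulli_log_likelihood(baseline, fix))
```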