2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW): Latest Publications

Synthetic Data Generation using Imitation Training
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00342
Aman Kishore, T. Choe, J. Kwon, M. Park, Pengfei Hao, Akshita Mittel
Abstract: We propose a strategic approach to generate synthetic data in order to improve machine learning algorithms such as Deep Neural Networks (DNN). Utilization of synthetic data has shown promising results, yet there are no specific rules or recipes on how to generate and cook synthetic data. We propose imitation training as a guideline for synthetic data generation to add more underrepresented entities and balance the data distribution for DNN to handle corner cases and resolve long tail problems. The proposed imitation training is a circular process with three main steps: first, the existing system is evaluated and failure cases such as false positive and false negative detections are sorted out; second, synthetic data imitating such failure cases is created with domain randomization; third, we train a network with the existing data and the newly added synthetic data. We repeat these three steps until the evaluation metric converges. We validated the approach by experimenting on object detection in autonomous driving.
Citations: 9
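The circular three-step procedure described in this abstract can be pictured as a simple training loop. The sketch below is a minimal illustration under assumed interfaces; `evaluate`, `mine_failures`, `synthesize_like`, and `train` are hypothetical placeholders, not code from the paper:

```python
def imitation_training(model, real_data, evaluate, mine_failures,
                       synthesize_like, train, tol=1e-3, max_rounds=10):
    """Hedged sketch of the circular imitation-training loop:
    evaluate -> collect failure cases -> synthesize imitating data
    (e.g. via domain randomization) -> retrain -> repeat until the
    evaluation metric converges."""
    data = list(real_data)
    prev_metric = evaluate(model)                 # step 1: evaluate the existing system
    for _ in range(max_rounds):
        failures = mine_failures(model)           # false positives / false negatives
        synthetic = synthesize_like(failures)     # step 2: domain-randomized imitations
        data.extend(synthetic)
        model = train(model, data)                # step 3: retrain on real + synthetic data
        metric = evaluate(model)
        if abs(metric - prev_metric) < tol:       # stop once the metric has converged
            break
        prev_metric = metric
    return model
```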
A Framework for Semi-automatic Collection of Temporal Satellite Imagery for Analysis of Dynamic Regions
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00084
Nicholas Kashani Motlagh, Aswathnarayan Radhakrishnan, Jim Davis, R. Ilin
Abstract: Analyzing natural and anthropogenic activities using remote sensing data has become a problem of increasing interest. However, this generally involves tediously labeling extensive imagery, perhaps on a global scale. The lack of a streamlined method to collect and label imagery over time makes it challenging to tackle these problems using popular, supervised deep learning approaches. We address this need by presenting a framework to semi-automatically collect and label dynamic regions in satellite imagery using crowd-sourced OpenStreetMap data and available satellite imagery resources. The generated labels can be quickly verified to ease the burden of full manual labeling. We leverage this framework for the ability to gather image sequences of areas that have label reclassification over time. One possible application of our framework is demonstrated to collect and classify construction vs. non-construction sites. Overall, the proposed framework can be adapted for similar change detection or classification tasks in various remote sensing applications.
Citations: 1
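As a rough illustration of the collection-and-labeling workflow the abstract outlines, the sketch below pairs crowd-sourced map tags with imagery timestamps and keeps regions whose label changes over time for quick human verification. All helper callables (`fetch_osm_tags`, `fetch_image`, `tags_to_label`) are hypothetical stand-ins, not the authors' framework:

```python
def collect_labeled_sequence(region, dates, fetch_osm_tags, fetch_image, tags_to_label):
    """Hedged sketch: build a weakly labeled image time series for one region.
    A provisional label is derived from OpenStreetMap tags (e.g. construction
    vs. non-construction) and attached to the satellite image nearest each date;
    the resulting pairs are meant to be verified quickly by a human annotator."""
    sequence = []
    for date in dates:
        tags = fetch_osm_tags(region, date)        # crowd-sourced map metadata
        image = fetch_image(region, date)          # satellite tile for that date
        label = tags_to_label(tags)                # provisional, to be verified
        sequence.append({"date": date, "image": image,
                         "label": label, "verified": False})
    # Keep only regions whose label is reclassified over time, i.e. the
    # dynamic-region case the framework targets.
    labels = {item["label"] for item in sequence}
    return sequence if len(labels) > 1 else []
```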
Causal affect prediction model using a past facial image sequence
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00395
Geesung Oh, Euiseok Jeong, Sejoon Lim
Abstract: Among human affective behavior research, facial expression recognition research is improving in performance along with the development of deep learning. For improved performance, not only past images but also future images should be used along with corresponding facial images, but there are obstacles to the application of this technique to real-time environments. In this paper, we propose the causal affect prediction network (CAPNet), which uses only past facial images to predict corresponding affective valence and arousal. We train CAPNet to learn causal inference between past images and corresponding affective valence and arousal through supervised learning by pairing the sequence of past images with the current label using the Aff-Wild2 dataset. We show through experiments that the well-trained CAPNet outperforms the baseline of the second challenge of the Affective Behavior Analysis in-the-wild (ABAW2) Competition by predicting affective valence and arousal only with past facial images one-third of a second earlier. Therefore, in real-time application, CAPNet can reliably predict affective valence and arousal only with past data. The code is publicly available.
Citations: 11
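The central idea, predicting current valence and arousal from only past frames, can be sketched as a per-frame encoder followed by a causal recurrent head. This is a generic, assumed architecture for illustration and not the published CAPNet; all layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

class PastSequenceAffectPredictor(nn.Module):
    """Hedged sketch of causal affect prediction: encode each past frame,
    aggregate causally with a GRU, and regress valence/arousal in [-1, 1]."""
    def __init__(self, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(             # per-frame feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gru = nn.GRU(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)           # (valence, arousal)

    def forward(self, past_frames):                # (B, T, 3, H, W), past frames only
        b, t = past_frames.shape[:2]
        feats = self.encoder(past_frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.gru(feats)                     # causal aggregation over time
        return torch.tanh(self.head(h[-1]))        # bounded valence/arousal

# Example: predict current affect from 8 past frames (roughly 1/3 s of video).
pred = PastSequenceAffectPredictor()(torch.randn(2, 8, 3, 64, 64))
print(pred.shape)  # torch.Size([2, 2])
```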
A Manifold Learning based Video Prediction approach for Deep Motion Transfer
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00470
Yuliang Cai, S. Mohan, Adithya Niranjan, Nilesh Jain, A. Cloninger, Srinjoy Das
Abstract: We propose a novel manifold learning based end-to-end prediction and video synthesis framework for bandwidth reduction in motion transfer enabled applications such as video conferencing. In our workflow we use keypoint based representations of video frames where image and motion specific information are encoded in a completely unsupervised manner. Prediction of future keypoints is then performed using the manifold of a variational recurrent neural network (VRNN) following which output video frames are synthesized using an optical flow estimator and a conditional image generator in the motion transfer pipeline. The proposed architecture which combines keypoint based representation of video frames with manifold learning based prediction enables significant additional bandwidth savings over motion transfer based video conferencing systems which are implemented solely using keypoint detection. We demonstrate the superiority of our technique using two representative datasets for both video reconstruction and transfer and show that prediction using VRNN has superior performance as compared to a non manifold based technique such as RNN.
Citations: 0
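The bandwidth argument behind keypoint-based motion transfer with prediction can be made concrete with a back-of-the-envelope calculation; the frame size, keypoint count, and prediction ratio below are illustrative assumptions, not figures from the paper:

```python
# Rough, assumed numbers for illustration only.
width, height, channels = 256, 256, 3
bytes_per_frame = width * height * channels          # raw RGB frame
num_keypoints = 10
bytes_per_keypoint_frame = num_keypoints * 2 * 4     # (x, y) per keypoint as float32

per_frame_ratio = bytes_per_frame / bytes_per_keypoint_frame
print(f"raw frame: {bytes_per_frame} B, keypoints: {bytes_per_keypoint_frame} B "
      f"(~{per_frame_ratio:.0f}x smaller)")

# If a predictor at the receiver synthesizes k of every k+1 frames from the
# manifold of past keypoints, only 1 in (k+1) keypoint packets is transmitted,
# giving additional savings on top of the keypoint representation itself.
k = 3
effective = bytes_per_keypoint_frame / (k + 1)
print(f"with prediction, ~{effective:.0f} B transmitted per displayed frame on average")
```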
JanusNet: Detection of Moving Objects from UAV Platforms
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00436
Yuxiang Zhao, K. Shafique, Z. Rasheed, Maoxu Li
Abstract: In this paper, we present JanusNet, an efficient CNN model that can perform online background subtraction and robustly detect moving targets using resource-constrained computational hardware on-board unmanned aerial vehicles (UAVs). Most of the existing work on background subtraction either assumes that the camera is stationary or makes limiting assumptions about the motion of the camera, the structure of the scene under observation, or the apparent motion of the background in video. JanusNet does not have these limitations and, therefore, is applicable to a variety of UAV applications. JanusNet learns to extract and combine motion and appearance features to separate background and foreground to generate accurate pixel-wise masks of the moving objects. The network is trained using a simulated video dataset (generated using Unreal Engine 4) with ground-truth labels. Results on the UCF Aerial and Kaggle Drone videos datasets show that the learned model transfers well to real UAV videos and can robustly detect moving targets in a wide variety of scenarios. Moreover, experiments on the CDNet dataset demonstrate that even without explicitly assuming that the camera is stationary, the performance of JanusNet is comparable to traditional background subtraction methods.
Citations: 1
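The abstract's core design, extracting motion and appearance features and fusing them into a pixel-wise foreground mask, can be illustrated with a small two-branch network. This is a generic sketch under assumed layer choices, not the actual JanusNet:

```python
import torch
import torch.nn as nn

class TwoBranchMovingObjectNet(nn.Module):
    """Hedged sketch: an appearance branch looks at the current frame, a motion
    branch looks at stacked frame differences, and the fused features are decoded
    into a per-pixel foreground probability."""
    def __init__(self):
        super().__init__()
        self.appearance = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.motion = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1))                       # foreground logit per pixel

    def forward(self, frame, prev_frames):             # frame: (B,3,H,W), prev: (B,2,3,H,W)
        diffs = (frame.unsqueeze(1) - prev_frames).abs().mean(2)   # (B,2,H,W) motion cue
        fused = torch.cat([self.appearance(frame), self.motion(diffs)], dim=1)
        return torch.sigmoid(self.decoder(fused))      # pixel-wise moving-object mask

mask = TwoBranchMovingObjectNet()(torch.randn(1, 3, 64, 64), torch.randn(1, 2, 3, 64, 64))
print(mask.shape)  # torch.Size([1, 1, 64, 64])
```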
Multi-Perspective Features Learning for Face Anti-Spoofing
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00457
Zhuming Wang, Yaowen Xu, Lifang Wu, Hu Han, Yukun Ma, Guozhang Ma
Abstract: Face anti-spoofing (FAS) is important to securing face recognition. Most of the existing methods regard FAS as a binary classification problem between bona fide (real) and spoof images, training their models from only the perspective of Real vs. Spoof. It is not beneficial for a comprehensive description of real samples and leads to degraded performance after extending attack types. In fact, the spoofing clues in various attacks can be significantly different. Furthermore, some attacks have characteristics similar to the real faces but different from other attacks. For example, both real faces and video attacks have dynamic features, and both mask attacks and real faces have depth features. In this paper, a Multi-Perspective Feature Learning Network (MPFLN) is proposed to extract representative features from the perspectives of Real + Mask vs. Photo + Video and Real + Video vs. Photo + Mask. And using these features, a binary classification network is designed to perform FAS. Experimental results show that the proposed method can effectively alleviate the above issue of the decline in the discrimination of extracted features and achieve comparable performance with state-of-the-art methods.
Citations: 5
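The multi-perspective idea amounts to relabeling the same samples under two different binary groupings before learning perspective-specific features that are later fused for the final real-vs-spoof decision. A minimal sketch of that relabeling (class names are assumed) is:

```python
# Hedged sketch: map fine-grained face anti-spoofing labels onto the two
# binary "perspectives" described in the abstract. Class names are assumptions.
PERSPECTIVE_A = {"real": 1, "mask": 1, "photo": 0, "video": 0}   # Real+Mask vs Photo+Video
PERSPECTIVE_B = {"real": 1, "video": 1, "photo": 0, "mask": 0}   # Real+Video vs Photo+Mask
REAL_VS_SPOOF = {"real": 1, "photo": 0, "video": 0, "mask": 0}   # final binary FAS task

samples = ["real", "photo", "video", "mask"]
for s in samples:
    # Each sample contributes one label per perspective; features learned under
    # PERSPECTIVE_A and PERSPECTIVE_B would be fused for the REAL_VS_SPOOF classifier.
    print(s, PERSPECTIVE_A[s], PERSPECTIVE_B[s], REAL_VS_SPOOF[s])
```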
Leveraging Batch Normalization for Vision Transformers
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00050
Zhuliang Yao, Yue Cao, Yutong Lin, Ze Liu, Zheng Zhang, Han Hu
Abstract: Transformer-based vision architectures have attracted great attention because of the strong performance over the convolutional neural networks (CNNs). Inherited from the NLP tasks, the architectures take Layer Normalization (LN) as a default normalization technique. On the other side, previous vision models, i.e., CNNs, treat Batch Normalization (BN) as a de facto standard, with the merits of faster inference than other normalization layers due to an avoidance of calculating the mean and variance statistics during inference, as well as better regularization effects during training. In this paper, we aim to introduce Batch Normalization to Transformer-based vision architectures. Our initial exploration reveals frequent crashes in model training when directly replacing all LN layers with BN, contributing to the un-normalized feed forward network (FFN) blocks. We therefore propose to add a BN layer in-between the two linear layers in the FFN block where stabilized training statistics are observed, resulting in a pure BN-based architecture. Our experiments proved that our resulting approach is as effective as the LN-based counterpart and is about 20% faster.
Citations: 20
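The paper's concrete architectural change, a BN layer between the two linear layers of the transformer FFN block, is easy to sketch. Token tensors are (batch, tokens, channels), so BatchNorm1d has to be applied over the channel dimension; the block below is an assumed minimal version with illustrative dimensions, not the authors' code:

```python
import torch
import torch.nn as nn

class FFNWithBatchNorm(nn.Module):
    """Hedged sketch of an FFN block with BatchNorm inserted between the two
    linear layers, as the abstract describes (sizes and placement relative to
    the activation are illustrative)."""
    def __init__(self, dim=384, hidden=1536):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.bn = nn.BatchNorm1d(hidden)   # normalizes over the channel dimension
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):                  # x: (batch, tokens, dim)
        x = self.fc1(x)
        # BatchNorm1d expects (N, C, L); fold the token axis into the length axis.
        x = self.bn(x.transpose(1, 2)).transpose(1, 2)
        return self.fc2(self.act(x))

out = FFNWithBatchNorm()(torch.randn(2, 197, 384))
print(out.shape)  # torch.Size([2, 197, 384])
```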
Emotional Features of Interactions with Empathic Agents
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00246
C. Greco, Carmela Buono, G. Cordasco, Sergio Escalera, A. Esposito, Daria Kyslitska, M. Stylianou, Cristina Palmero, Jofre Tenorio Laranga, Anna Torp Johansen, M. I. Torres
Abstract: The current study is part of the EMPATHIC project, whose aim is to develop an Empathic Virtual Coach (VC) capable of promoting healthy and independent aging. To this end, the VC needs to be capable of perceiving the emotional states of users and adjusting its behaviour during the interactions according to what the users are experiencing in terms of emotions and comfort. Thus, the present work focuses on some sessions where elderly users from three different countries interact with a simulated system. Audio and video information extracted from these sessions were examined by external observers to assess participants’ emotional experience with the EMPATHIC-VC in terms of categorical and dimensional assessment of emotions. Analyses were conducted on the emotional labels assigned by the external observers while participants were engaged in two different scenarios: a generic one, where the interaction was carried out with no intention to discuss a specific topic, and a nutrition one, aimed to accomplish a conversation on users’ nutritional habits. Results of analyses performed on both audio and video data revealed that the EMPATHIC coach did not elicit negative feelings in the users. Indeed, users from all countries showed relaxed and positive behavior when interacting with the simulated VC during both scenarios. Overall, the EMPATHIC-VC was capable of offering an enjoyable experience without eliciting negative feelings in the users. This supports the hypothesis that an Empathic Virtual Coach capable of considering users’ expectations and emotional states could support elderly people in daily life activities and help them to remain independent.
Citations: 3
Object Detection in Cluttered Environments with Sparse Keypoint Selection
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00282
Viktor Seib, D. Paulus
Abstract: In cases such as mobile robotic applications with limited computational resources, traditional approaches might be preferred over neural networks. However, open source solutions using traditional computer vision are harder to find than neural network implementations. In this work we address the task of object detection in cluttered environments in point clouds from RGB-D cameras. We compare several open source implementations available in the Point Cloud Library and present a novel and superior solution for this task. We further propose a novel sparse keypoint selection approach that combines the advantages of uniform sampling and a dedicated keypoint detection algorithm. Our extensive evaluation shows the validity of our approach, which also improves the results of the compared methods. All code is available on our project repository: https://github.com/vseib/point-cloud-donkey.
Citations: 0
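One simple way to picture "combining uniform sampling with a dedicated keypoint detector" is to take the union of a voxel-grid subsample (for coverage) and detector responses (for distinctive structure). The sketch below uses NumPy and a hypothetical `detect_keypoints` callable; it is only an assumed reading of the idea, not the implementation from the linked repository:

```python
import numpy as np

def uniform_voxel_sample(points, voxel_size):
    """Keep one point index per occupied voxel (a simple uniform subsampling)."""
    voxels = np.floor(points / voxel_size).astype(np.int64)
    _, keep = np.unique(voxels, axis=0, return_index=True)
    return np.sort(keep)

def sparse_keypoints(points, detect_keypoints, voxel_size=0.05):
    """Hedged sketch: union of uniformly sampled indices (scene coverage) and
    detector indices (distinctive structure), deduplicated."""
    uniform_idx = uniform_voxel_sample(points, voxel_size)
    detector_idx = detect_keypoints(points)          # hypothetical detector callable
    return np.unique(np.concatenate([uniform_idx, detector_idx]))

# Toy usage with a random cloud and a dummy "detector" that picks high-z points.
cloud = np.random.rand(1000, 3)
idx = sparse_keypoints(cloud, lambda p: np.where(p[:, 2] > 0.95)[0])
print(len(idx), "keypoints selected")
```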
Simple baselines can fool 360° saliency metrics
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00418
Yasser Abdelaziz Dahou Djilali, Kevin McGuinness, N. O’Connor
Abstract: Evaluating a model’s capacity to predict human fixations in 360° scenes is a challenging task. 360° saliency requires different assumptions compared to 2D as a result of the way the saliency maps are collected and pre-processed to account for the difference in statistical bias (Equator vs Center bias). However, the same classical metrics from the 2D saliency literature are typically used to evaluate 360° models. In this paper, we show that a simple constant predictor, i.e. the average map across the Salient360 and Sitzman training sets, can fool existing metrics and achieve results on par with specialized models. Thus, we propose a new probabilistic metric based on the independent Bernoullis assumption that is more suited to the 360° saliency task.
Citations: 4
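Both ingredients of this abstract, the constant average-map baseline and a metric built on an independent-Bernoulli assumption, are simple to write down. The sketch below is a generic illustration (binary fixation maps and per-pixel Bernoulli likelihoods are assumptions), not the authors' exact metric definition:

```python
import numpy as np

def average_map_baseline(train_saliency_maps):
    """The 'simple baseline': one constant prediction, the per-pixel mean of
    all training saliency maps."""
    return np.mean(train_saliency_maps, axis=0)

def bernoulli_log_likelihood(pred, fixations, eps=1e-6):
    """Hedged sketch of a probabilistic score: treat each pixel as an independent
    Bernoulli with parameter pred[i, j] and score a binary fixation map by its
    mean log-likelihood (higher is better)."""
    p = np.clip(pred, eps, 1.0 - eps)
    return np.mean(fixations * np.log(p) + (1 - fixations) * np.log(1 - p))

# Toy usage: the constant predictor scored against one binary fixation map.
train = np.random.rand(20, 64, 128)          # stand-in for training saliency maps
baseline = average_map_baseline(train)
fix = (np.random.rand(64, 128) > 0.98).astype(float)
print(bernoulli_log_likelihood(baseline, fix))
```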