2021 IEEE Winter Conference on Applications of Computer Vision (WACV): Latest Publications

Have Fun Storming the Castle(s)!
2021 IEEE Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2021-01-01 DOI: 10.1109/WACV48630.2021.00375
Connor Anderson, Adam Teuscher, Elizabeth Anderson, Alysia Larsen, Josh Shirley, Ryan Farrell
{"title":"Have Fun Storming the Castle(s)!","authors":"Connor Anderson, Adam Teuscher, Elizabeth Anderson, Alysia Larsen, Josh Shirley, Ryan Farrell","doi":"10.1109/WACV48630.2021.00375","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00375","url":null,"abstract":"In recent years, large-scale datasets, each typically tailored to a particular problem, have become a critical factor towards fueling rapid progress in the field of computer vision. This paper describes a valuable new dataset that should accelerate research efforts on problems such as fine-grained classification, instance recognition and retrieval, and geolocalization. The dataset, comprised of more than 2400 individual castles, palaces and fortresses from more than 90 countries, contains more than 770K images in total. This paper details the dataset’s construction process, the characteristics including annotations such as location (geotagged latlong and country label), construction date, Google Maps link and estimated per-class and per-image difficulty. An experimental section provides baseline experiments for important vision tasks including classification, instance retrieval and geolocalization (estimating global location from an image’s visual appearance). The dataset is publicly available at vision.cs.byu.edu/castles.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115746228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
How to Make a BLT Sandwich? Learning VQA towards Understanding Web Instructional Videos
2021 IEEE Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2021-01-01 DOI: 10.1109/WACV48630.2021.00117
Shaojie Wang, Wentian Zhao, Ziyi Kou, Jing Shi, Chenliang Xu
{"title":"How to Make a BLT Sandwich? Learning VQA towards Understanding Web Instructional Videos","authors":"Shaojie Wang, Wentian Zhao, Ziyi Kou, Jing Shi, Chenliang Xu","doi":"10.1109/WACV48630.2021.00117","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00117","url":null,"abstract":"Understanding web instructional videos is an essential branch of video understanding in two aspects. First, most existing video methods focus on short-term actions for a-few-second-long video clips; these methods are not directly applicable to long videos. Second, unlike unconstrained long videos, e.g., movies, instructional videos are more structured in that they have step-by-step procedures constraining the understanding task. In this work, we study problem-solving on instructional videos via Visual Question Answering (VQA). Surprisingly, it has not been an emphasis for the video community despite its rich applications. We thereby introduce YouCookQA, an annotated QA dataset for instructional videos based on YouCook2 [27]. The questions in YouCookQA are not limited to cues on a single frame but relations among multiple frames in the temporal dimension. Observing the lack of effective representations for modeling long videos, we propose a set of carefully designed models including a Recurrent Graph Convolutional Network (RGCN) that captures both temporal order and relational information. Furthermore, we study multiple modalities including descriptions and transcripts for the purpose of boosting video understanding. Extensive experiments on YouCookQA suggest that RGCN performs the best in terms of QA accuracy and better performance is gained by introducing human-annotated descriptions. YouCookQA dataset is available at https://github.com/Jossome/YoucookQA.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127433719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
A Deflation based Fast and Robust Preconditioner for Bundle Adjustment
2021 IEEE Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2021-01-01 DOI: 10.1109/WACV48630.2021.00182
Shrutimoy Das, Siddhant Katyan, Pawan Kumar
{"title":"A Deflation based Fast and Robust Preconditioner for Bundle Adjustment","authors":"Shrutimoy Das, Siddhant Katyan, Pawan Kumar","doi":"10.1109/WACV48630.2021.00182","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00182","url":null,"abstract":"The bundle adjustment (BA) problem is formulated as a non linear least squares problem which, requires the solution of a linear system. For solving this system, we present the design and implementation of a fast preconditioned solver. The proposed preconditioner is based on the deflation of the largest eigenvalues of the Hessian. We also derive an estimate of the condition number of the preconditioned system. Numerical experiments on problems from the BAL dataset [3] suggest that our solver is the fastest, sometimes, by a factor of five, when compared to the current state-of-the-art solvers for bundle adjustment.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122296964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Are These from the Same Place? Seeing the Unseen in Cross-View Image Geo-Localization
2021 IEEE Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2021-01-01 DOI: 10.1109/WACV48630.2021.00380
Royston Rodrigues, Masahiro Tani
{"title":"Are These from the Same Place? Seeing the Unseen in Cross-View Image Geo-Localization","authors":"Royston Rodrigues, Masahiro Tani","doi":"10.1109/WACV48630.2021.00380","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00380","url":null,"abstract":"In an era where digital maps act as gateways to exploring the world, the availability of large scale geo-tagged imagery has inspired a number of visual navigation techniques. One promising approach to visual navigation is cross-view image geo-localization. Here, the images whose location needs to be determined are matched against a database of geo-tagged aerial imagery. The methods based on this approach sought to resolve view point changes. But scenes also vary temporally, during which new landmarks might appear or existing ones might disappear. One cannot guarantee storage of aerial imagery across all time instants and hence a technique robust to temporal variation in scenes becomes of paramount importance. In this paper, we address the temporal gap between scenes by proposing a two step approach. First, we propose a semantically driven data augmentation technique that gives Siamese networks the ability to hallucinate unseen objects. Then we present the augmented samples to a multi-scale attentive embedding network to perform matching tasks. Experiments on standard benchmarks demonstrate the integration of the proposed approach with existing frameworks improves top-1 image recall rate on the CVUSA data-set from 89.84 % to 93.09 %, and from 81.03 % to 87.21 % on the CVACT data-set.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114394410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Coarse Temporal Attention Network (CTA-Net) for Driver’s Activity Recognition
2021 IEEE Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2021-01-01 DOI: 10.1109/WACV48630.2021.00132
Zachary Wharton, Ardhendu Behera, Yonghuai Liu, Nikolaos Bessis
{"title":"Coarse Temporal Attention Network (CTA-Net) for Driver’s Activity Recognition","authors":"Zachary Wharton, Ardhendu Behera, Yonghuai Liu, Nikolaos Bessis","doi":"10.1109/WACV48630.2021.00132","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00132","url":null,"abstract":"There is significant progress in recognizing traditional human activities from videos focusing on highly distinctive actions involving discriminative body movements, body-object and/or human-human interactions. Driver’s activities are different since they are executed by the same subject with similar body parts movements, resulting in subtle changes. To address this, we propose a novel framework by exploiting the spatiotemporal attention to model the subtle changes. Our model is named Coarse Temporal Attention Network (CTA-Net), in which coarse temporal branches are introduced in a trainable glimpse network. The goal is to allow the glimpse to capture high-level temporal relationships, such as ‘during’, ‘before’ and ‘after’ by focusing on a specific part of a video. These branches also respect the topology of the temporal dynamics in the video, ensuring that different branches learn meaningful spatial and temporal changes. The model then uses an innovative attention mechanism to generate high-level action specific contextual information for activity recognition by exploring the hidden states of an LSTM. The attention mechanism helps in learning to decide the importance of each hidden state for the recognition task by weighing them when constructing the representation of the video. Our approach is evaluated on four publicly accessible datasets and significantly outperforms the state-of-the-art by a considerable margin with only RGB video as input.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128663748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Data-free Knowledge Distillation for Object Detection
2021 IEEE Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2021-01-01 DOI: 10.1109/WACV48630.2021.00333
Akshay Chawla, Hongxu Yin, Pavlo Molchanov, J. Álvarez
{"title":"Data-free Knowledge Distillation for Object Detection","authors":"Akshay Chawla, Hongxu Yin, Pavlo Molchanov, J. Álvarez","doi":"10.1109/WACV48630.2021.00333","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00333","url":null,"abstract":"We present DeepInversion for Object Detection (DIODE) to enable data-free knowledge distillation for neural networks trained on the object detection task. From a data-free perspective, DIODE synthesizes images given only an off-the-shelf pre-trained detection network and without any prior domain knowledge, generator network, or pre-computed activations. DIODE relies on two key components—first, an extensive set of differentiable augmentations to improve image fidelity and distillation effectiveness. Second, a novel automated bounding box and category sampling scheme for image synthesis enabling generating a large number of images with a diverse set of spatial and category objects. The resulting images enable data-free knowledge distillation from a teacher to a student detector, initialized from scratch.In an extensive set of experiments, we demonstrate that DIODE’s ability to match the original training distribution consistently enables more effective knowledge distillation than out-of-distribution proxy datasets, which unavoidably occur in a data-free setup given the absence of the original domain knowledge.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129456350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 48
The Laughing Machine: Predicting Humor in Video
2021 IEEE Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2021-01-01 DOI: 10.1109/WACV48630.2021.00212
Yuta Kayatani, Zekun Yang, Mayu Otani, Noa García, Chenhui Chu, Yuta Nakashima, H. Takemura
{"title":"The Laughing Machine: Predicting Humor in Video","authors":"Yuta Kayatani, Zekun Yang, Mayu Otani, Noa García, Chenhui Chu, Yuta Nakashima, H. Takemura","doi":"10.1109/WACV48630.2021.00212","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00212","url":null,"abstract":"Humor is a very important communication tool; yet, it is an open problem for machines to understand humor. In this paper, we build a new multimodal dataset for humor prediction that includes subtitles and video frames, as well as humor labels associated with video’s timestamps. On top of it, we present a model to predict whether a subtitle causes laughter. Our model uses the visual modality through facial expression and character name recognition, together with the verbal modality, to explore how the visual modality helps. In addition, we use an attention mechanism to adjust the weight for each modality to facilitate humor prediction. Interestingly, our experimental results show that the performance boost by combinations of different modalities, and the attention mechanism and the model mostly relies on the verbal modality.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130498713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation
2021 IEEE Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2021-01-01 DOI: 10.1109/WACV48630.2021.00159
Kimmo Kärkkäinen, Jungseock Joo
{"title":"FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation","authors":"Kimmo Kärkkäinen, Jungseock Joo","doi":"10.1109/WACV48630.2021.00159","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00159","url":null,"abstract":"Existing public face image datasets are strongly biased toward Caucasian faces, and other races (e.g., Latino) are significantly underrepresented. The models trained from such datasets suffer from inconsistent classification accuracy, which limits the applicability of face analytic systems to non-White race groups. To mitigate the race bias problem in these datasets, we constructed a novel face image dataset containing 108,501 images which is balanced on race. We define 7 race groups: White, Black, Indian, East Asian, Southeast Asian, Middle Eastern, and Latino. Images were collected from the YFCC-100M Flickr dataset and labeled with race, gender, and age groups. Evaluations were performed on existing face attribute datasets as well as novel image datasets to measure the generalization performance. We find that the model trained from our dataset is substantially more accurate on novel datasets and the accuracy is consistent across race and gender groups. We also compare several commercial computer vision APIs and report their balanced accuracy across gender, race, and age groups. Our code, data, and models are available at https://github.com/joojs/fairface.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123933686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 359
Spatially Aware Metadata for Raw Reconstruction
2021 IEEE Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2021-01-01 DOI: 10.1109/WACV48630.2021.00026
Abhijith Punnappurath, M. S. Brown
{"title":"Spatially Aware Metadata for Raw Reconstruction","authors":"Abhijith Punnappurath, M. S. Brown","doi":"10.1109/WACV48630.2021.00026","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00026","url":null,"abstract":"A camera sensor captures a raw-RGB image that is then processed to a standard RGB (sRGB) image through a series of onboard operations performed by the camera’s image signal processor (ISP). Among these processing steps, local tone mapping is one of the most important operations used to enhance the overall appearance of the final rendered sRGB image. For certain applications, it is often desirable to de-render or unprocess the sRGB image back to its original raw-RGB values. This \"raw reconstruction\" is a challenging task because many of the operations performed by the ISP, including local tone mapping, are nonlinear and difficult to invert. Existing raw reconstruction methods that store specialized metadata at capture time to enable raw recovery ignore local tone mapping and assume that a global transformation exists between the raw-RGB and sRGB color spaces. In this work, we advocate a spatially aware metadata-based raw reconstruction method that is robust to local tone mapping, and yields significantly higher raw reconstruction accuracy (6 dB average PSNR improvement) compared to existing raw reconstruction methods. Our method requires only 0.2% samples of the full-sized image as metadata, has negligible computational overhead at capture time, and can be easily integrated into modern ISPs.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123944455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Interpretable and Trustworthy Deepfake Detection via Dynamic Prototypes
2021 IEEE Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2021-01-01 DOI: 10.1109/WACV48630.2021.00202
Loc Trinh, Michael Tsang, Sirisha Rambhatla, Yan Liu
{"title":"Interpretable and Trustworthy Deepfake Detection via Dynamic Prototypes","authors":"Loc Trinh, Michael Tsang, Sirisha Rambhatla, Yan Liu","doi":"10.1109/WACV48630.2021.00202","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00202","url":null,"abstract":"In this paper we propose a novel human-centered approach for detecting forgery in face images, using dynamic prototypes as a form of visual explanations. Currently, most state-of-the-art deepfake detections are based on black-box models that process videos frame-by-frame for inference, and few closely examine their temporal inconsistencies. However, the existence of such temporal artifacts within deepfake videos is key in detecting and explaining deepfakes to a supervising human. To this end, we propose Dynamic Prototype Network (DPNet) – an interpretable and effective solution that utilizes dynamic representations (i.e., prototypes) to explain deepfake temporal artifacts. Extensive experimental results show that DPNet achieves competitive predictive performance, even on unseen testing datasets such as Google’s DeepFakeDetection, DeeperForensics, and Celeb-DF, while providing easy referential explanations of deepfake dynamics. On top of DPNet’s prototypical framework, we further formulate temporal logic specifications based on these dynamics to check our model’s compliance to desired temporal behaviors, hence providing trustworthiness for such critical detection systems.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123727404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 47