Latest Publications: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Self-Supervised 2D/3D Registration for X-Ray to CT Image Fusion
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2023-01-01 · DOI: 10.1109/wacv56688.2023.00281
Srikrishna Jaganathan, Maximilian Kukla, Jian Wang, Karthik Shetty, Andreas Maier
{"title":"Self-Supervised 2D/3D Registration for X-Ray to CT Image Fusion","authors":"Srikrishna Jaganathan, Maximilian Kukla, Jian Wang, Karthik Shetty, Andreas Maier","doi":"10.1109/wacv56688.2023.00281","DOIUrl":"https://doi.org/10.1109/wacv56688.2023.00281","url":null,"abstract":"Deep Learning-based 2D/3D registration enables fast, robust, and accurate X-ray to CT image fusion when large annotated paired datasets are available for training. However, the need for paired CT volume and X-ray images with ground truth registration limits the applicability in interventional scenarios. An alternative is to use simulated X-ray projections from CT volumes, thus removing the need for paired annotated datasets. Deep Neural Networks trained exclusively on simulated X-ray projections can perform significantly worse on real X-ray images due to the domain gap. We propose a self-supervised 2D/3D registration framework combining simulated training with unsupervised feature and pixel space domain adaptation to overcome the domain gap and eliminate the need for paired annotated datasets. Our framework achieves a registration accuracy of 1.83 ± 1.16 mm with a high success ratio of 90.1% on real X-ray images showing a 23.9% increase in success ratio compared to reference annotation-free algorithms.","PeriodicalId":497882,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"199 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135127081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
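To make the simulation-based training idea concrete, here is a minimal sketch of generating a simulated X-ray projection (a digitally reconstructed radiograph) from a CT volume, which is the annotation-free training signal the abstract describes. The parallel-beam geometry, function name, and normalization below are illustrative assumptions, not the authors' implementation.

```python
# A minimal DRR sketch, assuming parallel-beam geometry and attenuation-sum
# projection; names and parameters are illustrative, not from the paper.
import numpy as np
from scipy.ndimage import rotate

def simulate_drr(ct_volume: np.ndarray, angle_deg: float) -> np.ndarray:
    """Project a CT volume (D, H, W) into a 2D image at a given view angle."""
    # Rotate the volume in the D-W plane, then integrate attenuation along
    # the ray direction (axis 0), mimicking X-ray image formation.
    rotated = rotate(ct_volume, angle_deg, axes=(0, 2), reshape=False, order=1)
    projection = rotated.sum(axis=0)
    # Normalize to [0, 1] so the synthetic image has a detector-like range.
    projection -= projection.min()
    return projection / (projection.max() + 1e-8)

# Usage: each (projection, known pose) pair is a training sample obtained
# without any manual annotation.
ct = np.random.rand(64, 64, 64).astype(np.float32)  # stand-in CT volume
drr = simulate_drr(ct, angle_deg=15.0)
print(drr.shape)  # (64, 64)
```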
Ego-Vehicle Action Recognition based on Semi-Supervised Contrastive Learning
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2023-01-01 · DOI: 10.1109/wacv56688.2023.00593
Chihiro Noguchi, Toshihiro Tanizawa
{"title":"Ego-Vehicle Action Recognition based on Semi-Supervised Contrastive Learning","authors":"Chihiro Noguchi, Toshihiro Tanizawa","doi":"10.1109/wacv56688.2023.00593","DOIUrl":"https://doi.org/10.1109/wacv56688.2023.00593","url":null,"abstract":"In recent years, many automobiles have been equipped with cameras, which have accumulated an enormous amount of video footage of driving scenes. Autonomous driving demands the highest level of safety, for which even unimaginably rare driving scenes have to be collected in training data to improve the recognition accuracy for specific scenes. However, it is prohibitively costly to find very few specific scenes from an enormous amount of videos. In this article, we show that proper video-to-video distances can be defined by focusing on ego-vehicle actions. It is well known that existing methods based on supervised learning cannot handle videos that do not fall into predefined classes, though they work well in defining video-to-video distances in the embedding space between labeled videos. To tackle this problem, we propose a method based on semi-supervised contrastive learning. We consider two related but distinct contrastive learning: standard graph contrastive learning and our proposed SOIA-based contrastive learning. We observe that the latter approach can provide more sensible video-to-video distances between unlabeled videos. Next, the effectiveness of our method is quantified by evaluating the classification performance of the ego-vehicle action recognition using HDD dataset, which shows that our method including unlabeled data in training significantly outperforms the existing methods using only labeled data in training.","PeriodicalId":497882,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135470414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
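As a rough illustration of the contrastive machinery involved (not the paper's SOIA-based variant, whose details are not given here), the sketch below shows a simplified InfoNCE-style loss in which two augmented views of the same unlabeled clip act as positives; a semi-supervised setup would add a supervised term that treats same-class labeled clips as positives.

```python
# A generic InfoNCE-style contrastive loss sketch; the temperature and
# embedding shapes are assumptions, not values from the paper.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Contrastive loss between two batches of clip embeddings (B, D)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                      # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)    # positives on diagonal
    return F.cross_entropy(logits, targets)

# Usage: z_a and z_b are embeddings of two augmentations of the same clips.
B, D = 8, 128
z_a, z_b = torch.randn(B, D), torch.randn(B, D)
print(float(info_nce(z_a, z_b)))
```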
TTTFlow: Unsupervised Test-Time Training with Normalizing Flow
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2023-01-01 · DOI: 10.1109/wacv56688.2023.00216
David Osowiechi, Gustavo A. Vargas Hakim, Mehrdad Noori, Milad Cheraghalikhani, Ismail Ben Ayed, Christian Desrosiers
{"title":"TTTFlow: Unsupervised Test-Time Training with Normalizing Flow","authors":"David Osowiechi, Gustavo A. Vargas Hakim, Mehrdad Noori, Milad Cheraghalikhani, Ismail Ben Ayed, Christian Desrosiers","doi":"10.1109/wacv56688.2023.00216","DOIUrl":"https://doi.org/10.1109/wacv56688.2023.00216","url":null,"abstract":"A major problem of deep neural networks for image classification is their vulnerability to domain changes at test-time. Recent methods have proposed to address this problem with test-time training (TTT), where a two-branch model is trained to learn a main classification task and also a self-supervised task used to perform test-time adaptation. However, these techniques require defining a proxy task specific to the target application. To tackle this limitation, we propose TTTFlow: a Y-shaped architecture using an unsupervised head based on Normalizing Flows to learn the nor-mal distribution of latent features and detect domain shifts in test examples. At inference, keeping the unsupervised head fixed, we adapt the model to domain-shifted examples by maximizing the log likelihood of the Normalizing Flow. Our results show that our method can significantly improve the accuracy with respect to previous works.","PeriodicalId":497882,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135127082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
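A minimal sketch of the test-time adaptation loop the abstract describes: the unsupervised head stays frozen while the encoder is updated to maximize log-likelihood on the incoming test batch. A standard-normal log-density stands in for the Normalizing Flow here (a real flow would add the log-determinant of its invertible transform), and the encoder, learning rate, and step count are placeholder assumptions.

```python
# Test-time adaptation sketch with a stand-in density, not the paper's flow.
import math
import torch
import torch.nn as nn

# Placeholder encoder; TTTFlow adapts the classifier's feature extractor.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
opt = torch.optim.SGD(encoder.parameters(), lr=1e-3)

def flow_log_prob(z: torch.Tensor) -> torch.Tensor:
    # Stand-in density: log N(z; 0, I). A real Normalizing Flow would map z
    # through invertible layers and add the log|det Jacobian| term.
    return (-0.5 * z.pow(2) - 0.5 * math.log(2 * math.pi)).sum(dim=1)

x_test = torch.randn(16, 3, 32, 32)  # a domain-shifted test batch
for _ in range(10):                   # a few adaptation steps at inference
    opt.zero_grad()
    loss = -flow_log_prob(encoder(x_test)).mean()  # maximize log-likelihood
    loss.backward()
    opt.step()
```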
Visually explaining 3D-CNN predictions for video classification with an adaptive occlusion sensitivity analysis
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2023-01-01 · DOI: 10.1109/wacv56688.2023.00156
Tomoki Uchiyama, Naoya Sogi, Koichiro Niinuma, Kazuhiro Fukui
{"title":"Visually explaining 3D-CNN predictions for video classification with an adaptive occlusion sensitivity analysis","authors":"Tomoki Uchiyama, Naoya Sogi, Koichiro Niinuma, Kazuhiro Fukui","doi":"10.1109/wacv56688.2023.00156","DOIUrl":"https://doi.org/10.1109/wacv56688.2023.00156","url":null,"abstract":"This paper proposes a method for visually explaining the decision-making process of 3D convolutional neural networks (CNN) with a temporal extension of occlusion sensitivity analysis. The key idea here is to occlude a specific volume of data by a 3D mask in an input 3D temporalspatial data space and then measure the change degree in the output score. The occluded volume data that produces a larger change degree is regarded as a more critical element for classification. However, while the occlusion sensitivity analysis is commonly used to analyze single image classification, it is not so straightforward to apply this idea to video classification as a simple fixed cuboid cannot deal with the motions. To this end, we adapt the shape of a 3D occlusion mask to complicated motions of target objects. Our flexible mask adaptation is performed by considering the temporal continuity and spatial co-occurrence of the optical flows extracted from the input video data. We further propose to approximate our method by using the first-order partial derivative of the score with respect to an input image to reduce its computational cost. We demonstrate the effectiveness of our method through various and extensive comparisons with the conventional methods in terms of the deletion/insertion metric and the pointing metric on the UCF101. The code is available at: https://github.com/uchiyama33/AOSA.","PeriodicalId":497882,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135127083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
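The core measurement in occlusion sensitivity analysis can be sketched as follows: slide a mask over the spatiotemporal input and record how much the target-class score drops. This fixed-cuboid version only illustrates the baseline the paper extends; its contribution, adapting the mask shape to optical flow, is not reproduced here, and the function signature and mask/stride sizes are illustrative.

```python
# Fixed-cuboid occlusion sensitivity for a video classifier (baseline sketch).
import torch

def occlusion_sensitivity_3d(model, video, label, size=(4, 16, 16), stride=(4, 16, 16)):
    """video: (1, C, T, H, W). Returns a coarse (T', H', W') saliency volume."""
    model.eval()
    with torch.no_grad():
        base = model(video)[0, label].item()
        _, _, T, H, W = video.shape
        heat = torch.zeros(T // stride[0], H // stride[1], W // stride[2])
        for ti in range(heat.shape[0]):
            for hi in range(heat.shape[1]):
                for wi in range(heat.shape[2]):
                    occluded = video.clone()
                    t0, h0, w0 = ti * stride[0], hi * stride[1], wi * stride[2]
                    occluded[:, :, t0:t0 + size[0], h0:h0 + size[1], w0:w0 + size[2]] = 0
                    # Larger score drop => more critical spatiotemporal region.
                    heat[ti, hi, wi] = base - model(occluded)[0, label].item()
    return heat

# Usage with a stand-in classifier (any (1,C,T,H,W) -> (1,K) model works):
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 16 * 64 * 64, 10))
video = torch.randn(1, 3, 16, 64, 64)
print(occlusion_sensitivity_3d(model, video, label=0).shape)  # (4, 4, 4)
```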
X-Align: Cross-Modal Cross-View Alignment for Bird’s-Eye-View Segmentation
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2023-01-01 · DOI: 10.1109/wacv56688.2023.00330
Shubhankar Borse, Marvin Klingner, Varun Ravi Kumar, Hong Cai, Abdulaziz Almuzairee, Senthil Yogamani, Fatih Porikli
{"title":"X-Align: Cross-Modal Cross-View Alignment for Bird’s-Eye-View Segmentation","authors":"Shubhankar Borse, Marvin Klingner, Varun Ravi Kumar, Hong Cai, Abdulaziz Almuzairee, Senthil Yogamani, Fatih Porikli","doi":"10.1109/wacv56688.2023.00330","DOIUrl":"https://doi.org/10.1109/wacv56688.2023.00330","url":null,"abstract":"Bird’s-eye-view (BEV) grid is a typical representation of the perception of road components, e.g., drivable area, in autonomous driving. Most existing approaches rely on cameras only to perform segmentation in BEV space, which is fundamentally constrained by the absence of reliable depth information. The latest works leverage both camera and LiDAR modalities but suboptimally fuse their features using simple, concatenation-based mechanisms.In this paper, we address these problems by enhancing the alignment of the unimodal features in order to aid feature fusion, as well as enhancing the alignment between the cameras’ perspective view (PV) and BEV representations. We propose X-Align, a novel end-to-end cross-modal and cross-view learning framework for BEV segmentation consisting of the following components: (i) a novel CrossModal Feature Alignment (X-FA) loss, (ii) an attentionbased Cross-Modal Feature Fusion (X-FF) module to align multi-modal BEV features implicitly, and (iii) an auxiliary PV segmentation branch with Cross-View Segmentation Alignment (X-SA) losses to improve the PV-to-BEV transformation. We evaluate our proposed method across two commonly used benchmark datasets, i.e., nuScenes and KITTI-360. Notably, X-Align significantly outperforms the state-of-the-art by 3 absolute mIoU points on nuScenes. We also provide extensive ablation studies to demonstrate the effectiveness of the individual components.","PeriodicalId":497882,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136298132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
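To give a flavor of what a cross-modal alignment objective looks like, the sketch below penalizes disagreement between camera-derived and LiDAR-derived BEV feature maps with a per-cell cosine term. This is a generic stand-in, not the paper's X-FA loss; the tensor shapes and function name are assumptions.

```python
# A generic cross-modal BEV feature alignment term (stand-in for X-FA).
import torch
import torch.nn.functional as F

def cross_modal_alignment_loss(f_cam: torch.Tensor, f_lidar: torch.Tensor) -> torch.Tensor:
    """f_cam, f_lidar: (B, C, H, W) BEV features from the two modalities."""
    cos = F.cosine_similarity(f_cam, f_lidar, dim=1)  # per-BEV-cell similarity
    return (1.0 - cos).mean()                          # 0 when perfectly aligned

# Usage: add this term to the segmentation loss before/alongside fusion.
f_cam = torch.randn(2, 64, 200, 200)
f_lidar = torch.randn(2, 64, 200, 200)
print(float(cross_modal_alignment_loss(f_cam, f_lidar)))
```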