2022 19th Conference on Robots and Vision (CRV): Latest Publications

3DVQA: Visual Question Answering for 3D Environments
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00038
Yasaman Etesam, Leon Kochiev, Angel X. Chang
{"title":"3DVQA: Visual Question Answering for 3D Environments","authors":"Yasaman Etesam, Leon Kochiev, Angel X. Chang","doi":"10.1109/CRV55824.2022.00038","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00038","url":null,"abstract":"Visual Question Answering (VQA) is a widely studied problem in computer vision and natural language processing. However, current approaches to VQA have been investigated primarily in the 2D image domain. We study VQA in the 3D domain, with our input being point clouds of real-world 3D scenes, instead of 2D images. We believe that this 3D data modality provide richer spatial relation information that is of interest in the VQA task. In this paper, we introduce the 3DVQA-ScanNet dataset, the first VQA dataset in 3D, and we investigate the performance of a spectrum of baseline approaches on the 3D VQA task.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130468311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
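The abstract above frames 3D VQA as answering questions about point clouds of real scenes. As an illustration only, and not the baselines evaluated in the paper, a minimal model could pool per-point features into a scene vector and fuse it with a question embedding before classifying over a fixed answer vocabulary; all layer sizes and module choices below are assumptions.

```python
import torch
import torch.nn as nn

class PointCloudVQABaseline(nn.Module):
    """Toy 3D-VQA baseline: pooled point features + question embedding -> answer class.
    Structure and sizes are illustrative assumptions, not the 3DVQA-ScanNet models."""
    def __init__(self, vocab_size=5000, num_answers=100, hidden=256):
        super().__init__()
        # Per-point MLP over (x, y, z, r, g, b), then max-pool over points (PointNet-style).
        self.point_mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Question encoder: word embedding followed by a GRU.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        # Fuse scene and question vectors, predict an answer from a closed set.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_answers),
        )

    def forward(self, points, question_tokens):
        # points: (B, N, 6); question_tokens: (B, T) integer word ids
        scene = self.point_mlp(points).max(dim=1).values      # (B, hidden)
        _, q = self.gru(self.embed(question_tokens))          # q: (1, B, hidden)
        fused = torch.cat([scene, q.squeeze(0)], dim=-1)      # (B, 2*hidden)
        return self.classifier(fused)                         # (B, num_answers) logits

logits = PointCloudVQABaseline()(torch.rand(2, 1024, 6), torch.randint(0, 5000, (2, 12)))
```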
Conference Organization: CRV 2022
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/crv55824.2022.00006
{"title":"Conference Organization: CRV 2022","authors":"","doi":"10.1109/crv55824.2022.00006","DOIUrl":"https://doi.org/10.1109/crv55824.2022.00006","url":null,"abstract":"","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133842507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A View Invariant Human Action Recognition System for Noisy Inputs
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00017
Joo Wang Kim, J. Hernandez, Richard Cobos, Ricardo Palacios, Andres G. Abad
{"title":"A View Invariant Human Action Recognition System for Noisy Inputs","authors":"Joo Wang Kim, J. Hernandez, Richard Cobos, Ricardo Palacios, Andres G. Abad","doi":"10.1109/CRV55824.2022.00017","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00017","url":null,"abstract":"We propose a skeleton-based Human Action Recognition (HAR) system, robust to both noisy inputs and perspective variation. This system receives RGB videos as input and consists of three modules: (M1) 2D Key-Points Estimation module, (M2) Robustness module, and (M3) Action Classification module; of which M2 is our main contribution. This module uses pre-trained 3D pose estimator and pose refinement networks to handle noisy information including missing points, and uses rotations of the 3D poses to add robustness to camera view-point variation. To evaluate our approach, we carried out comparison experiments between models trained with M2 and without it. These experiments were conducted on the UESTC view-varying dataset, on the i3DPost multi-view human action dataset and on a Boxing Actions dataset, created by us. Our system achieved positive results, improving the accuracy by 24%, 3% and 11% on each dataset, respectively. On the UESTC dataset, our method achieves the new state of the art for the cross-view evaluation protocols.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114337637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
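The pipeline described above lifts noisy 2D key-points to 3D and rotates the 3D poses to gain view invariance. The sketch below illustrates only that rotation step, applying a random yaw rotation about the vertical axis to a 3D skeleton; the pre-trained lifting and refinement networks are not reproduced, and the axis convention and joint count are assumptions.

```python
import numpy as np

def random_yaw_rotation(skeleton_3d, rng=None):
    """Rotate a (J, 3) 3D skeleton about the vertical (y) axis by a random angle.

    Illustrative view-augmentation step only; it stands in for the rotation part
    of the Robustness module described in the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[  c, 0.0,   s],
                      [0.0, 1.0, 0.0],
                      [ -s, 0.0,   c]])
    # Center on the skeleton's mean so the rotation happens about the body, not the origin.
    centered = skeleton_3d - skeleton_3d.mean(axis=0, keepdims=True)
    return centered @ rot_y.T

# Example: eight randomly rotated copies of a 17-joint skeleton.
augmented = [random_yaw_rotation(np.random.rand(17, 3)) for _ in range(8)]
```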
Learned Intrinsic Auto-Calibration From Fundamental Matrices
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00037
Karim Samaha, Georges Younes, Daniel C. Asmar, J. Zelek
{"title":"Learned Intrinsic Auto-Calibration From Fundamental Matrices","authors":"Karim Samaha, Georges Younes, Daniel C. Asmar, J. Zelek","doi":"10.1109/CRV55824.2022.00037","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00037","url":null,"abstract":"Auto-calibration that relies on unconstrained image content and epipolar relationships is necessary in online operations, especially when internal calibration parameters such as focal length can vary. In contrast, traditional calibration relies on a checkerboard and other scene information and are typically conducted offline. Unfortunately, auto-calibration may not always converge when solved traditionally in an iterative optimization formalism. We propose to solve for the intrinsic calibration parameters using a neural network that is trained on a synthetic Unity dataset that we created. We demonstrate our results on both synthetic and real data to validate the generalizability of our neural network model, which outperforms traditional methods by 2% to 30%, and outperforms recent deep learning approaches by a factor of 2 to 4 times.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127822748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
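The abstract above regresses intrinsic parameters with a neural network whose supervision comes from synthetic data, with fundamental matrices as the epipolar input. A minimal sketch of that idea is shown below, assuming the nine entries of a scale-normalized fundamental matrix are fed to a small MLP that regresses focal lengths and principal point; the architecture and output parameterization are illustrative guesses, not the paper's network.

```python
import torch
import torch.nn as nn

class IntrinsicsFromF(nn.Module):
    """Toy regressor: flattened, normalized fundamental matrix -> (fx, fy, cx, cy).
    Layer sizes and the output parameterization are assumptions for illustration."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(9, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),     # fx, fy, cx, cy
        )

    def forward(self, F_mats):
        # F_mats: (B, 3, 3); normalize scale since F is only defined up to scale.
        flat = F_mats.reshape(F_mats.shape[0], 9)
        flat = flat / (flat.norm(dim=1, keepdim=True) + 1e-8)
        return self.net(flat)

model = IntrinsicsFromF()
pred = model(torch.rand(4, 3, 3))                          # (4, 4) predicted intrinsics
loss = nn.functional.mse_loss(pred, torch.rand(4, 4))      # supervised on synthetic ground truth
```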
TemporalNet: Real-time 2D-3D Video Object Detection
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00034
Mei-Huan Chen, J. Lang
{"title":"TemporalNet: Real-time 2D-3D Video Object Detection","authors":"Mei-Huan Chen, J. Lang","doi":"10.1109/CRV55824.2022.00034","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00034","url":null,"abstract":"Designing a video detection network based on state-of-the-art single-image object detectors may seem like an obvious choice. However, video object detection has extra challenges due to the lower quality of individual frames in a video, and hence the need to include temporal information for high-quality detection results. We design a novel interleaved architecture combining a 2D convolutional network and a 3D temporal network. To explore inter-frame information, we propose feature aggregation based on a temporal network. Our TemporalNet utilizes Appearance-preserving 3D convolution (AP3D) for extracting aligned features in the temporal dimension. Our temporal network functions at multiple scales for better performance, which allows communication between 2D and 3D blocks at each scale and also across scales. Our TemporalNet is a plug-and-play block that can be added to a multi-scale single-image detection network without any adjustments in the network architecture. When TemporalNet is applied to Yolov3 it is real-time with a running time of 35ms/frame on a low-end GPU. Our real-time approach achieves 77.1 % mAP (mean Average Precision) on ImageNet VID 2017 dataset with TemporalNet-4, where TemporalNet-16 achieves 80.9 % mAP which is a competitive result.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114647008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
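The abstract above aggregates per-frame 2D detector features with a 3D temporal network inserted at multiple scales. The rough sketch below shows the shape of such a plug-in block: stack T per-frame feature maps and collapse the temporal dimension with a 3D convolution. It is a generic temporal-aggregation block under assumed channel counts and clip length, not the AP3D operator or the exact TemporalNet block.

```python
import torch
import torch.nn as nn

class TemporalAggregation(nn.Module):
    """Aggregate a clip of per-frame 2D feature maps with a 3D convolution.
    Generic stand-in for the temporal block described in the abstract; AP3D's
    appearance-preserving alignment is not implemented here."""
    def __init__(self, channels=256, t=4):
        super().__init__()
        self.conv3d = nn.Conv3d(channels, channels, kernel_size=(t, 3, 3),
                                padding=(0, 1, 1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, frame_feats):
        # frame_feats: list of T tensors, each (B, C, H, W) from the 2D backbone.
        x = torch.stack(frame_feats, dim=2)          # (B, C, T, H, W)
        x = self.relu(self.conv3d(x)).squeeze(2)     # collapse T -> (B, C, H, W)
        return x                                      # fed back into the 2D detection head

feats = [torch.rand(1, 256, 20, 20) for _ in range(4)]
fused = TemporalAggregation()(feats)                  # (1, 256, 20, 20)
```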
A Permutation Model for the Self-Supervised Stereo Matching Problem
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00024
Pierre-Andre Brousseau, S. Roy
{"title":"A Permutation Model for the Self-Supervised Stereo Matching Problem","authors":"Pierre-Andre Brousseau, S. Roy","doi":"10.1109/CRV55824.2022.00024","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00024","url":null,"abstract":"This paper proposes a novel permutation formulation to the stereo matching problem. Our proposed approach introduces a permutation volume which provides a natural representation of stereo constraints and disentangles stereo matching from monocular disparity estimation. It also has the benefit of simultaneously computing disparity and a confidence measure which provides explainability and a simple confidence heuristic for occlusions. In the context of self-supervised learning, the stereo performance is validated for standard testing datasets and the confidence maps are validated through stereo-visibility. Results show that the permutation volume increases stereo performance and features good generalization behaviour. We believe that measuring confidence is a key part of explainability which is instrumental to adoption of deep methods in critical stereo applications such as autonomous navigation.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125639818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
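The abstract above reads off both disparity and a confidence measure from its permutation volume. As one possible reading (an assumption, not the paper's exact construction), the sketch below turns per-pixel matching scores over disparity candidates into a probability distribution with a softmax, takes disparity as its expectation, and uses the peak probability as a confidence value.

```python
import torch

def soft_disparity_and_confidence(cost_volume):
    """cost_volume: (B, D, H, W) matching scores over D disparity candidates.

    Illustrative soft-matching readout: probabilities over candidates via softmax,
    disparity as the expected candidate index, confidence as the peak probability.
    This is a generic construction, not the paper's permutation volume.
    """
    probs = torch.softmax(cost_volume, dim=1)                        # (B, D, H, W)
    candidates = torch.arange(cost_volume.shape[1], dtype=probs.dtype,
                              device=probs.device).view(1, -1, 1, 1)
    disparity = (probs * candidates).sum(dim=1)                      # (B, H, W)
    confidence = probs.max(dim=1).values                             # (B, H, W), in (0, 1]
    return disparity, confidence

disp, conf = soft_disparity_and_confidence(torch.rand(1, 64, 32, 32))
```

A low peak probability would then flag pixels where no candidate dominates, which is the kind of occlusion heuristic the abstract alludes to.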
Integrating High-Resolution Tactile Sensing into Grasp Stability Prediction
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00021
Lachlan Chumbley, Morris Gu, Rhys Newbury, J. Leitner, Akansel Cosgun
{"title":"Integrating High-Resolution Tactile Sensing into Grasp Stability Prediction","authors":"Lachlan Chumbley, Morris Gu, Rhys Newbury, J. Leitner, Akansel Cosgun","doi":"10.1109/CRV55824.2022.00021","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00021","url":null,"abstract":"We investigate how high-resolution tactile sensors can be utilized in combination with vision and depth sensing, to improve grasp stability prediction. Recent advances in simulating high-resolution tactile sensing, in particular the TACTO simulator, enabled us to evaluate how neural networks can be trained with a combination of sensing modalities. With the large amounts of data needed to train large neural networks, robotic simulators provide a fast way to automate the data collection process. We expand on the existing work through an ablation study and an increased set of objects taken from the YCB benchmark set. Our results indicate that while the combination of vision, depth, and tactile sensing provides the best prediction results on known objects, the network fails to generalize to unknown objects. Our work also addresses existing issues with robotic grasping in tactile simulation and how to overcome them.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133102915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
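The abstract above fuses vision, depth, and tactile readings to predict whether a grasp will hold. A minimal late-fusion sketch under assumed input shapes (an RGB image, a depth map, and two tactile images) is shown below; the encoders, their sizes, and the fusion scheme are illustrative assumptions, not the architecture evaluated in the paper.

```python
import torch
import torch.nn as nn

def small_cnn(in_ch):
    # Tiny convolutional encoder reused across modalities (an assumption for brevity).
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class GraspStabilityNet(nn.Module):
    """Late fusion of RGB, depth, and two tactile images -> stable/unstable logit."""
    def __init__(self):
        super().__init__()
        self.rgb, self.depth = small_cnn(3), small_cnn(1)
        self.tactile = small_cnn(3)                    # shared encoder for both fingertips
        self.head = nn.Linear(32 * 4, 1)

    def forward(self, rgb, depth, tac_left, tac_right):
        feats = torch.cat([self.rgb(rgb), self.depth(depth),
                           self.tactile(tac_left), self.tactile(tac_right)], dim=1)
        return self.head(feats)                        # apply a sigmoid at evaluation time

logit = GraspStabilityNet()(torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64),
                            torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```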
Anomaly Detection with Adversarially Learned Perturbations of Latent Space
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00031
Vahid Reza Khazaie, A. Wong, John Taylor Jewell, Y. Mohsenzadeh
{"title":"Anomaly Detection with Adversarially Learned Perturbations of Latent Space","authors":"Vahid Reza Khazaie, A. Wong, John Taylor Jewell, Y. Mohsenzadeh","doi":"10.1109/CRV55824.2022.00031","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00031","url":null,"abstract":"Anomaly detection is to identify samples that do not conform to the distribution of the normal data. Due to the unavailability of anomalous data, training a supervised deep neural network is a cumbersome task. As such, unsupervised methods are preferred as a common approach to solve this task. Deep autoencoders have been broadly adopted as a base of many unsupervised anomaly detection methods. However, a notable shortcoming of deep autoencoders is that they provide insufficient representations for anomaly detection by generalizing to reconstruct outliers. In this work, we have designed an adversarial framework consisting of two competing components, an Adversarial Distorter, and an Autoencoder. The Adversarial Distorter is a convolutional encoder that learns to produce effective perturbations and the autoencoder is a deep convolutional neural network that aims to reconstruct the images from the perturbed latent feature space. The networks are trained with opposing goals in which the Adversarial Distorter produces perturbations that are applied to the en-coder's latent feature space to maximize the reconstruction error and the autoencoder tries to neutralize the effect of these perturbations to minimize it. When applied to anomaly detection, the proposed method learns semantically richer representations due to applying perturbations to the feature space. The proposed method outperforms the existing state-of-the-art methods in anomaly detection on image and video datasets.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115700758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
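The abstract above trains two networks with opposing objectives: a distorter that perturbs the encoder's latent code to maximize reconstruction error, and an autoencoder that reconstructs despite the perturbation. The sketch below shows the shape of one such alternating update on toy fully-connected networks; the actual convolutional architectures, input sizes, and loss weights are assumptions, not the paper's.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 64))                    # toy encoder
dec = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())                    # toy decoder
distorter = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.Tanh())   # perturbation generator

opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
opt_dis = torch.optim.Adam(distorter.parameters(), lr=1e-3)
mse = nn.MSELoss()

def train_step(x):
    """One alternating update: the distorter ascends on the reconstruction error,
    the autoencoder descends on it with the perturbation held fixed (illustrative only)."""
    flat = x.flatten(1)

    # 1) Update distorter to make reconstruction from the perturbed latent code worse.
    recon = dec(enc(x) + distorter(x))
    loss_dis = -mse(recon, flat)
    opt_dis.zero_grad(); loss_dis.backward(); opt_dis.step()

    # 2) Update autoencoder to neutralize the (detached) perturbation.
    recon = dec(enc(x) + distorter(x).detach())
    loss_ae = mse(recon, flat)
    opt_ae.zero_grad(); loss_ae.backward(); opt_ae.step()
    return loss_ae.item()

loss = train_step(torch.rand(8, 1, 28, 28))
```

At test time the reconstruction error of a sample would typically serve as its anomaly score, consistent with the autoencoder-based framing in the abstract.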
Inter- & Intra-City Image Geolocalization 城市间与城市内形象地理定位
2022 19th Conference on Robots and Vision (CRV) Pub Date : 2022-05-01 DOI: 10.1109/CRV55824.2022.00023
J. Tanner, K. Dick, J.R. Green
{"title":"Inter- & Intra-City Image Geolocalization","authors":"J. Tanner, K. Dick, J.R. Green","doi":"10.1109/CRV55824.2022.00023","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00023","url":null,"abstract":"Can a photo be accurately geolocated within a city from its pixels alone? While this image geolocation problem has been successfully addressed at the planetary- and nation-levels when framed as a classification problem using convolutional neural networks, no method has yet been able to precisely geolocate images within the city- and/or at the street-level when framed as a latitude/longitude regression-type problem. We leverage the highly densely sampled Streetlearn dataset of imagery from Manhattan and Pittsburgh to first develop a highly accurate inter-city predictor and then experimentally resolve, for the first time, the intra-city performance limits of framing image geolocation as a regression-type problem. We then reformulate the problem as an extreme-resolution classification task by subdividing the city into hundreds of equirectangular-scaled bins and train our respective intra-city deep convolutional neural network on tens of thousands of images. Our experiments serve as a foundation to develop a scalable inter- and intra-city image geolocation framework that, on average, resolves an image within 250 m2. We demonstrate that our models outperform SIFT-based image retrieval-type models based on differing weather patterns, lighting conditions, location-specific imagery, and are temporally robust when evaluated upon both past and future imagery. Both the practical and ethical ramifications of such a model are also discussed given the threat to individual privacy in a technocentric surveillance capitalist society.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130183709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
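The abstract above reformulates intra-city geolocation from latitude/longitude regression into classification over hundreds of city bins. The helper below shows one way such a binning could be defined over a city's bounding box; the grid resolution, the bounding box values, and the uniform discretization are assumptions for illustration, not the paper's exact equirectangular-scaled scheme.

```python
def latlon_to_bin(lat, lon, bounds, n_rows=20, n_cols=20):
    """Map a (lat, lon) inside a city bounding box to a single class index.

    bounds = (lat_min, lat_max, lon_min, lon_max). The grid resolution is an
    illustrative assumption; the paper subdivides the city into hundreds of
    equirectangular-scaled bins.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    row = min(int((lat - lat_min) / (lat_max - lat_min) * n_rows), n_rows - 1)
    col = min(int((lon - lon_min) / (lon_max - lon_min) * n_cols), n_cols - 1)
    return row * n_cols + col           # class id in [0, n_rows * n_cols)

# Example: a point in Manhattan with a rough, hypothetical bounding box.
bin_id = latlon_to_bin(40.758, -73.985, bounds=(40.70, 40.80, -74.02, -73.93))
```

A CNN trained with cross-entropy over these bin indices then replaces the latitude/longitude regressor, which is the reformulation the abstract describes.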
Object Class Aware Video Anomaly Detection through Image Translation
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00020
M. Baradaran, R. Bergevin
{"title":"Object Class Aware Video Anomaly Detection through Image Translation","authors":"M. Baradaran, R. Bergevin","doi":"10.1109/CRV55824.2022.00020","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00020","url":null,"abstract":"Semi-supervised video anomaly detection (VAD) methods formulate the task of anomaly detection as detection of deviations from the learned normal patterns. Previous works in the field (reconstruction or prediction-based methods) suffer from two drawbacks: 1) They focus on low-level features, and they (especially holistic approaches) do not effectively consider the object classes. 2) Object-centric approaches neglect some of the context information (such as location). To tackle these challenges, this paper proposes a novel two-stream object-aware VAD method that learns the normal appearance and motion patterns through image translation tasks. The appearance branch translates the input image to the target semantic segmentation map produced by Mask-RCNN, and the motion branch associates each frame with its expected optical flow magnitude. Any deviation from the expected appearance or motion in the inference stage shows the degree of potential abnormality. We evaluated our proposed method on the ShanghaiTech, UCSD-Pedl, and UCSD-Ped2 datasets and the results show competitive performance compared with state-of-the-art works. Most importantly, the results show that, as significant improvements to previous methods, detections by our method are completely explainable and anomalies are localized accurately in the frames.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125956425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
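The abstract above scores a frame by how far its predicted appearance (a semantic segmentation map) and motion (optical-flow magnitude) deviate from the targets produced by Mask-RCNN and an optical-flow estimator. The scoring sketch below assumes those predictions and targets are already available as arrays; the per-pixel error, the pooling over the frame, and the weights are assumptions, not the paper's exact scoring rule, and the translation networks themselves are not reproduced.

```python
import numpy as np

def frame_anomaly_score(pred_seg, target_seg, pred_flow_mag, target_flow_mag,
                        w_appearance=0.5, w_motion=0.5):
    """Combine per-pixel appearance and motion deviations into one frame-level score.

    pred_seg / target_seg: (H, W, C) class probability maps.
    pred_flow_mag / target_flow_mag: (H, W) optical-flow magnitudes.
    """
    appearance_err = np.abs(pred_seg - target_seg).sum(axis=-1)       # (H, W)
    motion_err = np.abs(pred_flow_mag - target_flow_mag)              # (H, W)
    # Anomalies tend to be local, so score the worst region rather than the frame mean.
    return w_appearance * appearance_err.max() + w_motion * motion_err.max()

score = frame_anomaly_score(np.random.rand(120, 160, 13), np.random.rand(120, 160, 13),
                            np.random.rand(120, 160), np.random.rand(120, 160))
```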