2022 19th Conference on Robots and Vision (CRV): Latest Publications

3DVQA: Visual Question Answering for 3D Environments
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00038
Yasaman Etesam, Leon Kochiev, Angel X. Chang
{"title":"3DVQA: Visual Question Answering for 3D Environments","authors":"Yasaman Etesam, Leon Kochiev, Angel X. Chang","doi":"10.1109/CRV55824.2022.00038","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00038","url":null,"abstract":"Visual Question Answering (VQA) is a widely studied problem in computer vision and natural language processing. However, current approaches to VQA have been investigated primarily in the 2D image domain. We study VQA in the 3D domain, with our input being point clouds of real-world 3D scenes, instead of 2D images. We believe that this 3D data modality provide richer spatial relation information that is of interest in the VQA task. In this paper, we introduce the 3DVQA-ScanNet dataset, the first VQA dataset in 3D, and we investigate the performance of a spectrum of baseline approaches on the 3D VQA task.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130468311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
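The abstract above frames 3D VQA as answering questions about point clouds of real scenes. As an illustration only, and not the baselines evaluated in the paper, a minimal model could pool per-point features into a scene vector and fuse it with a question embedding before classifying over a fixed answer vocabulary; all layer sizes and module choices below are assumptions.

```python
import torch
import torch.nn as nn

class PointCloudVQABaseline(nn.Module):
    """Toy 3D-VQA baseline: pooled point features + question embedding -> answer class.
    Structure and sizes are illustrative assumptions, not the 3DVQA-ScanNet models."""
    def __init__(self, vocab_size=5000, num_answers=100, hidden=256):
        super().__init__()
        # Per-point MLP over (x, y, z, r, g, b), then max-pool over points (PointNet-style).
        self.point_mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Question encoder: word embedding followed by a GRU.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        # Fuse scene and question vectors, predict an answer from a closed set.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_answers),
        )

    def forward(self, points, question_tokens):
        # points: (B, N, 6); question_tokens: (B, T) integer word ids
        scene = self.point_mlp(points).max(dim=1).values      # (B, hidden)
        _, q = self.gru(self.embed(question_tokens))          # q: (1, B, hidden)
        fused = torch.cat([scene, q.squeeze(0)], dim=-1)      # (B, 2*hidden)
        return self.classifier(fused)                         # (B, num_answers) logits

logits = PointCloudVQABaseline()(torch.rand(2, 1024, 6), torch.randint(0, 5000, (2, 12)))
```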
Conference Organization: CRV 2022
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/crv55824.2022.00006
{"title":"Conference Organization: CRV 2022","authors":"","doi":"10.1109/crv55824.2022.00006","DOIUrl":"https://doi.org/10.1109/crv55824.2022.00006","url":null,"abstract":"","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133842507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A View Invariant Human Action Recognition System for Noisy Inputs
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00017
Joo Wang Kim, J. Hernandez, Richard Cobos, Ricardo Palacios, Andres G. Abad
{"title":"A View Invariant Human Action Recognition System for Noisy Inputs","authors":"Joo Wang Kim, J. Hernandez, Richard Cobos, Ricardo Palacios, Andres G. Abad","doi":"10.1109/CRV55824.2022.00017","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00017","url":null,"abstract":"We propose a skeleton-based Human Action Recognition (HAR) system, robust to both noisy inputs and perspective variation. This system receives RGB videos as input and consists of three modules: (M1) 2D Key-Points Estimation module, (M2) Robustness module, and (M3) Action Classification module; of which M2 is our main contribution. This module uses pre-trained 3D pose estimator and pose refinement networks to handle noisy information including missing points, and uses rotations of the 3D poses to add robustness to camera view-point variation. To evaluate our approach, we carried out comparison experiments between models trained with M2 and without it. These experiments were conducted on the UESTC view-varying dataset, on the i3DPost multi-view human action dataset and on a Boxing Actions dataset, created by us. Our system achieved positive results, improving the accuracy by 24%, 3% and 11% on each dataset, respectively. On the UESTC dataset, our method achieves the new state of the art for the cross-view evaluation protocols.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114337637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
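The pipeline described above lifts noisy 2D key-points to 3D and rotates the 3D poses to gain view invariance. The sketch below illustrates only that rotation step, applying a random yaw rotation about the vertical axis to a 3D skeleton; the pre-trained lifting and refinement networks are not reproduced, and the axis convention and joint count are assumptions.

```python
import numpy as np

def random_yaw_rotation(skeleton_3d, rng=None):
    """Rotate a (J, 3) 3D skeleton about the vertical (y) axis by a random angle.

    Illustrative view-augmentation step only; it stands in for the rotation part
    of the Robustness module described in the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[  c, 0.0,   s],
                      [0.0, 1.0, 0.0],
                      [ -s, 0.0,   c]])
    # Center on the skeleton's mean so the rotation happens about the body, not the origin.
    centered = skeleton_3d - skeleton_3d.mean(axis=0, keepdims=True)
    return centered @ rot_y.T

# Example: eight randomly rotated copies of a 17-joint skeleton.
augmented = [random_yaw_rotation(np.random.rand(17, 3)) for _ in range(8)]
```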
Learned Intrinsic Auto-Calibration From Fundamental Matrices
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00037
Karim Samaha, Georges Younes, Daniel C. Asmar, J. Zelek
{"title":"Learned Intrinsic Auto-Calibration From Fundamental Matrices","authors":"Karim Samaha, Georges Younes, Daniel C. Asmar, J. Zelek","doi":"10.1109/CRV55824.2022.00037","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00037","url":null,"abstract":"Auto-calibration that relies on unconstrained image content and epipolar relationships is necessary in online operations, especially when internal calibration parameters such as focal length can vary. In contrast, traditional calibration relies on a checkerboard and other scene information and are typically conducted offline. Unfortunately, auto-calibration may not always converge when solved traditionally in an iterative optimization formalism. We propose to solve for the intrinsic calibration parameters using a neural network that is trained on a synthetic Unity dataset that we created. We demonstrate our results on both synthetic and real data to validate the generalizability of our neural network model, which outperforms traditional methods by 2% to 30%, and outperforms recent deep learning approaches by a factor of 2 to 4 times.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127822748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
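The abstract above regresses intrinsic parameters with a neural network whose supervision comes from synthetic data, with fundamental matrices as the epipolar input. A minimal sketch of that idea is shown below, assuming the nine entries of a scale-normalized fundamental matrix are fed to a small MLP that regresses focal lengths and principal point; the architecture and output parameterization are illustrative guesses, not the paper's network.

```python
import torch
import torch.nn as nn

class IntrinsicsFromF(nn.Module):
    """Toy regressor: flattened, normalized fundamental matrix -> (fx, fy, cx, cy).
    Layer sizes and the output parameterization are assumptions for illustration."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(9, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),     # fx, fy, cx, cy
        )

    def forward(self, F_mats):
        # F_mats: (B, 3, 3); normalize scale since F is only defined up to scale.
        flat = F_mats.reshape(F_mats.shape[0], 9)
        flat = flat / (flat.norm(dim=1, keepdim=True) + 1e-8)
        return self.net(flat)

model = IntrinsicsFromF()
pred = model(torch.rand(4, 3, 3))                          # (4, 4) predicted intrinsics
loss = nn.functional.mse_loss(pred, torch.rand(4, 4))      # supervised on synthetic ground truth
```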
TemporalNet: Real-time 2D-3D Video Object Detection
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00034
Mei-Huan Chen, J. Lang
{"title":"TemporalNet: Real-time 2D-3D Video Object Detection","authors":"Mei-Huan Chen, J. Lang","doi":"10.1109/CRV55824.2022.00034","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00034","url":null,"abstract":"Designing a video detection network based on state-of-the-art single-image object detectors may seem like an obvious choice. However, video object detection has extra challenges due to the lower quality of individual frames in a video, and hence the need to include temporal information for high-quality detection results. We design a novel interleaved architecture combining a 2D convolutional network and a 3D temporal network. To explore inter-frame information, we propose feature aggregation based on a temporal network. Our TemporalNet utilizes Appearance-preserving 3D convolution (AP3D) for extracting aligned features in the temporal dimension. Our temporal network functions at multiple scales for better performance, which allows communication between 2D and 3D blocks at each scale and also across scales. Our TemporalNet is a plug-and-play block that can be added to a multi-scale single-image detection network without any adjustments in the network architecture. When TemporalNet is applied to Yolov3 it is real-time with a running time of 35ms/frame on a low-end GPU. Our real-time approach achieves 77.1 % mAP (mean Average Precision) on ImageNet VID 2017 dataset with TemporalNet-4, where TemporalNet-16 achieves 80.9 % mAP which is a competitive result.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114647008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
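The abstract above aggregates per-frame 2D detector features with a 3D temporal network inserted at multiple scales. The rough sketch below shows the shape of such a plug-in block: stack T per-frame feature maps and collapse the temporal dimension with a 3D convolution. It is a generic temporal-aggregation block under assumed channel counts and clip length, not the AP3D operator or the exact TemporalNet block.

```python
import torch
import torch.nn as nn

class TemporalAggregation(nn.Module):
    """Aggregate a clip of per-frame 2D feature maps with a 3D convolution.
    Generic stand-in for the temporal block described in the abstract; AP3D's
    appearance-preserving alignment is not implemented here."""
    def __init__(self, channels=256, t=4):
        super().__init__()
        self.conv3d = nn.Conv3d(channels, channels, kernel_size=(t, 3, 3),
                                padding=(0, 1, 1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, frame_feats):
        # frame_feats: list of T tensors, each (B, C, H, W) from the 2D backbone.
        x = torch.stack(frame_feats, dim=2)          # (B, C, T, H, W)
        x = self.relu(self.conv3d(x)).squeeze(2)     # collapse T -> (B, C, H, W)
        return x                                      # fed back into the 2D detection head

feats = [torch.rand(1, 256, 20, 20) for _ in range(4)]
fused = TemporalAggregation()(feats)                  # (1, 256, 20, 20)
```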
A Permutation Model for the Self-Supervised Stereo Matching Problem
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00024
Pierre-Andre Brousseau, S. Roy
{"title":"A Permutation Model for the Self-Supervised Stereo Matching Problem","authors":"Pierre-Andre Brousseau, S. Roy","doi":"10.1109/CRV55824.2022.00024","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00024","url":null,"abstract":"This paper proposes a novel permutation formulation to the stereo matching problem. Our proposed approach introduces a permutation volume which provides a natural representation of stereo constraints and disentangles stereo matching from monocular disparity estimation. It also has the benefit of simultaneously computing disparity and a confidence measure which provides explainability and a simple confidence heuristic for occlusions. In the context of self-supervised learning, the stereo performance is validated for standard testing datasets and the confidence maps are validated through stereo-visibility. Results show that the permutation volume increases stereo performance and features good generalization behaviour. We believe that measuring confidence is a key part of explainability which is instrumental to adoption of deep methods in critical stereo applications such as autonomous navigation.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125639818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
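The abstract above reads off both disparity and a confidence measure from its permutation volume. As one possible reading (an assumption, not the paper's exact construction), the sketch below turns per-pixel matching scores over disparity candidates into a probability distribution with a softmax, takes disparity as its expectation, and uses the peak probability as a confidence value.

```python
import torch

def soft_disparity_and_confidence(cost_volume):
    """cost_volume: (B, D, H, W) matching scores over D disparity candidates.

    Illustrative soft-matching readout: probabilities over candidates via softmax,
    disparity as the expected candidate index, confidence as the peak probability.
    This is a generic construction, not the paper's permutation volume.
    """
    probs = torch.softmax(cost_volume, dim=1)                        # (B, D, H, W)
    candidates = torch.arange(cost_volume.shape[1], dtype=probs.dtype,
                              device=probs.device).view(1, -1, 1, 1)
    disparity = (probs * candidates).sum(dim=1)                      # (B, H, W)
    confidence = probs.max(dim=1).values                             # (B, H, W), in (0, 1]
    return disparity, confidence

disp, conf = soft_disparity_and_confidence(torch.rand(1, 64, 32, 32))
```

A low peak probability would then flag pixels where no candidate dominates, which is the kind of occlusion heuristic the abstract alludes to.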
Integrating High-Resolution Tactile Sensing into Grasp Stability Prediction
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00021
Lachlan Chumbley, Morris Gu, Rhys Newbury, J. Leitner, Akansel Cosgun
{"title":"Integrating High-Resolution Tactile Sensing into Grasp Stability Prediction","authors":"Lachlan Chumbley, Morris Gu, Rhys Newbury, J. Leitner, Akansel Cosgun","doi":"10.1109/CRV55824.2022.00021","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00021","url":null,"abstract":"We investigate how high-resolution tactile sensors can be utilized in combination with vision and depth sensing, to improve grasp stability prediction. Recent advances in simulating high-resolution tactile sensing, in particular the TACTO simulator, enabled us to evaluate how neural networks can be trained with a combination of sensing modalities. With the large amounts of data needed to train large neural networks, robotic simulators provide a fast way to automate the data collection process. We expand on the existing work through an ablation study and an increased set of objects taken from the YCB benchmark set. Our results indicate that while the combination of vision, depth, and tactile sensing provides the best prediction results on known objects, the network fails to generalize to unknown objects. Our work also addresses existing issues with robotic grasping in tactile simulation and how to overcome them.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133102915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
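The abstract above fuses vision, depth, and tactile readings to predict whether a grasp will hold. A minimal late-fusion sketch under assumed input shapes (an RGB image, a depth map, and two tactile images) is shown below; the encoders, their sizes, and the fusion scheme are illustrative assumptions, not the architecture evaluated in the paper.

```python
import torch
import torch.nn as nn

def small_cnn(in_ch):
    # Tiny convolutional encoder reused across modalities (an assumption for brevity).
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class GraspStabilityNet(nn.Module):
    """Late fusion of RGB, depth, and two tactile images -> stable/unstable logit."""
    def __init__(self):
        super().__init__()
        self.rgb, self.depth = small_cnn(3), small_cnn(1)
        self.tactile = small_cnn(3)                    # shared encoder for both fingertips
        self.head = nn.Linear(32 * 4, 1)

    def forward(self, rgb, depth, tac_left, tac_right):
        feats = torch.cat([self.rgb(rgb), self.depth(depth),
                           self.tactile(tac_left), self.tactile(tac_right)], dim=1)
        return self.head(feats)                        # apply a sigmoid at evaluation time

logit = GraspStabilityNet()(torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64),
                            torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```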
Anomaly Detection with Adversarially Learned Perturbations of Latent Space
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00031
Vahid Reza Khazaie, A. Wong, John Taylor Jewell, Y. Mohsenzadeh
{"title":"Anomaly Detection with Adversarially Learned Perturbations of Latent Space","authors":"Vahid Reza Khazaie, A. Wong, John Taylor Jewell, Y. Mohsenzadeh","doi":"10.1109/CRV55824.2022.00031","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00031","url":null,"abstract":"Anomaly detection is to identify samples that do not conform to the distribution of the normal data. Due to the unavailability of anomalous data, training a supervised deep neural network is a cumbersome task. As such, unsupervised methods are preferred as a common approach to solve this task. Deep autoencoders have been broadly adopted as a base of many unsupervised anomaly detection methods. However, a notable shortcoming of deep autoencoders is that they provide insufficient representations for anomaly detection by generalizing to reconstruct outliers. In this work, we have designed an adversarial framework consisting of two competing components, an Adversarial Distorter, and an Autoencoder. The Adversarial Distorter is a convolutional encoder that learns to produce effective perturbations and the autoencoder is a deep convolutional neural network that aims to reconstruct the images from the perturbed latent feature space. The networks are trained with opposing goals in which the Adversarial Distorter produces perturbations that are applied to the en-coder's latent feature space to maximize the reconstruction error and the autoencoder tries to neutralize the effect of these perturbations to minimize it. When applied to anomaly detection, the proposed method learns semantically richer representations due to applying perturbations to the feature space. The proposed method outperforms the existing state-of-the-art methods in anomaly detection on image and video datasets.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115700758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
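The abstract above trains two networks with opposing objectives: a distorter that perturbs the encoder's latent code to maximize reconstruction error, and an autoencoder that reconstructs despite the perturbation. The sketch below shows the shape of one such alternating update on toy fully-connected networks; the actual convolutional architectures, input sizes, and loss weights are assumptions, not the paper's.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 64))                    # toy encoder
dec = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())                    # toy decoder
distorter = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.Tanh())   # perturbation generator

opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
opt_dis = torch.optim.Adam(distorter.parameters(), lr=1e-3)
mse = nn.MSELoss()

def train_step(x):
    """One alternating update: the distorter ascends on the reconstruction error,
    the autoencoder descends on it with the perturbation held fixed (illustrative only)."""
    flat = x.flatten(1)

    # 1) Update distorter to make reconstruction from the perturbed latent code worse.
    recon = dec(enc(x) + distorter(x))
    loss_dis = -mse(recon, flat)
    opt_dis.zero_grad(); loss_dis.backward(); opt_dis.step()

    # 2) Update autoencoder to neutralize the (detached) perturbation.
    recon = dec(enc(x) + distorter(x).detach())
    loss_ae = mse(recon, flat)
    opt_ae.zero_grad(); loss_ae.backward(); opt_ae.step()
    return loss_ae.item()

loss = train_step(torch.rand(8, 1, 28, 28))
```

At test time the reconstruction error of a sample would typically serve as its anomaly score, consistent with the autoencoder-based framing in the abstract.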
Inter- & Intra-City Image Geolocalization 城市间与城市内形象地理定位
2022 19th Conference on Robots and Vision (CRV) Pub Date : 2022-05-01 DOI: 10.1109/CRV55824.2022.00023
J. Tanner, K. Dick, J.R. Green
{"title":"Inter- & Intra-City Image Geolocalization","authors":"J. Tanner, K. Dick, J.R. Green","doi":"10.1109/CRV55824.2022.00023","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00023","url":null,"abstract":"Can a photo be accurately geolocated within a city from its pixels alone? While this image geolocation problem has been successfully addressed at the planetary- and nation-levels when framed as a classification problem using convolutional neural networks, no method has yet been able to precisely geolocate images within the city- and/or at the street-level when framed as a latitude/longitude regression-type problem. We leverage the highly densely sampled Streetlearn dataset of imagery from Manhattan and Pittsburgh to first develop a highly accurate inter-city predictor and then experimentally resolve, for the first time, the intra-city performance limits of framing image geolocation as a regression-type problem. We then reformulate the problem as an extreme-resolution classification task by subdividing the city into hundreds of equirectangular-scaled bins and train our respective intra-city deep convolutional neural network on tens of thousands of images. Our experiments serve as a foundation to develop a scalable inter- and intra-city image geolocation framework that, on average, resolves an image within 250 m2. We demonstrate that our models outperform SIFT-based image retrieval-type models based on differing weather patterns, lighting conditions, location-specific imagery, and are temporally robust when evaluated upon both past and future imagery. Both the practical and ethical ramifications of such a model are also discussed given the threat to individual privacy in a technocentric surveillance capitalist society.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130183709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
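The abstract above reformulates intra-city geolocation from latitude/longitude regression into classification over hundreds of city bins. The helper below shows one way such a binning could be defined over a city's bounding box; the grid resolution, the bounding box values, and the uniform discretization are assumptions for illustration, not the paper's exact equirectangular-scaled scheme.

```python
def latlon_to_bin(lat, lon, bounds, n_rows=20, n_cols=20):
    """Map a (lat, lon) inside a city bounding box to a single class index.

    bounds = (lat_min, lat_max, lon_min, lon_max). The grid resolution is an
    illustrative assumption; the paper subdivides the city into hundreds of
    equirectangular-scaled bins.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    row = min(int((lat - lat_min) / (lat_max - lat_min) * n_rows), n_rows - 1)
    col = min(int((lon - lon_min) / (lon_max - lon_min) * n_cols), n_cols - 1)
    return row * n_cols + col           # class id in [0, n_rows * n_cols)

# Example: a point in Manhattan with a rough, hypothetical bounding box.
bin_id = latlon_to_bin(40.758, -73.985, bounds=(40.70, 40.80, -74.02, -73.93))
```

A CNN trained with cross-entropy over these bin indices then replaces the latitude/longitude regressor, which is the reformulation the abstract describes.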
Object Class Aware Video Anomaly Detection through Image Translation
2022 19th Conference on Robots and Vision (CRV) | Pub Date: 2022-05-01 | DOI: 10.1109/CRV55824.2022.00020
M. Baradaran, R. Bergevin
{"title":"Object Class Aware Video Anomaly Detection through Image Translation","authors":"M. Baradaran, R. Bergevin","doi":"10.1109/CRV55824.2022.00020","DOIUrl":"https://doi.org/10.1109/CRV55824.2022.00020","url":null,"abstract":"Semi-supervised video anomaly detection (VAD) methods formulate the task of anomaly detection as detection of deviations from the learned normal patterns. Previous works in the field (reconstruction or prediction-based methods) suffer from two drawbacks: 1) They focus on low-level features, and they (especially holistic approaches) do not effectively consider the object classes. 2) Object-centric approaches neglect some of the context information (such as location). To tackle these challenges, this paper proposes a novel two-stream object-aware VAD method that learns the normal appearance and motion patterns through image translation tasks. The appearance branch translates the input image to the target semantic segmentation map produced by Mask-RCNN, and the motion branch associates each frame with its expected optical flow magnitude. Any deviation from the expected appearance or motion in the inference stage shows the degree of potential abnormality. We evaluated our proposed method on the ShanghaiTech, UCSD-Pedl, and UCSD-Ped2 datasets and the results show competitive performance compared with state-of-the-art works. Most importantly, the results show that, as significant improvements to previous methods, detections by our method are completely explainable and anomalies are localized accurately in the frames.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125956425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
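The abstract above scores a frame by how far its predicted appearance (a semantic segmentation map) and motion (optical-flow magnitude) deviate from the targets produced by Mask-RCNN and an optical-flow estimator. The scoring sketch below assumes those predictions and targets are already available as arrays; the per-pixel error, the pooling over the frame, and the weights are assumptions, not the paper's exact scoring rule, and the translation networks themselves are not reproduced.

```python
import numpy as np

def frame_anomaly_score(pred_seg, target_seg, pred_flow_mag, target_flow_mag,
                        w_appearance=0.5, w_motion=0.5):
    """Combine per-pixel appearance and motion deviations into one frame-level score.

    pred_seg / target_seg: (H, W, C) class probability maps.
    pred_flow_mag / target_flow_mag: (H, W) optical-flow magnitudes.
    """
    appearance_err = np.abs(pred_seg - target_seg).sum(axis=-1)       # (H, W)
    motion_err = np.abs(pred_flow_mag - target_flow_mag)              # (H, W)
    # Anomalies tend to be local, so score the worst region rather than the frame mean.
    return w_appearance * appearance_err.max() + w_motion * motion_err.max()

score = frame_anomaly_score(np.random.rand(120, 160, 13), np.random.rand(120, 160, 13),
                            np.random.rand(120, 160), np.random.rand(120, 160))
```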