2018 15th Conference on Computer and Robot Vision (CRV): Latest Publications

Convolutional Neural Networks Regularized by Correlated Noise
2018 15th Conference on Computer and Robot Vision (CRV) Pub Date: 2018-04-03 DOI: 10.1109/CRV.2018.00059
Shamak Dutta, B. Tripp, Graham W. Taylor
{"title":"Convolutional Neural Networks Regularized by Correlated Noise","authors":"Shamak Dutta, B. Tripp, Graham W. Taylor","doi":"10.1109/CRV.2018.00059","DOIUrl":"https://doi.org/10.1109/CRV.2018.00059","url":null,"abstract":"Neurons in the visual cortex are correlated in their variability. The presence of correlation impacts cortical processing because noise cannot be averaged out over many neurons. In an effort to understand the functional purpose of correlated variability, we implement and evaluate correlated noise models in deep convolutional neural networks. Inspired by the cortex, correlation is defined as a function of the distance between neurons and their selectivity. We show how to sample from high-dimensional correlated distributions while keeping the procedure differentiable, so that back-propagation can proceed as usual. The impact of correlated variability is evaluated on the classification of occluded and non-occluded images with and without the presence of other regularization techniques, such as dropout. More work is needed to understand the effects of correlations in various conditions, however in 10/12 of the cases we studied, the best performance on occluded images was obtained from a model with correlated noise.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134157418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
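The abstract's key technical step is sampling from a high-dimensional correlated noise distribution while keeping the operation differentiable. Below is a minimal sketch of one standard way to do this (reparameterization through a Cholesky factor of a distance-based covariance); the function and variable names, the squared-exponential covariance, and the noise scale are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of differentiable correlated noise:
# build a covariance from pairwise unit distances, take its Cholesky factor,
# and map i.i.d. standard normals through it so gradients flow to the clean input.
import torch

def correlated_noise(activations, positions, length_scale=2.0, sigma=0.1):
    # activations: (B, N); positions: (N, 2) float coordinates of the N units
    d2 = torch.cdist(positions, positions).pow(2)            # pairwise squared distances
    cov = sigma ** 2 * torch.exp(-d2 / (2 * length_scale ** 2))
    cov = cov + 1e-5 * torch.eye(cov.shape[0])                # jitter for numerical stability
    L = torch.linalg.cholesky(cov)                            # cov = L @ L.T
    eps = torch.randn(activations.shape[0], cov.shape[0])     # i.i.d. N(0, 1) samples
    return activations + eps @ L.T                            # reparameterized correlated noise

# hypothetical usage during training: x_noisy = correlated_noise(x_flat, unit_positions)
```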
Deep Learning Object Detection Methods for Ecological Camera Trap Data
2018 15th Conference on Computer and Robot Vision (CRV) Pub Date: 2018-03-28 DOI: 10.1109/CRV.2018.00052
Stefan Schneider, Graham W. Taylor, S. C. Kremer
{"title":"Deep Learning Object Detection Methods for Ecological Camera Trap Data","authors":"Stefan Schneider, Graham W. Taylor, S. C. Kremer","doi":"10.1109/CRV.2018.00052","DOIUrl":"https://doi.org/10.1109/CRV.2018.00052","url":null,"abstract":"Deep learning methods for computer vision tasks show promise for automating the data analysis of camera trap images. Ecological camera traps are a common approach for monitoring an ecosystem's animal population, as they provide continual insight into an environment without being intrusive. However, the analysis of camera trap images is expensive, labour intensive, and time consuming. Recent advances in the field of deep learning for object detection show promise towards automating the analysis of camera trap images. Here, we demonstrate their capabilities by training and comparing two deep learning object detection classifiers, Faster R-CNN and YOLO v2.0, to identify, quantify, and localize animal species within camera trap images using the Reconyx Camera Trap and the self-labeled Gold Standard Snapshot Serengeti data sets. When trained on large labeled datasets, object recognition methods have shown success. We demonstrate their use, in the context of realistically sized ecological data sets, by testing if object detection methods are applicable for ecological research scenarios when utilizing transfer learning. Faster R-CNN outperformed YOLO v2.0 with average accuracies of 93.0% and 76.7% on the two data sets, respectively. Our findings show promising steps towards the automation of the labourious task of labeling camera trap images, which can be used to improve our understanding of the population dynamics of ecosystems across the planet.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121148372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 128
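As a concrete illustration of the transfer-learning setup described above, the sketch below fine-tunes a COCO-pretrained Faster R-CNN from torchvision by swapping its box-predictor head for a camera-trap class count. This is an assumed, generic pipeline; the paper's exact training code, class list, and data loading are not reproduced here.

```python
# Hedged sketch of transfer learning for camera-trap detection with torchvision's
# Faster R-CNN (not the authors' exact pipeline); num_classes is dataset-specific.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_camera_trap_detector(num_classes):
    # start from a COCO-pretrained detector and replace its classification head
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

model = build_camera_trap_detector(num_classes=12)   # e.g. 11 species + background (hypothetical)
images = [torch.rand(3, 512, 512)]
targets = [{"boxes": torch.tensor([[50., 60., 200., 220.]]),
            "labels": torch.tensor([1])}]
losses = model(images, targets)   # in training mode, returns a dict of detection losses
```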
Generalized Hadamard-Product Fusion Operators for Visual Question Answering
2018 15th Conference on Computer and Robot Vision (CRV) Pub Date: 2018-03-26 DOI: 10.1109/CRV.2018.00016
Brendan Duke, Graham W. Taylor
{"title":"Generalized Hadamard-Product Fusion Operators for Visual Question Answering","authors":"Brendan Duke, Graham W. Taylor","doi":"10.1109/CRV.2018.00016","DOIUrl":"https://doi.org/10.1109/CRV.2018.00016","url":null,"abstract":"We propose a generalized class of multimodal fusion operators for the task of visual question answering (VQA). We identify generalizations of existing multimodal fusion operators based on the Hadamard product, and show that specific non-trivial instantiations of this generalized fusion operator exhibit superior performance in terms of OpenEnded accuracy on the VQA task. In particular, we introduce Nonlinearity Ensembling, Feature Gating, and post-fusion neural network layers as fusion operator components, culminating in an absolute percentage point improvement of 1.1% on the VQA 2.0 test-dev set over baseline fusion operators, which use the same features as input. We use our findings as evidence that our generalized class of fusion operators could lead to the discovery of even superior task-specific operators when used as a search space in an architecture search over fusion operators.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134154781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
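A hedged sketch of a Hadamard-product fusion operator with feature gating and post-fusion layers, in the spirit of the components named in the abstract. The module name, projection sizes, and choice of nonlinearities are assumptions for illustration, not the paper's exact operator.

```python
# Minimal sketch of gated Hadamard-product fusion for VQA-style inputs.
import torch
import torch.nn as nn

class GatedHadamardFusion(nn.Module):
    def __init__(self, img_dim, ques_dim, fused_dim, num_answers):
        super().__init__()
        self.proj_v = nn.Linear(img_dim, fused_dim)
        self.proj_q = nn.Linear(ques_dim, fused_dim)
        self.gate = nn.Linear(ques_dim, fused_dim)        # question-conditioned feature gate
        self.post = nn.Sequential(nn.Linear(fused_dim, fused_dim), nn.ReLU(),
                                  nn.Linear(fused_dim, num_answers))

    def forward(self, v, q):
        fused = torch.tanh(self.proj_v(v)) * torch.tanh(self.proj_q(q))  # Hadamard product
        fused = fused * torch.sigmoid(self.gate(q))                      # feature gating
        return self.post(fused)                                          # post-fusion layers

logits = GatedHadamardFusion(2048, 1024, 512, 3000)(torch.rand(8, 2048), torch.rand(8, 1024))
```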
Real-Time End-to-End Action Detection with Two-Stream Networks
2018 15th Conference on Computer and Robot Vision (CRV) Pub Date: 2018-02-23 DOI: 10.1109/CRV.2018.00015
Alaaeldin El-Nouby, Graham W. Taylor
{"title":"Real-Time End-to-End Action Detection with Two-Stream Networks","authors":"Alaaeldin El-Nouby, Graham W. Taylor","doi":"10.1109/CRV.2018.00015","DOIUrl":"https://doi.org/10.1109/CRV.2018.00015","url":null,"abstract":"Two-stream networks have been very successful for solving the problem of action detection. However, prior work using two-stream networks train both streams separately, which prevents the network from exploiting regularities between the two streams. Moreover, unlike the visual stream, the dominant forms of optical flow computation typically do not maximally exploit GPU parallelism. We present a real-time end-to-end trainable two-stream network for action detection. First, we integrate the optical flow computation in our framework by using Flownet2. Second, we apply early fusion for the two streams and train the whole pipeline jointly end-to-end. Finally, for better network initialization, we transfer from the task of action recognition to action detection by pre-training our framework using the recently released large-scale Kinetics dataset. Our experimental results show that training the pipeline jointly end-to-end with fine-tuning the optical flow for the objective of action detection improves detection performance significantly. Additionally, we observe an improvement when initializing with parameters pre-trained using Kinetics. Last, we show that by integrating the optical flow computation, our framework is more efficient, running at real-time speeds (up to 31 fps).","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131984657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
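The early-fusion idea can be illustrated with a small sketch: RGB and optical-flow features come from separate convolutional stems, are concatenated at an early fusion point, and are processed jointly so gradients reach both streams. FlowNet2 and the detection head are omitted, and all layer sizes are illustrative assumptions.

```python
# Hedged sketch of early fusion in a two-stream network (fusion point only).
import torch
import torch.nn as nn

class TwoStreamEarlyFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_stream = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.flow_stream = nn.Sequential(nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU())
        self.joint = nn.Conv2d(64, 64, 3, padding=1)   # layers after the fusion point train on both streams

    def forward(self, rgb, flow):
        fused = torch.cat([self.rgb_stream(rgb), self.flow_stream(flow)], dim=1)  # early fusion
        return self.joint(fused)

out = TwoStreamEarlyFusion()(torch.rand(1, 3, 224, 224), torch.rand(1, 2, 224, 224))
```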
Tiny SSD: A Tiny Single-Shot Detection Deep Convolutional Neural Network for Real-Time Embedded Object Detection
2018 15th Conference on Computer and Robot Vision (CRV) Pub Date: 2018-02-19 DOI: 10.1109/CRV.2018.00023
A. Wong, M. Shafiee, Francis Li, Brendan Chwyl
{"title":"Tiny SSD: A Tiny Single-Shot Detection Deep Convolutional Neural Network for Real-Time Embedded Object Detection","authors":"A. Wong, M. Shafiee, Francis Li, Brendan Chwyl","doi":"10.1109/CRV.2018.00023","DOIUrl":"https://doi.org/10.1109/CRV.2018.00023","url":null,"abstract":"Object detection is a major challenge in computer vision, involving both object classification and object localization within a scene. While deep neural networks have been shown in recent years to yield very powerful techniques for tackling the challenge of object detection, one of the biggest challenges with enabling such object detection networks for widespread deployment on embedded devices is high computational and memory requirements. Recently, there has been an increasing focus in exploring small deep neural network architectures for object detection that are more suitable for embedded devices, such as Tiny YOLO and SqueezeDet. Inspired by the efficiency of the Fire microarchitecture introduced in SqueezeNet and the object detection performance of the singleshot detection macroarchitecture introduced in SSD, this paper introduces Tiny SSD, a single-shot detection deep convolutional neural network for real-time embedded object detection that is composed of a highly optimized, non-uniform Fire subnetwork stack and a non-uniform sub-network stack of highly optimized SSD-based auxiliary convolutional feature layers designed specifically to minimize model size while maintaining object detection performance. The resulting Tiny SSD possess a model size of 2.3MB (~26X smaller than Tiny YOLO) while still achieving an mAP of 61.3% on VOC 2007 (~4.2% higher than Tiny YOLO). These experimental results show that very small deep neural network architectures can be designed for real-time object detection that are well-suited for embedded scenarios.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122362424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 127
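The Fire microarchitecture referenced in the abstract comes from SqueezeNet: a 1x1 "squeeze" convolution followed by parallel 1x1 and 3x3 "expand" convolutions whose outputs are concatenated. The sketch below shows a generic Fire module; the channel counts are illustrative, not Tiny SSD's optimized non-uniform configuration.

```python
# Generic SqueezeNet-style Fire module (the building block Tiny SSD optimizes).
import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))                       # reduce channels with 1x1 convs
        return torch.cat([self.relu(self.expand1x1(s)),      # parallel expand paths, concatenated
                          self.relu(self.expand3x3(s))], dim=1)

y = Fire(64, 16, 64, 64)(torch.rand(1, 64, 56, 56))          # -> (1, 128, 56, 56)
```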
Nature vs. Nurture: The Role of Environmental Resources in Evolutionary Deep Intelligence
2018 15th Conference on Computer and Robot Vision (CRV) Pub Date: 2018-02-09 DOI: 10.1109/CRV.2018.00058
A. Chung, P. Fieguth, A. Wong
{"title":"Nature vs. Nurture: The Role of Environmental Resources in Evolutionary Deep Intelligence","authors":"A. Chung, P. Fieguth, A. Wong","doi":"10.1109/CRV.2018.00058","DOIUrl":"https://doi.org/10.1109/CRV.2018.00058","url":null,"abstract":"Evolutionary deep intelligence synthesizes highly efficient deep neural network architectures over successive generations. Inspired by the nature versus nurture debate, we propose a study to examine the role of external factors on the network synthesis process by varying the availability of simulated environmental resources. Experimental results were obtained for networks synthesized via asexual evolutionary synthesis (1-parent) and sexual evolutionary synthesis (2-parent, 3-parent, and 5-parent) using a 10% subset of the MNIST dataset. Results show that a lower environmental factor model resulted in a more gradual loss in performance accuracy and decrease in storage size. This potentially allows significantly reduced storage size with minimal to no drop in performance accuracy, and the best networks were synthesized using the lowest environmental factor models.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126515130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
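As a loose illustration only: one way an "environmental resource" factor could enter evolutionary synthesis is by scaling the probability that a synapse survives into the offspring network, so that lower resources yield sparser offspring. The sketch below is a speculative toy model with made-up names, not the authors' formulation.

```python
# Speculative toy sketch: synaptic survival scaled by an environmental factor.
import numpy as np

def synthesize_offspring_mask(weights, environmental_factor=0.7, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    strength = np.abs(weights) / (np.abs(weights).max() + 1e-12)     # normalized synaptic strength
    survival_prob = np.clip(environmental_factor * strength, 0.0, 1.0)
    return rng.random(weights.shape) < survival_prob                 # boolean keep-mask for offspring

mask = synthesize_offspring_mask(np.random.randn(256, 128), environmental_factor=0.5)
print(mask.mean())   # fraction of synapses inherited; a lower factor yields a sparser network
```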
In Defense of Classical Image Processing: Fast Depth Completion on the CPU
2018 15th Conference on Computer and Robot Vision (CRV) Pub Date: 2018-01-31 DOI: 10.1109/CRV.2018.00013
Jason Ku, Ali Harakeh, Steven L. Waslander
{"title":"In Defense of Classical Image Processing: Fast Depth Completion on the CPU","authors":"Jason Ku, Ali Harakeh, Steven L. Waslander","doi":"10.1109/CRV.2018.00013","DOIUrl":"https://doi.org/10.1109/CRV.2018.00013","url":null,"abstract":"With the rise of data driven deep neural networks as a realization of universal function approximators, most research on computer vision problems has moved away from handcrafted classical image processing algorithms. This paper shows that with a well designed algorithm, we are capable of outperforming neural network based methods on the task of depth completion. The proposed algorithm is simple and fast, runs on the CPU, and relies only on basic image processing operations to perform depth completion of sparse LIDAR depth data. We evaluate our algorithm on the challenging KITTI depth completion benchmark, and at the time of submission, our method ranks first on the KITTI test server among all published methods. Furthermore, our algorithm is data independent, requiring no training data to perform the task at hand. The code written in Python is publicly available at https://github.com/kujason/ip_basic","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121340911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 197
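The sketch below gives a simplified flavour of CPU-only depth completion with basic image processing (the released ip_basic pipeline at https://github.com/kujason/ip_basic contains more carefully designed steps): invert the sparse depth so near points dominate dilation, dilate to fill empty pixels, close small holes, blur, and invert back. Kernel sizes and thresholds here are illustrative assumptions.

```python
# Simplified, hedged sketch of classical depth completion on sparse LIDAR depth.
import cv2
import numpy as np

def complete_depth(sparse_depth, max_depth=100.0):
    depth = sparse_depth.astype(np.float32).copy()
    valid = depth > 0.1
    depth[valid] = max_depth - depth[valid]                  # invert so near points win during dilation
    depth = cv2.dilate(depth, np.ones((5, 5), np.uint8))     # fill empty pixels from neighbours
    depth = cv2.morphologyEx(depth, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))  # close small holes
    depth = cv2.GaussianBlur(depth, (5, 5), 0)               # smooth the densified map
    valid = depth > 0.1
    depth[valid] = max_depth - depth[valid]                  # invert back to metric depth
    return depth

dense = complete_depth(np.zeros((352, 1216), dtype=np.float32))   # KITTI-sized map, just to show the call
```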
Learning a Bias Correction for Lidar-Only Motion Estimation
2018 15th Conference on Computer and Robot Vision (CRV) Pub Date: 2018-01-15 DOI: 10.1109/CRV.2018.00032
T. Y. Tang, David J. Yoon, F. Pomerleau, T. Barfoot
{"title":"Learning a Bias Correction for Lidar-Only Motion Estimation","authors":"T. Y. Tang, David J. Yoon, F. Pomerleau, T. Barfoot","doi":"10.1109/CRV.2018.00032","DOIUrl":"https://doi.org/10.1109/CRV.2018.00032","url":null,"abstract":"This paper presents a novel technique to correct for bias in a classical estimator using a learning approach. We apply a learned bias correction to a lidar-only motion estimation pipeline. Our technique trains a Gaussian process (GP) regression model using data with ground truth. The inputs to the model are high-level features derived from the geometry of the point-clouds, and the outputs are the predicted biases between poses computed by the estimator and the ground truth. The predicted biases are applied as a correction to the poses computed by the estimator. Our technique is evaluated on over 50km of lidar data, which includes the KITTI odometry benchmark and lidar datasets collected around the University of Toronto campus. After applying the learned bias correction, we obtained significant improvements to lidar odometry in all datasets tested. We achieved around 10% reduction in errors on all datasets from an already accurate lidar odometry algorithm, at the expense of only less than 1% increase in computational cost at run-time.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129866855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
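A hedged sketch of the bias-learning idea using scikit-learn's Gaussian process regression: geometric features of a scan pair are mapped to the observed pose error, and the prediction is applied as a correction at run-time. The feature set, kernel, and synthetic training data below are assumptions, not the paper's.

```python
# Hedged sketch: GP regression from point-cloud geometry features to odometry bias.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# synthetic training data: features -> observed bias (estimate minus ground truth)
features = np.random.rand(200, 4)                 # e.g. point density, planarity, overlap, speed (assumed)
bias = 0.05 * features[:, :1] + 0.01 * np.random.randn(200, 1)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(), normalize_y=True)
gp.fit(features, bias)

new_features = np.random.rand(1, 4)
predicted_bias = gp.predict(new_features)          # subtract this from the lidar-odometry pose estimate
```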
Real-Time Deep Hair Matting on Mobile Devices
2018 15th Conference on Computer and Robot Vision (CRV) Pub Date: 2017-12-19 DOI: 10.1109/CRV.2018.00011
Alex Levinshtein, Cheng Chang, Edmund Phung, I. Kezele, W. Guo, P. Aarabi
{"title":"Real-Time Deep Hair Matting on Mobile Devices","authors":"Alex Levinshtein, Cheng Chang, Edmund Phung, I. Kezele, W. Guo, P. Aarabi","doi":"10.1109/CRV.2018.00011","DOIUrl":"https://doi.org/10.1109/CRV.2018.00011","url":null,"abstract":"Augmented reality is an emerging technology in many application domains. Among them is the beauty industry, where live virtual try-on of beauty products is of great importance. In this paper, we address the problem of live hair color augmentation. To achieve this goal, hair needs to be segmented quickly and accurately. We show how a modified MobileNet CNN architecture can be used to segment the hair in real-time. Instead of training this network using large amounts of accurate segmentation data, which is difficult to obtain, we use crowd sourced hair segmentation data. While such data is much simpler to obtain, the segmentations there are noisy and coarse. Despite this, we show how our system can produce accurate and fine-detailed hair mattes, while running at over 30 fps on an iPad Pro tablet.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130499686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
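A minimal sketch of a MobileNet-based matting network along the lines described above: a torchvision MobileNetV2 encoder with a lightweight upsampling head that outputs a single-channel hair matte. The decoder design and layer sizes are assumptions, not the authors' modified architecture.

```python
# Hedged sketch of a MobileNet-style hair matting network.
import torch
import torch.nn as nn
import torchvision

class HairMattingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = torchvision.models.mobilenet_v2(weights="DEFAULT").features  # (B, 1280, H/32, W/32)
        self.head = nn.Sequential(
            nn.Conv2d(1280, 64, 1), nn.ReLU(),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),     # back to input resolution
            nn.Conv2d(64, 1, 1))

    def forward(self, x):
        return torch.sigmoid(self.head(self.encoder(x)))     # per-pixel hair probability / matte

matte = HairMattingNet()(torch.rand(1, 3, 224, 224))          # -> (1, 1, 224, 224)
```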
WAYLA - Generating Images from Eye Movements
2018 15th Conference on Computer and Robot Vision (CRV) Pub Date: 2017-11-21 DOI: 10.1109/CRV.2018.00026
Bingqing Yu, James J. Clark
{"title":"WAYLA - Generating Images from Eye Movements","authors":"Bingqing Yu, James J. Clark","doi":"10.1109/CRV.2018.00026","DOIUrl":"https://doi.org/10.1109/CRV.2018.00026","url":null,"abstract":"We present a method for reconstructing images viewed by observers based only on their eye movements. By exploring the relationships between gaze patterns and image stimuli, the What Are You Looking At?\" (WAYLA) system has the goal of synthesizing photo-realistic images that are similar to the original pictures being viewed. The WAYLA approach is based on the Conditional Generative Adversarial Network (Conditional GAN) image-to-image translation technique of Isola et al. We consider two specific applications - the first of reconstructing newspaper images from gaze heat maps and the second of detailed reconstruction of images containing only text. The newspaper image reconstruction process is divided into two image-to-image translation operations the first mapping gaze heat maps into image segmentations and the second mapping the generated segmentation into a newspaper image. We validate the performance of our approach using various evaluation metrics along with human visual inspection. All results confirm the ability of our network to perform image generation tasks using eye tracking data","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131428190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
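The Conditional GAN objective WAYLA builds on (Isola et al.'s pix2pix) can be sketched as an adversarial term plus an L1 reconstruction term, with the generator conditioned on the gaze heat map. The networks below are stand-in stubs and the loss weight is pix2pix's published default; none of this is the authors' exact model.

```python
# Hedged sketch of a pix2pix-style conditional GAN objective (stub networks).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(1, 3, 3, padding=1))    # gaze heat map -> image (stub generator)
D = nn.Sequential(nn.Conv2d(4, 1, 3, padding=1))    # (heat map, image) -> realism map (stub discriminator)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

heat = torch.rand(2, 1, 64, 64)                      # conditioning input
real = torch.rand(2, 3, 64, 64)                      # target image

fake = G(heat)
d_fake = D(torch.cat([heat, fake], dim=1))
g_loss = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * l1(fake, real)   # adversarial + L1 (lambda = 100)

d_real = D(torch.cat([heat, real], dim=1))
d_loss = bce(d_real, torch.ones_like(d_real)) + \
         bce(D(torch.cat([heat, fake.detach()], dim=1)), torch.zeros_like(d_fake))
```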