2019 IEEE Winter Conference on Applications of Computer Vision (WACV): Latest Publications

Location-Velocity Attention for Pedestrian Trajectory Prediction
2019 IEEE Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2019-01-01 · DOI: 10.1109/WACV.2019.00221
Hao Xue, D. Huynh, Mark Reynolds
Abstract: Pedestrian path forecasting is crucial in applications such as smart video surveillance. It is a challenging task because of the complex crowd movement patterns in the scenes. Most existing state-of-the-art LSTM-based prediction methods require rich context such as labelled static obstacles, labelled entrance/exit regions, and even the background scene. Furthermore, incorporating contextual information into trajectory prediction increases the computational overhead and decreases the generalization of the prediction models across different scenes. In this paper, we propose a joint Location-Velocity Attention LSTM-based method to predict trajectories. Specifically, a module is designed to tweak the LSTM network, and an attention mechanism is trained to learn to optimally combine the location and the velocity information of pedestrians in the prediction process. We have evaluated our approach against other baselines and state-of-the-art methods on several publicly available datasets. The results show that it not only outperforms other prediction methods but also has good generalization ability.
Citations: 25
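The abstract describes an attention mechanism that learns how to mix location and velocity information inside the LSTM predictor. The paper's exact module is not reproduced here; the following is a minimal NumPy sketch of one plausible form of such a gate, where attention weights computed from the two hidden states blend a location embedding and a velocity embedding (all names and dimensions are illustrative assumptions).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def location_velocity_attention(h_loc, h_vel, W_att, loc_emb, vel_emb):
    """Blend location and velocity embeddings with learned attention weights.

    h_loc, h_vel     : hidden states of the location and velocity LSTMs, shape (D,)
    W_att            : attention projection mapping the concatenated states to 2 scores
    loc_emb, vel_emb : embeddings of the current location and velocity, shape (E,)
    Returns the attention-weighted combination fed to the decoder.
    """
    scores = W_att @ np.concatenate([h_loc, h_vel])   # (2,)
    alpha = softmax(scores)                           # attention weights summing to 1
    return alpha[0] * loc_emb + alpha[1] * vel_emb    # (E,)

# toy usage with random weights (for illustration only)
rng = np.random.default_rng(0)
D, E = 32, 16
fused = location_velocity_attention(
    rng.standard_normal(D), rng.standard_normal(D),
    rng.standard_normal((2, 2 * D)),
    rng.standard_normal(E), rng.standard_normal(E))
print(fused.shape)  # (16,)
```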
Bringing Vision to the Blind: From Coarse to Fine, One Dollar at a Time
2019 IEEE Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2019-01-01 · DOI: 10.1109/WACV.2019.00057
T. Huynh, J. Pillai, Eunyoung Kim, Kristen Aw, Jack Sim, Ken Goldman, Rui Min
Abstract: While deep learning has achieved great success in building vision applications for mainstream users, there is relatively little work giving the blind and visually impaired a personal, on-device visual assistant for their daily life. Unlike mainstream applications, a vision system for the blind must be robust, reliable, and safe to use. In this paper, we propose a fine-grained currency recognizer based on CONGAS, which surpasses other popular local features by a large margin. In addition, we introduce an effective and light-weight coarse classifier that gates the fine-grained recognizer on resource-constrained mobile devices. The coarse-to-fine approach is orchestrated to provide an extensible mobile-vision architecture that demonstrates how coordinating deep learning and local-feature-based methods can help resolve a challenging problem for the blind and visually impaired. The proposed system runs in real time with ~150 ms latency on a Pixel device, and achieved 98% precision and 97% recall on a challenging evaluation set.
Citations: 3
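The key system idea above is that a cheap coarse classifier gates the expensive fine-grained recognizer, so the costly model only runs when a banknote is likely present. A minimal sketch of that gating logic follows; the callables and the threshold are hypothetical placeholders, not the paper's actual models.

```python
def recognize_currency(image, coarse_prob, fine_predict, gate_threshold=0.5):
    """Run the light-weight coarse classifier first; only invoke the
    expensive fine-grained recognizer when the coarse stage is confident."""
    p_banknote = coarse_prob(image)      # probability the frame shows a banknote
    if p_banknote < gate_threshold:
        return None                      # nothing to announce to the user
    return fine_predict(image)           # e.g. the denomination string

# toy usage with stand-in models
print(recognize_currency("frame.jpg",
                         lambda img: 0.9,
                         lambda img: "20 dollars"))  # -> "20 dollars"
```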
A Comparative Analysis of Visual-Inertial SLAM for Assisted Wayfinding of the Visually Impaired
2019 IEEE Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2019-01-01 · DOI: 10.1109/WACV.2019.00028
He Zhang, Lingqiu Jin, H. Zhang, C. Ye
Abstract: This paper compares the performance of three state-of-the-art visual-inertial simultaneous localization and mapping (SLAM) methods in the context of assisted wayfinding of the visually impaired. Specifically, we analyze their strengths and weaknesses for assisted wayfinding of a robotic navigation aid (RNA). Based on the analysis, we select the best visual-inertial SLAM method for the RNA application and extend the method by integrating with it a method capable of detecting loops caused by the RNA's unique motion pattern. By incorporating the loop closures in the graph and optimization process, the extended visual-inertial SLAM method reduces the pose estimation error. The experimental results with our own datasets and the TUM VI benchmark datasets confirm the advantage of the selected method over the other two and validate the efficacy of the extended method.
Citations: 5
Iris Recognition: Comparing Visible-Light Lateral and Frontal Illumination to NIR Frontal Illumination
2019 IEEE Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2019-01-01 · DOI: 10.1109/WACV.2019.00097
Daniel P. Benalcazar, C. Pérez, Diego Bastias, K. Bowyer
Abstract: In most iris recognition systems, the texture of the iris image is the result of the interaction either between the iris and near-infrared (NIR) light, or between the iris pigmentation and visible light. The iris, however, is a three-dimensional organ, and the information contained in its relief is not being exploited completely. In this article, we present an image acquisition method that enhances the visibility of the structural information of the iris. Our method adds lateral illumination to the visible-light frontal illumination so that the structural information of the muscle fibers of the iris is captured in the resulting image. These images contain highly textured iris patterns. To test our method, we collected a database of 1,920 iris images using both a conventional NIR device and a custom-made device that illuminates the eye at lateral and frontal angles with visible light (LFVL). We then compared the iris recognition performance of both devices by means of a Hamming distance distribution analysis of the corresponding binary iris codes. The ROC curves show that our method produced more separable distributions than those of the NIR device, and much better separation than frontal visible light alone. After eliminating errors produced by images captured with different iris dilation (13 cases), the NIR device produced inter-class and intra-class distributions that are completely separable, as in the case of LFVL. This acquisition method could also be useful for 3D iris scanning.
Citations: 9
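The comparison between devices rests on the fractional Hamming distance between binary iris codes, conventionally restricted to bits that are valid in both occlusion masks. A standard NumPy sketch of that measure is given below; the paper's exact masking and any rotation compensation are not shown.

```python
import numpy as np

def fractional_hamming_distance(code_a, code_b, mask_a=None, mask_b=None):
    """Fraction of disagreeing bits between two binary iris codes,
    counted only where both masks mark the bits as valid."""
    code_a = np.asarray(code_a, dtype=bool)
    code_b = np.asarray(code_b, dtype=bool)
    valid = np.ones_like(code_a, dtype=bool)
    if mask_a is not None:
        valid &= np.asarray(mask_a, dtype=bool)
    if mask_b is not None:
        valid &= np.asarray(mask_b, dtype=bool)
    disagreements = np.logical_xor(code_a, code_b) & valid
    return disagreements.sum() / valid.sum()

# identical codes -> 0.0, complementary codes -> 1.0
a = np.random.default_rng(0).integers(0, 2, 2048).astype(bool)
print(fractional_hamming_distance(a, a), fractional_hamming_distance(a, ~a))
```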
Multi-Component Image Translation for Deep Domain Generalization
2019 IEEE Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2018-12-21 · DOI: 10.1109/WACV.2019.00067
Mohammad Mahfujur Rahman, C. Fookes, Mahsa Baktash, S. Sridharan
Abstract: Domain adaptation (DA) and domain generalization (DG) are two closely related methods that are both concerned with the task of assigning labels to an unlabeled data set. The only difference between these approaches is that DA can access the target data during the training phase, while in DG the target data is totally unseen during training. The task of DG is challenging because we have no prior knowledge of the target samples. If DA methods are applied directly to DG by simply excluding the target data from training, poor performance results. In this paper, we tackle the domain generalization challenge in two ways. In our first approach, we propose a novel deep domain generalization architecture utilizing synthetic data generated by a Generative Adversarial Network (GAN). The discrepancy between the generated images and synthetic images is minimized using existing domain discrepancy metrics such as maximum mean discrepancy or correlation alignment. In our second approach, we introduce a protocol for applying DA methods to a DG scenario by excluding the target data from the training phase, splitting the source data into training and validation parts, and treating the validation data as target data for DA. We conduct extensive experiments on four cross-domain benchmark datasets. Experimental results show that our proposed model outperforms the current state-of-the-art methods for DG.
Citations: 54
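The first approach minimizes a domain discrepancy metric such as maximum mean discrepancy (MMD) between feature sets. A minimal NumPy implementation of the biased squared MMD with an RBF kernel follows, operating on generic feature matrices rather than the paper's actual features; the kernel bandwidth is an illustrative choice.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    """k(x, y) = exp(-gamma * ||x - y||^2) for every pair of rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=0.1):
    """Biased estimate of the squared maximum mean discrepancy between X and Y."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.standard_normal((100, 8)), rng.standard_normal((100, 8)))
shifted = mmd2(rng.standard_normal((100, 8)), rng.standard_normal((100, 8)) + 2.0)
print(same < shifted)  # True: the shifted distribution yields a larger discrepancy
```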
DAC: Data-Free Automatic Acceleration of Convolutional Networks
2019 IEEE Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2018-12-20 · DOI: 10.1109/WACV.2019.00175
Xin Li, Shuai Zhang, Bolan Jiang, Y. Qi, M. Chuah, N. Bi
Abstract: Deploying a deep learning model on mobile/IoT devices is a challenging task. The difficulty lies in the trade-off between computation speed and accuracy. A complex deep learning model with high accuracy runs slowly on resource-limited devices, while a light-weight model that runs much faster loses accuracy. In this paper, we propose a novel decomposition method, namely DAC, that is capable of factorizing an ordinary convolutional layer into two layers with far fewer parameters. DAC computes the weights for the newly generated layers directly from the weights of the original convolutional layer, so no training (or fine-tuning) and no data are needed. The experimental results show that DAC reduces a large number of floating-point operations (FLOPs) while maintaining the high accuracy of a pre-trained model. If a 2% accuracy drop is acceptable, DAC saves 53% of the FLOPs of the VGG16 image classification model on the ImageNet dataset, 29% of the FLOPs of the SSD300 object detection model on the PASCAL VOC2007 dataset, and 46% of the FLOPs of a multi-person pose estimation model on the Microsoft COCO dataset. Compared to other existing decomposition methods, DAC achieves better performance.
Citations: 6
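DAC factorizes a convolutional layer into two lighter layers using only the pre-trained weights. The exact DAC decomposition is not reproduced here; the sketch below illustrates the general data-free idea with a truncated SVD of the flattened kernel, a common way to split one weight tensor into two factors with fewer total parameters (the rank choice is an assumption).

```python
import numpy as np

def factorize_conv_weights(W, rank):
    """Split a conv weight tensor (C_out, C_in, k, k) into two factors via
    truncated SVD of its flattened matrix; no data or fine-tuning is needed."""
    c_out, c_in, kh, kw = W.shape
    M = W.reshape(c_out, c_in * kh * kw)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    A = U[:, :rank] * S[:rank]                  # (c_out, rank): acts like a 1x1 conv
    B = Vt[:rank].reshape(rank, c_in, kh, kw)   # (rank, c_in, k, k): the k x k conv
    return A, B

W = np.random.default_rng(0).standard_normal((64, 32, 3, 3))
A, B = factorize_conv_weights(W, rank=16)
approx = (A @ B.reshape(16, -1)).reshape(W.shape)
print(A.size + B.size, "<", W.size)                     # fewer parameters than W
print(np.linalg.norm(W - approx) / np.linalg.norm(W))   # relative reconstruction error
```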
SfMLearner++: Learning Monocular Depth & Ego-Motion Using Meaningful Geometric Constraints
2019 IEEE Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2018-12-20 · DOI: 10.1109/WACV.2019.00226
V. Prasad, B. Bhowmick
Abstract: Most geometric approaches to monocular visual odometry (VO) provide robust pose estimates, but only sparse or semi-dense depth estimates. Recently, deep methods have shown good performance in generating dense depth and VO from monocular images by optimizing the photometric consistency between images. Despite being intuitive, a naive photometric loss does not ensure proper pixel correspondences between two views, which is the key factor for accurate depth and relative pose estimation. It is well known that simply minimizing such an error is prone to failure. We propose a method using epipolar constraints to make the learning more geometrically sound. We use the essential matrix, obtained using Nistér's five-point algorithm, to enforce meaningful geometric constraints on the loss, rather than using it as a label for training. Our method, although simple, is more geometrically meaningful and uses fewer parameters to give performance comparable to state-of-the-art methods that use complex losses and large networks, showing the effectiveness of epipolar constraints. Such a geometrically constrained learning method succeeds even in cases where simply minimizing the photometric error would fail.
Citations: 20
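The geometric signal used above is the epipolar constraint: for a correct essential matrix E and camera-normalized correspondences x1 and x2, the residual x2ᵀ E x1 should vanish. A minimal NumPy sketch of such a penalty term follows; the five-point estimation of E itself and the paper's exact weighting are not shown, and the array shapes are illustrative.

```python
import numpy as np

def epipolar_residuals(E, x1, x2):
    """Algebraic epipolar error |x2^T E x1| for N correspondences.

    E      : (3, 3) essential matrix, e.g. from Nister's five-point algorithm
    x1, x2 : (N, 3) homogeneous, camera-normalized points in the two views
    """
    return np.abs(np.einsum("ni,ij,nj->n", x2, E, x1))

def epipolar_loss(E, x1, x2):
    """Mean epipolar residual, usable as an extra geometric penalty term."""
    return epipolar_residuals(E, x1, x2).mean()

# sanity check: pure translation along x gives E = [t]_x with t = (1, 0, 0)
E = np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]])
x1 = np.array([[0.2, 0.1, 1.0]])
x2 = np.array([[0.5, 0.1, 1.0]])   # the point moved only along x, so the constraint holds
print(epipolar_loss(E, x1, x2))    # ~0.0
```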
Learning On-Road Visual Control for Self-Driving Vehicles With Auxiliary Tasks
2019 IEEE Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2018-12-19 · DOI: 10.1109/WACV.2019.00041
Yilun Chen, Praveen Palanisamy, P. Mudalige, Katharina Muelling, J. Dolan
Abstract: A safe and robust on-road navigation system is a crucial component of achieving fully automated vehicles. NVIDIA recently proposed an end-to-end algorithm that can directly learn steering commands from the raw pixels of a front camera using one convolutional neural network. In this paper, we leverage auxiliary information aside from raw images and design a novel network structure, called Auxiliary Task Network (ATN), to help boost driving performance while maintaining the advantages of minimal training data and end-to-end training. In this network, we introduce human prior knowledge into vehicle navigation by transferring features from image recognition tasks. Image semantic segmentation is applied as an auxiliary task for navigation. We account for temporal information by introducing an LSTM module and optical flow into the network. Finally, we combine vehicle kinematics with a sensor fusion step. We discuss the benefits of our method over state-of-the-art visual navigation methods both in the Udacity simulation environment and on the real-world Comma.ai dataset.
Citations: 16
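The central auxiliary-task idea is a shared encoder trained with the main steering-regression loss plus a weighted semantic-segmentation loss. A minimal PyTorch sketch of that multi-task objective is below; the layer sizes, the 0.4 loss weight, and the module names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AuxiliaryTaskNet(nn.Module):
    """Shared encoder with a steering-regression head and an auxiliary
    semantic-segmentation head (a toy stand-in for the paper's ATN)."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.steer_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))
        self.seg_head = nn.Conv2d(32, num_classes, 1)  # coarse per-pixel class scores

    def forward(self, x):
        feats = self.encoder(x)
        return self.steer_head(feats).squeeze(1), self.seg_head(feats)

model = AuxiliaryTaskNet()
images = torch.randn(4, 3, 64, 64)
steer_gt = torch.randn(4)
seg_gt = torch.randint(0, 5, (4, 16, 16))   # segmentation labels at encoder resolution

steer_pred, seg_pred = model(images)
loss = nn.functional.mse_loss(steer_pred, steer_gt) \
       + 0.4 * nn.functional.cross_entropy(seg_pred, seg_gt)  # weighted auxiliary loss
loss.backward()
```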
Model-Free Tracking With Deep Appearance and Motion Features Integration
2019 IEEE Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2018-12-16 · DOI: 10.1109/WACV.2019.00018
Xiaolong Jiang, Peizhao Li, Xiantong Zhen, Xianbin Cao
Abstract: Because it can track an anonymous object, a model-free tracker is broadly applicable regardless of the target type. However, designing such a generalized framework is challenged by the lack of object-oriented prior information. As one solution, a real-time model-free object tracking approach is designed in this work relying on Convolutional Neural Networks (CNNs). To overcome the scarcity of object-centric information, appearance and motion features are deeply integrated by the proposed AMNet, an end-to-end offline-trained two-stream network. Of the two parallel streams, the ANet investigates appearance features with a multi-scale Siamese atrous CNN, enabling a tracking-by-matching strategy, while the MNet performs deep motion detection to localize anonymous moving objects by processing generic motion features. The final tracking result at each frame is generated by fusing the output response maps from both sub-networks. The proposed AMNet reports leading performance on both the OTB and VOT benchmark datasets with favorable real-time processing speed.
Citations: 10
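The per-frame output comes from fusing the response maps of the appearance and motion streams and taking the peak of the fused map. A minimal NumPy sketch of that fusion step follows; the weighting scheme is an assumption, and the sub-networks themselves are not modeled.

```python
import numpy as np

def fuse_and_localize(appearance_map, motion_map, w_app=0.6):
    """Weighted fusion of the two streams' response maps; the peak of the
    fused map gives the target position in the current frame."""
    fused = w_app * appearance_map + (1.0 - w_app) * motion_map
    row, col = np.unravel_index(np.argmax(fused), fused.shape)
    return (row, col), fused

rng = np.random.default_rng(0)
app = rng.random((17, 17))
mot = rng.random((17, 17))
app[8, 9] = 5.0                        # strong appearance match at (8, 9)
print(fuse_and_localize(app, mot)[0])  # (8, 9)
```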
Action Quality Assessment Across Multiple Actions
2019 IEEE Winter Conference on Applications of Computer Vision (WACV) · Pub Date: 2018-12-15 · DOI: 10.1109/WACV.2019.00161
Paritosh Parmar, B. Morris
Abstract: Can learning to measure the quality of an action help in measuring the quality of other actions? If so, can consolidated samples from multiple actions help improve the performance of current approaches? In this paper, we carry out experiments to see if knowledge transfer is possible in the action quality assessment (AQA) setting. Experiments are carried out on our newly released AQA dataset (http://rtis.oit.unlv.edu/datasets.html) consisting of 1106 action samples from seven actions with quality as measured by expert human judges. Our experimental results show that there is utility in learning a single model across multiple actions.
Citations: 78