{"title":"Multi-scale Voxel Hashing and Efficient 3D Representation for Mobile Augmented Reality","authors":"Yi Xu, Yuzhang Wu, Hui Zhou","doi":"10.1109/CVPRW.2018.00200","DOIUrl":"https://doi.org/10.1109/CVPRW.2018.00200","url":null,"abstract":"In recent years, Visual-Inertial Odometry (VIO) technologies have been making great strides in both research community and industry. With the development of ARKit and ARCore, mobile Augmented Reality (AR) applications have become popular. However, collision detection and avoidance is largely un-addressed with these applications. In this paper, we present an efficient multi-scale voxel hashing algorithm for representing a 3D environment using a set of multi-scale voxels. The input to our algorithm is the 3D point cloud generated by a VIO system (e.g., ARKit). We show that our method can process the 3D points and convert them into multi-scale 3D representation in real time, while maintaining a small memory footprint. The 3D representation can be used to efficiently detect collision between digital objects and real objects in an environment in AR applications.","PeriodicalId":150600,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129318400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Light Field Depth Estimation on Off-the-Shelf Mobile GPU","authors":"Andre Ivan, Williem, I. Park","doi":"10.1109/CVPRW.2018.00106","DOIUrl":"https://doi.org/10.1109/CVPRW.2018.00106","url":null,"abstract":"While novel light processing algorithms have been continuously introduced, it is still challenging to perform light field processing on a mobile device with limited computation resource due to the high dimensionality of light field data. Recently, the performance of mobile graphics processing unit (GPU) increases rapidly and GPGPU on mobile GPU utilizes massive parallel computation to solve various computer vision problems with high computational complexity. To show the potential capability of light field processing on mobile GPU, we parallelize and optimize the state-of-the-art light field depth estimation which is essential to many light field applications. We employ both algorithm and kernel-based optimization to enable light field processing on mobile GPU. Light field processing involves independent pixel processing with intensive floating-point operations that can be vectorized to match single instruction multiple data (SIMD) style of GPU architecture. We design efficient memory access, caching, and prefetching to exploit light field properties. The experimental result shows that the light field depth estimation on mobile GPU obtains comparable performance as on the desktop CPU. The proposed optimization method gains up to 25 times speedup compared to the naïve baseline method.","PeriodicalId":150600,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122145806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hard Example Mining with Auxiliary Embeddings","authors":"Evgeny Smirnov, A. Melnikov, A. Oleinik, Elizaveta Ivanova, I. Kalinovskiy, Eugene Luckyanets","doi":"10.1109/CVPRW.2018.00013","DOIUrl":"https://doi.org/10.1109/CVPRW.2018.00013","url":null,"abstract":"Hard example mining is an important part of the deep embedding learning. Most methods perform it at the mini-batch level. However, in the large-scale settings there is only a small chance that proper examples will appear in the same mini-batch and will be coupled into the hard example pairs or triplets. Doppelganger mining was previously proposed to increase this chance by means of class-wise similarity. This method ensures that examples of similar classes are sampled into the same mini-batch together. One of the drawbacks of this method is that it operates only at the class level, while there also might be a way to select appropriate examples within class in a more elaborated way than randomly. In this paper, we propose to use auxiliary embeddings for hard example mining. These embeddings are constructed in such way that similar examples have close embeddings in the cosine similarity sense. With the help of these embeddings it is possible to select new examples for the mini-batch based on their similarity with the already selected examples. We propose several ways to create auxiliary embeddings and use them to increase the number of potentially hard positive and negative examples in each mini-batch. Our experiments on the challenging Disguised Faces in the Wild (DFW) dataset show that hard example mining with auxiliary embeddings improves the discriminative power of learned representations.","PeriodicalId":150600,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127028431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges on Large Scale Surveillance Video Analysis","authors":"Weitao Feng, Deyi Ji, Yiru Wang, Shuorong Chang, Hansheng Ren, Weihao Gan","doi":"10.1109/CVPRW.2018.00017","DOIUrl":"https://doi.org/10.1109/CVPRW.2018.00017","url":null,"abstract":"Large scale surveillance video analysis is one of the most important components in the future artificial intelligent city. It is a very challenging but practical system, consists of multiple functionalities such as object detection, tracking, identification and behavior analysis. In this paper, we try to address three tasks hosted in NVIDIA AI City Challenge contest. First, a system that transforming the image coordinate to world coordinate has been proposed, which is useful to estimate the vehicle speed on the road. Second, anomalies like car crash event and stalled vehicles can be found by the proposed anomaly detector framework. Third, multiple camera vehicle re-identification problem has been investigated and a matching algorithm is explained. All these tasks are based on our proposed online single camera multiple object tracking (MOT) system, which has been evaluated on the widely used MOT16 challenge benchmark. We show that it achieves the best performance compared to the state-of-the-art methods. Besides of MOT, we evaluate the proposed vehicle re-identification model on VeRi-776 dataset and it outperforms all other methods with a large margin.","PeriodicalId":150600,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127098428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Residual Inception Skip Network for Binary Segmentation","authors":"Jigar Doshi","doi":"10.1109/CVPRW.2018.00037","DOIUrl":"https://doi.org/10.1109/CVPRW.2018.00037","url":null,"abstract":"This paper summarizes our approach to the Deep Globe Road Extraction challenge 2018. In this challenge we are tasked to find road networks from satellite images. First, we explain our U-Net type baseline model for the challenge. Second, we explain a new architecture that takes in the lessons from some of the popular approaches that we call Residual Inception Skip Net. Finally, we outline our cyclic learning rate based ensembling approach which improved the overall single model performance and the final solution for submission. Our final model increases the IoU by 3 points over the baseline.","PeriodicalId":150600,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114138916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advertisement Effectiveness Estimation Based on Crowdsourced Multimodal Affective Responses","authors":"Genki Okada, Kenta Masui, N. Tsumura","doi":"10.1109/CVPRW.2018.00173","DOIUrl":"https://doi.org/10.1109/CVPRW.2018.00173","url":null,"abstract":"In this paper, we estimate the effectiveness of an advertisement using online data collection and the remote measurement of facial expressions and physiological responses. Recently, the online advertisement market has expanded, and the measurement of advertisement effectiveness has become very important. We collected a significant number of videos of Japanese faces watching video advertisements in the same scenario in which media is normally used via the Internet. Facial expression and physiological responses such as heart rate and gaze were remotely measured by analyzing facial videos. By combining the measured responses into multimodal features and using machine learning, we show that ad liking can be predicted (ROC AUC = 0.93) better than when only single-mode features are used. Furthermore, intent to purchase can be estimated well (ROC AUC = 0.91) using multimodal features.","PeriodicalId":150600,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121875355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face Verification with Disguise Variations via Deep Disguise Recognizer","authors":"Naman Kohli, Daksha Yadav, A. Noore","doi":"10.1109/CVPRW.2018.00010","DOIUrl":"https://doi.org/10.1109/CVPRW.2018.00010","url":null,"abstract":"The performance of current automatic face recognition algorithms is hindered by different covariates such as facial aging, disguises, and pose variations. Specifically, disguises are employed for intentional or unintentional modifications in the facial appearance for hiding one's own identity or impersonating someone else's identity. In this paper, we utilize deep learning based transfer learning approach for face verification with disguise variations. We employ Residual Inception network framework with center loss for learning inherent face representations. The training for the Inception-ResNet model is performed using a large-scale face database which is followed by inductive transfer learning to mitigate the impact of facial disguises. To evaluate the performance of the proposed Deep Disguise Recognizer (DDR) framework, Disguised Faces in the Wild and IIIT-Delhi Disguise Version 1 face databases are used. Experimental evaluation reveals that for the two databases, the proposed DDR framework yields 90.36% and 66.9% face verification accuracy at the false accept rate of 10%.","PeriodicalId":150600,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":" 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114053316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cell Image Segmentation by Integrating Multiple CNNs","authors":"Yuki Hiramatsu, K. Hotta, Ayako Imanishi, M. Matsuda, Kenta Terai","doi":"10.1109/CVPRW.2018.00296","DOIUrl":"https://doi.org/10.1109/CVPRW.2018.00296","url":null,"abstract":"Convolutional Neural Network is valid for segmentation of objects in an image. In recent years, it is beginning to be applied to the field of medicine and cell biology. In semantic segmentation, the accuracy has been improved by using single deeper neural network. However, the accuracy is saturated for difficult segmentation tasks. In this paper, we propose a semantic segmentation method by integrating multiple CNNs adaptively. This method consists of a gating network and multiple expert networks. Expert network outputs the segmentation result for an input image. Gating network automatically divides the input image into several sub-problems and assigns them to expert networks. Thus, each expert network solves only the specific problem, and our proposed method is possible to learn more efficiently than single deep neural network. We evaluate the proposed method on the segmentation problem of cell membrane and nucleus. The proposed method improved the segmentation accuracy in comparison with single deep neural network.","PeriodicalId":150600,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124087646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Vehicle Re-identification Using Triplet Networks","authors":"Pedro A. Marín-Reyes, Andrea Palazzi, Luca Bergamini, S. Calderara, J. Lorenzo-Navarro, R. Cucchiara","doi":"10.1109/CVPRW.2018.00030","DOIUrl":"https://doi.org/10.1109/CVPRW.2018.00030","url":null,"abstract":"Vehicle re-identification plays a major role in modern smart surveillance systems. Specifically, the task requires the capability to predict the identity of a given vehicle, given a dataset of known associations, collected from different views and surveillance cameras. Generally, it can be cast as a ranking problem: given a probe image of a vehicle, the model needs to rank all database images based on their similarities w.r.t the probe image. In line with recent research, we devise a metric learning model that employs a supervision based on local constraints. In particular, we leverage pairwise and triplet constraints for training a network capable of assigning a high degree of similarity to samples sharing the same identity, while keeping different identities distant in feature space. Eventually, we show how vehicle tracking can be exploited to automatically generate a weakly labelled dataset that can be used to train the deep network for the task of vehicle re-identification. Learning and evaluation is carried out on the NVIDIA AI city challenge videos.","PeriodicalId":150600,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"430 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121162209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2D-3D CNN Based Architectures for Spectral Reconstruction from RGB Images","authors":"Sriharsha Koundinya, Himanshu Sharma, Manoj Sharma, Avinash Upadhyay, Raunak Manekar, Rudrabha Mukhopadhyay, A. Karmakar, S. Chaudhury","doi":"10.1109/CVPRW.2018.00129","DOIUrl":"https://doi.org/10.1109/CVPRW.2018.00129","url":null,"abstract":"Hyperspectral cameras are used to preserve fine spectral details of scenes that are not captured by traditional RGB cameras that comprehensively quantizes radiance in RGB images. Spectral details provide additional information that improves the performance of numerous image based analytic applications, but due to high hyperspectral hardware cost and associated physical constraints, hyperspectral images are not easily available for further processing. Motivated by the performance of deep learning for various computer vision applications, we propose a 2D convolution neural network and a 3D convolution neural network based approaches for hyperspectral image reconstruction from RGB images. A 2D-CNN model primarily focuses on extracting spectral data by considering only spatial correlation of the channels in the image, while in 3D-CNN model the inter-channel co-relation is also exploited to refine the extraction of spectral data. Our 3D-CNN based architecture achieves very good performance in terms of MRAE and RMSE. In contrast to 3D-CNN, our 2D-CNN based architecture also achieves comparable performance with very less computational complexity.","PeriodicalId":150600,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129004725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}