{"title":"Visual Tracking of Small Animals in Cluttered Natural Environments Using a Freely Moving Camera","authors":"B. Risse, M. Mangan, B. Webb, Luca Del Pero","doi":"10.1109/ICCVW.2017.335","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.335","url":null,"abstract":"Image-based tracking of animals in their natural habitats can provide rich behavioural data, but is very challenging due to complex and dynamic background and target appearances. We present an effective method to recover the positions of terrestrial animals in cluttered environments from video sequences filmed using a freely moving monocular camera. The method uses residual motion cues to detect the targets and is thus robust to different lighting conditions and requires no a-priori appearance model of the animal or environment. The detection is globally optimised based on an inference problem formulation using factor graphs. This handles ambiguities such as occlusions and intersections and provides automatic initialisation. Furthermore, this formulation allows a seamless integration of occasional user input for the most difficult situations, so that the effect of a few manual position estimates are smoothly distributed over long sequences. Testing our system against a benchmark dataset featuring small targets in natural scenes, we obtain 96% accuracy for fully automated tracking. We also demonstrate reliable tracking in a new data set that includes different targets (insects, vertebrates or artificial objects) in a variety of environments (desert, jungle, meadows, urban) using different imaging devices (day / night vision cameras, smart phones) and modalities (stationary, hand-held, drone operated).","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114726560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: SnapNet-R: Consistent 3D Multi-view Semantic Labeling for Robotics
Authors: J. Guerry, Alexandre Boulch, B. L. Saux, J. Moras, A. Plyer, David Filliat
DOI: 10.1109/ICCVW.2017.85 (https://doi.org/10.1109/ICCVW.2017.85)
Venue: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017-10-22
Abstract: In this paper we present a new approach to semantic recognition in the context of robotics. When a robot moves in its environment, it obtains 3D information either from its sensors or from its own motion through 3D reconstruction. Our approach (i) uses 3D-coherent synthesis of scene observations, (ii) mixes them in a multi-view framework for 3D labeling, and (iii) is efficient both locally (for 2D semantic segmentation) and globally (for 3D structure labeling). This allows us to add semantics to the observed scene, going beyond simple image classification, as shown on challenging datasets such as SUNRGBD or the 3DRMS Reconstruction Challenge.

Title: Fully Convolutional Network and Region Proposal for Instance Identification with Egocentric Vision
Authors: Maxime Portaz, Matthias Kohl, G. Quénot, J. Chevallet
DOI: 10.1109/ICCVW.2017.281 (https://doi.org/10.1109/ICCVW.2017.281)
Venue: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017-10-22
Abstract: This paper presents a novel approach for egocentric image retrieval and object detection. The approach uses fully convolutional networks (FCN) to obtain region proposals without the need for an additional network component or additional training. It is particularly suited for small datasets with low object variability. The proposed network can be trained end-to-end and produces an effective global descriptor as an image representation. Additionally, it can be built upon any type of CNN pre-trained for classification. Through multiple experiments on two egocentric image datasets taken from museum visits, we show that the descriptor obtained using our proposed network outperforms those from previous state-of-the-art approaches. It is also just as memory-efficient, making it well suited to mobile devices such as an augmented museum audio-guide.

Title: Results and Analysis of ChaLearn LAP Multi-modal Isolated and Continuous Gesture Recognition, and Real Versus Fake Expressed Emotions Challenges
Authors: Jun Wan, Sergio Escalera, G. Anbarjafari, H. Escalante, Xavier Baró, Isabelle M Guyon, Meysam Madadi, J. Allik, Jelena Gorbova, Chi Lin, Yiliang Xie
DOI: 10.1109/ICCVW.2017.377 (https://doi.org/10.1109/ICCVW.2017.377)
Venue: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017-10-22
Abstract: We analyze the results of the 2017 ChaLearn Looking at People Challenge at ICCV. The challenge comprised three tracks: (1) large-scale isolated gesture recognition, (2) continuous gesture recognition, and (3) real versus fake expressed emotions. It is the second round for both gesture recognition challenges, which were first held in the context of the ICPR 2016 workshop on "multimedia challenges beyond visual analysis". In this second round, more participants joined the competitions, and the performances improved considerably compared to the first round. In particular, the best recognition accuracy for isolated gesture recognition improved from 56.90% to 67.71% on the IsoGD test set, and the Mean Jaccard Index (MJI) for continuous gesture recognition improved from 0.2869 to 0.6103 on the ConGD test set. The third track is the first challenge on real versus fake expressed emotion classification, covering six emotion categories, for which a novel database was introduced. First place was shared between two teams, who achieved a 67.70% average recognition rate on the test set. The data of the three tracks, the participants' code and the method descriptions are publicly available to allow researchers to keep making progress in the field.

Title: Visualizing Apparent Personality Analysis with Deep Residual Networks
Authors: Yağmur Güçlütürk, Umut Güçlü, Marc Pérez, H. Escalante, Xavier Baró, C. Andújar, Isabelle M Guyon, Julio C. S. Jacques Junior, Meysam Madadi, Sergio Escalera, M. Gerven, R. Lier
DOI: 10.1109/ICCVW.2017.367 (https://doi.org/10.1109/ICCVW.2017.367)
Venue: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017-10-22
Abstract: Automatic prediction of personality traits is a subjective task that has recently received much attention. Specifically, automatic prediction of apparent personality traits from multimodal data has emerged as a hot topic within the field of computer vision and, more particularly, the so-called "looking at people" sub-field. Considering "apparent" personality traits, as opposed to real ones, considerably reduces the subjectivity of the task. Real-world applications are encountered in a wide range of domains, including entertainment, health, human-computer interaction, recruitment and security. Predictive models of personality traits are useful for individuals in many scenarios (e.g., preparing for job interviews, preparing for public speaking). However, these predictions in and of themselves might be deemed untrustworthy without human-understandable supportive evidence. Through a series of experiments on a recently released benchmark dataset for automatic apparent personality trait prediction, this paper characterizes the audio and visual information used by a state-of-the-art model while making its predictions, so as to provide such supportive evidence by explaining the predictions made. Additionally, the paper describes a new web application, which gives feedback on the apparent personality traits of its users by combining model predictions with their explanations.

Title: Adaptive Pooling in Multi-instance Learning for Web Video Annotation
Authors: Dong Liu, Y. Zhou, Xiaoyan Sun, Zhengjun Zha, Wenjun Zeng
DOI: 10.1109/ICCVW.2017.46 (https://doi.org/10.1109/ICCVW.2017.46)
Venue: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017-10-22
Abstract: Web videos are usually weakly annotated, i.e., a tag is associated with a video once the corresponding concept appears in a frame of that video, without indicating when and where it occurs. These weakly annotated tags pose serious problems for many Web video applications, e.g., search and recommendation. In this paper, we present a new Web video annotation approach based on multi-instance learning (MIL) with a learnable pooling function. By formulating Web video annotation as a MIL problem, we present an end-to-end deep network framework in which frame (instance) level annotations are estimated from tags given at the video (bag of instances) level via a convolutional neural network (CNN). A learnable pooling function is proposed to adaptively fuse the outputs of the CNN to determine tags at the video level. We further propose a new loss function that consists of both bag-level and instance-level losses, which makes the penalty term aware of the internal state of the network rather than only an overall loss, so that the pooling function is learned better and faster. Experimental results demonstrate that our proposed framework not only enhances the accuracy of Web video annotation, outperforming state-of-the-art Web video annotation methods on the large-scale video dataset FCVID, but also helps to infer the most relevant frames in Web videos.

Title: HyKo: A Spectral Dataset for Scene Understanding
Authors: Christian Winkens, Florian Sattler, Veronika Adams, D. Paulus
DOI: 10.1109/ICCVW.2017.39 (https://doi.org/10.1109/ICCVW.2017.39)
Venue: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017-10-01
Abstract: We present datasets containing urban traffic and rural road scenes recorded using hyperspectral snapshot sensors mounted on a moving car. The novel hyperspectral cameras used can capture whole spectral cubes at up to 15 Hz. This emerging sensor modality enables hyperspectral scene analysis for autonomous driving tasks. To the best of the authors' knowledge, no such dataset has been published so far. The datasets contain synchronized 3-D laser, spectrometer and hyperspectral data. Dense ground-truth annotations are provided for semantic labels, material and traversability. The hyperspectral data range from visible to near-infrared wavelengths. We explain our recording platform and method and the associated data format, along with a code library for easy data consumption. The datasets are publicly available for download.
{"title":"Computer Vision Problems in Plant Phenotyping, CVPPP 2017: Introduction to the CVPPP 2017 Workshop Papers","authors":"H. Scharr, T. Pridmore, S. Tsaftaris","doi":"10.1109/ICCVW.2017.236","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.236","url":null,"abstract":"Plant phenotyping is the identification of effects on the phenotype (i.e., the plant appearance and behavior) as a result of genotype differences (i.e., differences in the genetic code) and the environment. Previously, the process of taking phenotypic measurements has been laborious, costly, and time consuming. In recent years, non-invasive, image-based methods have become more common. These images are recorded by a range of capture devices from small embedded camera systems to multi-million Euro smart-greenhouses, at scales ranging from microscopic images of cells, to entire fields captured by UAV imaging. These images needs to be analyzed in a high throughput, robust, and accurate manner.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"474 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115313800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight Monocular Obstacle Avoidance by Salient Feature Fusion","authors":"Andrea Manno-Kovács, Levente Kovács","doi":"10.1109/ICCVW.2017.92","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.92","url":null,"abstract":"We present a monocular obstacle avoidance method based on a novel image feature map built by fusing robust saliency features, to be used in embedded systems on lightweight autonomous vehicles. The fused salient features are a textural-directional Harris based feature map and a relative focus feature map. We present the generation of the fused salient map, along with its application for obstacle avoidance. Evaluations are performed from a saliency point of view, and for the assessment of the method's applicability for obstacle avoidance in simulated environments. The presented results support the usability of the method in embedded systems on lightweight unmanned vehicles.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126828943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: Towards Implicit Correspondence in Signed Distance Field Evolution
Authors: Miroslava Slavcheva, Maximilian Baust, Slobodan Ilic
DOI: 10.1109/ICCVW.2017.103 (https://doi.org/10.1109/ICCVW.2017.103)
Venue: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017-10-01
Abstract: The level set framework is widely used in geometry processing due to its ability to handle topological changes and the readily accessible shape properties it provides, such as normals and curvature. However, its major drawback is the lack of correspondence preservation throughout the level set evolution. Therefore, data associated with the surface, such as colour, is lost. The objective of this paper is a variational approach to signed distance field evolution that implicitly preserves correspondences. We propose an energy functional based on a novel data term, which aligns the lowest-frequency Laplacian eigenfunction representations of the input and target shapes. As these encode information about the natural deformations that the shape can undergo, our strategy manages to prevent data diffusion into the volume. We demonstrate that our system is able to preserve texture throughout articulated motion sequences, and we evaluate its geometric accuracy on public data.