{"title":"Fast CNN surveillance pipeline for fine-grained vessel classification and detection in maritime scenarios","authors":"Fouad Bousetouane, B. Morris","doi":"10.1109/AVSS.2016.7738076","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738076","url":null,"abstract":"Deep convolutional neural networks (CNNs) have proven very effective for many vision benchmarks in object detection and classification tasks. However, the computational complexity and object resolution requirements of CNNs limit their applicability in wide-view video surveillance settings where objects are small. This paper presents a CNN surveillance pipeline for vessel localization and classification in maritime video. The proposed pipeline is build upon the GPU implementation of Fast-R-CNN with three main steps:(1) Vessel filtering and regions proposal using low-cost weak object detectors based on hand-engineered features. (2) Deep CNN features of the candidates regions are computed with one feed-forward pass from the high-level layer of a fine-tuned VGG16 network. (3) Fine-grained classification is performed using CNN features and a support vector machine classifier with linear kernel for object verification. The performance of the proposed pipeline is compared with other popular CNN architectures with respect to detection accuracy and evaluation speed. The proposed approach mAP of 61.10% was the comparable with Fast-R-CNN but with a 10× speed up (on the order of Faster-R-CNN) on the new Annapolis Maritime Surveillance Dataset.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122247899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal weighted dictionary learning","authors":"A. Taalimi, Hesam Shams, Alireza Rahimpour, R. Khorsandi, Wei Wang, Rui Guo, H. Qi","doi":"10.1109/AVSS.2016.7738026","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738026","url":null,"abstract":"Classical dictionary learning algorithms that rely on a single source of information have been successfully used for the discriminative tasks. However, exploiting multiple sources has demonstrated its effectiveness in solving challenging real-world situations. We propose a new framework for feature fusion to achieve better classification performance as compared to the case where individual sources are utilized. In the context of multimodal data analysis, the modality configuration induces a strong group/coupling structure. The proposed method models the coupling between different modalities in space of sparse codes while at the same time within each modality a discriminative dictionary is learned in an all-vs-all scheme whose class-specific sub-parts are non-correlated. The proposed dictionary learning scheme is referred to as the multimodal weighted dictionary learning (MWDL). We demonstrate that MWDL outperforms state-of-the-art dictionary learning approaches in various experiments.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130012772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble Of adaptive correlation filters for robust visual tracking","authors":"Erhan Gundogdu, Huseyin Ozkan, Aydin Alatan","doi":"10.1109/AVSS.2016.7738031","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738031","url":null,"abstract":"Correlation filters have recently been popular due to their success in short-term single-object tracking as well as their computational efficiency. Nevertheless, the appearance model of a single correlation filter based tracking algorithm quickly forgets the past poses of the target object due to the updates over time. To overcome this undesired forgetting, our approach is to run trackers with separate models simultaneously. Hence, we propose a novel tracker relying on an ensemble of correlation filters, where the ensemble is obtained via a decision tree partitioning in the object appearance space. Our technique efficiently searches among the ensemble trackers and activates the ones which are most specialized on the current object appearance. Our tracking method is capable of switching frequently in the ensemble. Thus, an inherently adaptive and non-linear learning rate is achieved. Moreover, we demonstrate the superior performance of our method in benchmark video sequences.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134532030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting threat behaviours","authors":"J. L. Patino, J. Ferryman","doi":"10.1109/AVSS.2016.7738072","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738072","url":null,"abstract":"This paper addresses the complex problem of recognising threat situations from videos streamed by surveillance cameras. A behaviour recognition approach is proposed, which is based on a semantic recognition of the event. Low-level tracking information is transformed into high-level semantic descriptions mainly by analysis of the tracked object speed and direction. Semantic terms combined with automatically learned activity zones of the observed scene allow delivering behaviour events indicating the mobile activity. Behaviours of interest are modelled and recognised in the semantic domain. The approach has been applied on different public datasets, namely CAVIAR and ARENA. Both datasets contain instances of people attacked (with physical aggression). Successful results have been obtained when compared to other state of the art algorithms.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115590356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single object tracking based on active and passive detection information in distributed heterogeneous sensor network","authors":"Hyunhak Shin, C. Cho, Hanseok Ko","doi":"10.1109/AVSS.2016.7738083","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738083","url":null,"abstract":"In this paper, a single object tracking method based on fusion of detection information collected from a distributed heterogeneous sensor network is proposed. The considered sensor network is composed of one active type source and multiple receivers. It is assumed that the heterogeneous network is capable of acquiring both passive and active information simultaneously. By means of fusion of the acquired heterogeneous data, the proposed method estimates the candidate region of target location. Then, position of the object is estimated by Maximum Likelihood Estimation. In the experimental results, the performance of the proposed method is demonstrated in terms of deployment strategy of the heterogeneous sensor network.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115141122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling spatial layout of features for real world scenario RGB-D action recognition","authors":"Michal Koperski, F. Brémond","doi":"10.1109/AVSS.2016.7738023","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738023","url":null,"abstract":"Depth information improves skeleton detection, thus skeleton based methods are the most popular methods in RGB-D action recognition. But skeleton detection working range is limited in terms of distance and view-point. Most of the skeleton based action recognition methods ignore fact that skeleton may be missing. Local points-of-interest (POIs) do not require skeleton detection. But they fail if they cannot detect enough POIs e.g. amount of motion in action is low. Most of them ignore spatial-location of features. We cope with the above problems by employing people detector instead of skeleton detector. We propose method to encode spatial-layout of features inside bounding box. We also introduce descriptor which encodes static information for actions with low amount of motion. We validate our approach on: 3 public data-sets. The results show that our method is competitive to skeleton based methods, while requiring much simpler people detection instead of skeleton detection.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"193 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133349757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TREAT: Terse Rapid Edge-Anchored Tracklets","authors":"Rémi Trichet, N. O’Connor","doi":"10.1109/AVSS.2016.7738078","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738078","url":null,"abstract":"Fast computation, efficient memory storage, and performance on par with standard state-of-the-art descriptors make binary descriptors a convenient tool for many computer vision applications. However their development is mostly tailored for static images. To respond to this limitation, we introduce TREAT (Terse Rapid Edge-Anchored Tracklets), a new binary detector and descriptor, based on tracklets. It harnesses moving edge maps to perform efficient feature detection, tracking, and description at low computational cost. Experimental results on 3 different public datasets demonstrate improved performance over other popular binary features. These experiments also provide a basis for benchmarking the performance of binary descriptors in video-based applications.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114637535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring depth information for head detection with depth images","authors":"Siyuan Chen, F. Brémond, H. Nguyen, Hugues Thomas","doi":"10.1109/AVSS.2016.7738060","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738060","url":null,"abstract":"Head detection may be more demanding than face recognition and pedestrian detection in the scenarios where a face turns away or body parts are occluded in the view of a sensor, but locating people is needed. In this paper, we introduce an efficient head detection approach for single depth images at low computational expense. First, a novel head descriptor is developed and used to classify pixels as head or non-head. We use depth values to guide each window size, to eliminate false positives of head centers, and to cluster head pixels, which significantly reduce the computation costs of searching for appropriate parameters. High head detection performance was achieved in experiments - 90% accuracy for our dataset containing heads with different body postures, head poses, and distances to a Kinect2 sensor, and above 70% precision on a public dataset composed of a few daily activities, which is higher than using a head-shoulder detector with HOG feature for depth images.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117116570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic generation of scene-specific person trackers","authors":"Gerrit Holzbach, F. V. D. Camp, R. Stiefelhagen","doi":"10.1109/AVSS.2016.7738079","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738079","url":null,"abstract":"The large variety of influencing factors requires the manual creation and optimization of person trackers for different environments as well as different views, such as in a distributed network of cameras. The manual creation and adjustments are time-consuming as well as prone to error and therefore expensive. We propose a system that uses basic computer-vision building blocks to automatically create and optimize a person tracker, using only a few annotated camera frames. An evaluation shows that our system creates target-oriented trackers from largely very basic methods, that can compare to manually, published person trackers. The system has the potential to drastically reduce the cost of creating person trackers for new environments and optimizing it for every single camera in a camera network.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"77 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123230412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust discriminative tracking via query-by-bagging","authors":"Kourosh Meshgi, Shigeyuki Oba, S. Ishii","doi":"10.1109/AVSS.2016.7738027","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738027","url":null,"abstract":"Adaptive tracking-by-detection is a popular approach to track arbitrary objects in various situations. Such approaches treat tracking as a classification task and constantly update the object model. The update procedure requires a set of labeled examples, where samples are collected from the last observation, and then labeled. However, these intermediate steps typically follow a set of heuristic rules for labeling and uninformed search in the sample space, which decrease the effectiveness of model update. In this study, we present a framework for adaptive tracking that utilizes active learning for effective sample selection and labeling them. The active sampler employs a committee of randomized-classifiers to select the most informative samples and query their label from an auxiliary detector with a long-term memory. The committee is then updated with the obtained labels. Experiments show that our algorithm outperforms state-of-the-art trackers on various benchmark videos.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128663492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}