{"title":"Human Action Retrieval via efficient feature matching","authors":"Jun Tang, Ling Shao, Xiantong Zhen","doi":"10.1109/AVSS.2013.6636657","DOIUrl":"https://doi.org/10.1109/AVSS.2013.6636657","url":null,"abstract":"As a large proportion of the available video media concerns humans, human action retrieval is posed as a new topic in the domain of content-based video retrieval. For retrieving complex human actions, measuring the similarity between two videos represented by local features is a critical issue. In this paper, a fast and explicit feature correspondence approach is presented to compute the match cost serving as the similarity metric. Then the proposed similarity metric is embedded into the framework of manifold ranking for action retrieval. In contrast to the Bag-of-Words model and its variants, our method yields an encouraging improvement of accuracy on the KTH and the UCF YouTube datasets with reasonably efficient computation.","PeriodicalId":336903,"journal":{"name":"2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129574307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A sparsity-driven approach to multi-camera tracking in visual sensor networks","authors":"S. Coşar, M. Çetin","doi":"10.1109/AVSS.2013.6636674","DOIUrl":"https://doi.org/10.1109/AVSS.2013.6636674","url":null,"abstract":"In this paper, a sparsity-driven approach is presented for multi-camera tracking in visual sensor networks (VSNs). VSNs consist of image sensors, embedded processors and wireless transceivers which are powered by batteries. Since the energy and bandwidth resources are limited, setting up a tracking system in VSNs is a challenging problem. Motivated by the goal of tracking in a bandwidth-constrained environment, we present a sparsity-driven method to compress the features extracted by the camera nodes, which are then transmitted across the network for distributed inference. We have designed special overcomplete dictionaries that match the structure of the features, leading to very parsimonious yet accurate representations. We have tested our method in indoor and outdoor people tracking scenarios. Our experimental results demonstrate how our approach leads to communication savings without significant loss in tracking performance.","PeriodicalId":336903,"journal":{"name":"2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129133759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved action recognition by combining multiple 2D views in the bag-of-words model","authors":"G. Burghouts, P. Eendebak, H. Bouma, J. T. Hove","doi":"10.1109/AVSS.2013.6636648","DOIUrl":"https://doi.org/10.1109/AVSS.2013.6636648","url":null,"abstract":"Action recognition is a hard problem due to the many degrees of freedom of the human body and the movement of its limbs. This is especially hard when only one camera viewpoint is available and when actions involve subtle movements. For instance, when looked from the side, checking one's watch may look very similar to crossing one's arms. In this paper, we investigate how much the recognition can be improved when multiple views are available. The novelty is that we explore various combination schemes within the robust and simple bag-of-words (BoW) framework, from early fusion of features to late fusion of multiple classifiers. In new experiments on the publicly available IXMAS dataset, we learn that action recognition can be improved significantly already by only adding one viewpoint. We demonstrate that the state-of-the-art on this dataset can be improved by 5% - achieving 96.4% accuracy - when multiple views are combined. Cross-view invariance of the BoW pipeline can be improved by 32% with intermediate-level fusion.","PeriodicalId":336903,"journal":{"name":"2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123890345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis control middleware for large-scale video surveillance","authors":"Takeshi Arikuma, K. Koyama, T. Kitano, N. Shiraishi, Yoichi Nagai, Tsunehisa Kawamata","doi":"10.1109/AVSS.2013.6636655","DOIUrl":"https://doi.org/10.1109/AVSS.2013.6636655","url":null,"abstract":"Recently the demand for large-scale surveillance systems automated by video analysis technologies has been increased. A key to developing real-world systems of this type is the streamlining of analysis execution on the basis of application characteristics including video contents. This paper proposes novel middleware we call “Analysis Control Middleware (ASCOT)” for achieving and customizing analysis execution control. It provides an Analysis Control Framework for load reduction, load control and customization of control logics in addition to stream processing features. We applied ASCOT to a case study application consisting of a search function for forensics and a suspect alert function for real-time intruder detection. For a system with hundreds camera, ASCOT reduces the number of servers by 40.2% by combining two simple analysis controls. Furthermore, the effects on load control and customization of control logics on the middleware are evaluated with the case study application.","PeriodicalId":336903,"journal":{"name":"2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"2 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114121391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video segmentation with spatio-temporal tubes","authors":"Rémi Trichet, R. Nevatia","doi":"10.1109/AVSS.2013.6636661","DOIUrl":"https://doi.org/10.1109/AVSS.2013.6636661","url":null,"abstract":"Long-term temporal interactions among objects are an important cue for video understanding. To capture such object relations, we propose a novel method for spatiotemporal video segmentation based on dense trajectory clustering that is also effective when objects articulate. We use superpixels of homogeneous size jointly with optical flow information to ease the matching of regions from one frame to another. Our second main contribution is a hierarchical fusion algorithm that yields segmentation information available at multiple linked scales. We test the algorithm on several videos from the web showing a large variety of difficulties.","PeriodicalId":336903,"journal":{"name":"2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"51 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113981391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vehicle logo recognition based on Bag-of-Words","authors":"Shuyuan Yu, Shibao Zheng, Hua Yang, Longfei Liang","doi":"10.1109/AVSS.2013.6636665","DOIUrl":"https://doi.org/10.1109/AVSS.2013.6636665","url":null,"abstract":"The recognition of vehicle manufacturer logo is a crucial and very challenging problem, which is still an area with few published effective methods. This paper proposes a new fast and reliable system for Vehicle Logo Recognition (VLR) based on Bag-of-Words (BoW). In our system, vehicle logo images are represented as histograms of visual words and classified by SVM in three steps: firstly, extract dense-SIFT features; secondly, quantize features into visual words by `Soft-assignment' thirdly, build histograms of visual words with spatial information. Compared with traditional VLR methods, experiment results show that our proposed system achieves higher recognition accuracy with less processing time. The proposed system is evaluated on a dataset of 840 low-resolution vehicle logo images with about 30×30 pixels, which verifies that our system is practical and effective.","PeriodicalId":336903,"journal":{"name":"2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"4 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133410762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Activity recognition and uncertain knowledge in video scenes","authors":"R. Romdhane, C. Crispim, F. Brémond, M. Thonnat","doi":"10.1109/AVSS.2013.6636669","DOIUrl":"https://doi.org/10.1109/AVSS.2013.6636669","url":null,"abstract":"Activity recognition has been a growing research topic in the last years and its application varies from automatic recognition of social interaction such as shaking hands, parking lot surveillance, traffic monitoring and the detection of abandoned luggage. This paper describes a probabilistic framework for uncertainty handling in a description-based event recognition approach. The proposed approach allows the flexible modeling of composite events with complex temporal constraints. It uses probability theory to provide a consistent framework for dealing with uncertain knowledge for the recognition of complex events. We validate the event recognition accuracy of the proposed algorithm on real-world videos. The experimental results show that our system can successfully recognize activities with a high recognition rate. We conclude by comparing our algorithm with the state of the art and showing how the definition of event models and the probabilistic reasoning can influence the results of real-time event recognition.","PeriodicalId":336903,"journal":{"name":"2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124765238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"APFel: The intelligent video analysis and surveillance system for assisting human operators","authors":"Alexander Kolarow, Konrad Schenk, M. Eisenbach, M. Dose, M. Brauckmann, Klaus Debes, H. Groß","doi":"10.1109/AVSS.2013.6636639","DOIUrl":"https://doi.org/10.1109/AVSS.2013.6636639","url":null,"abstract":"The rising need for security in the last years has led to an increased use of surveillance cameras in both public and private areas. The increasing amount of footage makes it necessary to assist human operators with automated systems to monitor and analyze the video data in reasonable time. In this paper we summarize our work of the past three years in the field of intelligent and automated surveillance. Our proposed system extends the common active monitoring of camera footage into an intelligent automated investigative person-search and walk path reconstruction of a selected person within hours of image data. Our system is evaluated and tested under life-like conditions in real-world surveillance scenarios. Our experiments show that with our system an operator can reconstruct a case in a fraction of time, compared to manually searching the recorded data.","PeriodicalId":336903,"journal":{"name":"2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129550942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised online learning of visual focus of attention","authors":"S. Duffner, Christophe Garcia","doi":"10.1109/AVSS.2013.6636611","DOIUrl":"https://doi.org/10.1109/AVSS.2013.6636611","url":null,"abstract":"In this paper, we propose a novel approach for estimating visual focus of attention in video streams. The method is based on an unsupervised algorithm that incrementally learns the different appearance clusters from low-level visual features extracted from face patches provided by a face tracker. The clusters learnt in that way can then be used to classify the different visual attention targets of a given person during a tracking run, without any prior knowledge on the environment and the configuration of the room or the visible persons. Experiments on public datasets containing almost two hours of annotated videos from meetings and video-conferencing show that the proposed algorithm produces state-of-the-art results and even outperforms a traditional supervised method that is based on head orientation estimation and that classifies visual focus of attention using Gaussian Mixture Models.","PeriodicalId":336903,"journal":{"name":"2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124864163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective attention automatic focus for cognitive crowd monitoring","authors":"Simone Chiappino, L. Marcenaro, C. Regazzoni","doi":"10.1109/AVSS.2013.6636609","DOIUrl":"https://doi.org/10.1109/AVSS.2013.6636609","url":null,"abstract":"In most recent Intelligent Video Surveillance systems, mechanisms used to support human decisions are integrated in cognitive artificial processes. Large scale video surveillance networks must be able to analyse a huge amount of information. In this context, a cognitive perception mechanism integrate in an intelligent system could help an operator for focusing his attention on relevant aspects of the environment ignoring other parts. This paper presents a bio-inspired algorithm called Selective Attention Automatic Focus (S2AF), as a part of more complex Cognitive Dynamic Surveillance System (CDSS) for crowd monitoring. The main objective of the proposed method is to extract relevant information needed for crowd monitoring directly from the environmental observations. Experimental results are provided by means of a 3D crowd simulator; they show how by the proposed attention focus method is able to detect densely populated areas.","PeriodicalId":336903,"journal":{"name":"2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133152192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}