{"title":"A Context-Based Tracker Switching Framework","authors":"A. Tyagi, J.W. Davis","doi":"10.1109/WMVC.2008.4544050","DOIUrl":"https://doi.org/10.1109/WMVC.2008.4544050","url":null,"abstract":"We present a robust framework for tracking people in crowded outdoor environments monitored by multiple cameras with a goal of real-time performance. Since no single algorithm is perfect for the task of object tracking in all cases, we instead take an alternate approach. Our algorithm dynamically switches between several available trackers on-the-fly by evaluating the current state/context of the scene. Autonomous agents that make the switching decisions are assigned to each object in the scene. Initialization of new agents and the handoff between various tracking algorithms are completely automated. The collaboration between different trackers is shown to improve performance compared to the individual methods in terms of both computation and reliability. The tracker switching framework is evaluated on a multi-camera dataset and both qualitative and quantitative results are presented.","PeriodicalId":150666,"journal":{"name":"2008 IEEE Workshop on Motion and video Computing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122296428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating Long-Term Observations of Human Actions for Stable 3D People Tracking","authors":"D. Sugimura, Y. Kobayashi, Y. Sato, A. Sugimoto","doi":"10.1109/WMVC.2008.4544057","DOIUrl":"https://doi.org/10.1109/WMVC.2008.4544057","url":null,"abstract":"We propose a method for enhancing the stability of tracking people by incorporating long-term observations of human actions in a scene. Basic human actions, such as walking or standing still, are frequently observed at particular locations in an observation scene. By observing human actions for a long period of time, we can identify regions that are more likely to be occupied by a person. These regions have a high probability of a person existing compared with others. The key idea of our approach is to incorporate this probability as a bias in generating samples under the framework of a particle filter for tracking people. We call this bias the environmental existence map (EEM). The EEM is iteratively updated at every frame by using the tracking results from our tracker, which leads to more stable tracking of people. Our experimental results demonstrate the effectiveness of our method.","PeriodicalId":150666,"journal":{"name":"2008 IEEE Workshop on Motion and video Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129359631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-Frames Accurate Motion Segmentation Using Tensor Voting and Graph-Cuts","authors":"T. Dinh, G. Medioni","doi":"10.1109/WMVC.2008.4544067","DOIUrl":"https://doi.org/10.1109/WMVC.2008.4544067","url":null,"abstract":"Motion segmentation and motion estimation are important topics in computer vision. Tensor Voting is a process that addresses both issues simultaneously; but running time is a challenge. We propose a novel approach which can yield both the motion segmentation and the motion estimation in the presence of discontinuities. This method is a combination of a non-iterative boosted-speed voting process in sparse space in a first stage, and a Graph-Cuts framework for boundary refinement in a second stage. Here, we concentrate on the motion segmentation problem. After initially choosing a sparse space by sampling the original image, we represent each of these pixels as 4-D tensor points and apply the voting framework to enforce local smoothness of motion. Afterwards, the boundary refinement is obtained by using the Graph-Cuts image segmentation. Our results attained in different types of motion show that the method outperforms other Tensor Voting approaches in speed, and the results are comparable with other methodologies in motion segmentation.","PeriodicalId":150666,"journal":{"name":"2008 IEEE Workshop on Motion and video Computing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124757412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face Pose Estimation From Video Sequence Using Dynamic Bayesian Network","authors":"S. A. Suandi, S. Enokida, T. Ejima","doi":"10.1109/WMVC.2008.4544053","DOIUrl":"https://doi.org/10.1109/WMVC.2008.4544053","url":null,"abstract":"This paper describes a technique to estimate human face pose from color video sequence using dynamic Bayesian network(DBN). As face and facial features trackers usually track eyes, pupils, mouth corners and skin region(face), our proposed method utilizes merely three of these features - pupils, mouth center and skin region - to compute the evidence for DBN inference. No additional image processing algorithm is required, thus, it is simple and operates in real-time. The evidence, which are called horizontal ratio and vertical ratio in this paper, are determined using model-based technique and designed significantly to simultaneously solve two problems in tracking task; scaling factor and noise influence. Results reveal that the proposed method can be realized in real-time on a 2.2 GHz Celeron CPU machine with very satisfactory pose estimation results.","PeriodicalId":150666,"journal":{"name":"2008 IEEE Workshop on Motion and video Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126991689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online, Real-time Tracking and Recognition of Human Actions","authors":"Pradeep Natarajan, R. Nevatia","doi":"10.1109/WMVC.2008.4544064","DOIUrl":"https://doi.org/10.1109/WMVC.2008.4544064","url":null,"abstract":"We present a top-down approach to simultaneously track and recognize articulated full-body human motion using learned action models that is robust to variations in style, lighting, background,occlusion and viewpoint. To this end, we introduce the hierarchical variable transition hidden Markov model (HVT-HMM) that is a three-layered extension of the variable transition hidden Markov model (VTHMM). The top-most layer of the HVT-HMM represents the composite actions and contains a single Markov chain, the middle layer represents the primitive actions which are modeled using a VTHMM whose state transition probability varies with time and the bottom-most layer represents the body pose transitions using a HMM. We represent the pose using a 23D body model and present efficient learning and decoding algorithms for HVT-HMM. Further, in classical Viterbi decoding the entire sequence must be seen before the state at any instant can be recognized and hence can potentially have large latency for long video sequences. In order to address this we use a variable window approach to decoding with very low latency. We demonstrate our methods first in a domain for recognizing two-handed gestures and then in a domain with actions involving articulated motion of the entire body. Our approach shows 90-100% action recognition in both domains and runs at real-time (ap 30 fps) with very low average latency (ap 2 frames).","PeriodicalId":150666,"journal":{"name":"2008 IEEE Workshop on Motion and video Computing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121761950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial-Temporal correlatons for unsupervised action classification","authors":"S. Savarese, A. DelPozo, Juan Carlos Niebles, Li Fei-Fei","doi":"10.1109/WMVC.2008.4544068","DOIUrl":"https://doi.org/10.1109/WMVC.2008.4544068","url":null,"abstract":"Spatial-temporal local motion features have shown promising results in complex human action classification. Most of the previous works [6],[16],[21] treat these spatial- temporal features as a bag of video words, omitting any long range, global information in either the spatial or temporal domain. Other ways of learning temporal signature of motion tend to impose a fixed trajectory of the features or parts of human body returned by tracking algorithms. This leaves little flexibility for the algorithm to learn the optimal temporal pattern describing these motions. In this paper, we propose the usage of spatial-temporal correlograms to encode flexible long range temporal information into the spatial-temporal motion features. This results into a much richer description of human actions. We then apply an unsupervised generative model to learn different classes of human actions from these ST-correlograms. KTH dataset, one of the most challenging and popular human action dataset, is used for experimental evaluation. Our algorithm achieves the highest classification accuracy reported for this dataset under an unsupervised learning scheme.","PeriodicalId":150666,"journal":{"name":"2008 IEEE Workshop on Motion and video Computing","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115892842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Gait Phase using Low-Level Motion","authors":"B. Daubney, D. Gibson, N. Campbell","doi":"10.1109/WMVC.2008.4544060","DOIUrl":"https://doi.org/10.1109/WMVC.2008.4544060","url":null,"abstract":"This paper presents a method that is capable of robustly estimating gait phase of a human walking from a sequence of images using only low-level motion. The approach we adopt is first to learn statistical motion models of the trajectories we would expect to observe for each of the main limbs. We then extract a sparse cloud of motion features from an image sequence using a standard feature tracker. By comparing the motion of the tracked features to our models and integrating over all feature points, a HMM can be used to estimate the most likely sequence of phases. This method is then extended to be invariant to translation by using a particle filter to track the dominant foreground object. Experimental results show that the presented system is capable of extracting gait phase to a high level of accuracy, demonstrating robustness to changes in height of the walker, gait frequency and individual gait characteristics. The purpose of this work is to ask the question \"how much information can we extract if we choose to throw away all appearance cues and rely only on motion? \".","PeriodicalId":150666,"journal":{"name":"2008 IEEE Workshop on Motion and video Computing","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130743361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Steepest Descent For Efficient Covariance Tracking","authors":"A. Tyagi, J.W. Davis, G. Potamianos","doi":"10.1109/WMVC.2008.4544049","DOIUrl":"https://doi.org/10.1109/WMVC.2008.4544049","url":null,"abstract":"Recent research has advocated the use of a covariance matrix of image features for tracking objects instead of the conventional histogram object representation models used in popular algorithms. In this paper we extend the covariance tracker and propose efficient algorithms with an emphasis on both improving the tracking accuracy and reducing the execution time. The algorithms are compared to a baseline covariance tracker and the popular histogram-based mean shift tracker. Quantitative evaluations on a publicly available dataset demonstrate the efficacy of the presented methods. Our algorithms obtain significant speedups factors up to 330 while reducing the tracking errors by 86-90% relative to the baseline approach.","PeriodicalId":150666,"journal":{"name":"2008 IEEE Workshop on Motion and video Computing","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130897646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Motion Patterns in Surveillance Video using HMM Clustering","authors":"E. Swears, A. Hoogs, A. Perera","doi":"10.1109/WMVC.2008.4544063","DOIUrl":"https://doi.org/10.1109/WMVC.2008.4544063","url":null,"abstract":"We present a novel approach to learning motion behavior in video, and detecting abnormal behavior, using hierarchical clustering of hidden Markov models (HMMs). A continuous stream of track data is used for online and on-demand creation and training of HMMs, where tracks may be of highly variable length and scenes may be very complex with an unknown number of motion patterns. We show how these HMMs can be used for on-line clustering of tracks that represent normal behavior and for detection of deviant tracks. The track clustering algorithm uses a hierarchical agglomerative HMM clustering technique that jointly determines all the HMM parameters (including the number of states) via an expectation maximization (EM) algorithm and the Akaike information criteria. Results are demonstrated on a highly complex scene containing dozens of routes, significant occlusions and hundreds of moving objects.","PeriodicalId":150666,"journal":{"name":"2008 IEEE Workshop on Motion and video Computing","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126867662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrated Detection and Tracking for Multiple Moving Objects using Data-Driven MCMC Data Association","authors":"Qian Yu, G. Medioni","doi":"10.1109/WMVC.2008.4544066","DOIUrl":"https://doi.org/10.1109/WMVC.2008.4544066","url":null,"abstract":"We propose a framework to address the multiple target tracking problem, which is to recover trajectories of targets of interest over time from noisy observations. Due to occlusions by targets and static objects, parallax or other moving objects, foreground regions cannot represents targets faithfully although motion segmentation is usually computationally efficient. We adopt the real Adaboost classifier to generate meaningful candidate rectangles to interpret the foreground regions. Tracks are generated from these candidates according to the smoothness of motion, appearance and model likelihood overtime. To avoid enumerating all possible joint associations, we take a Data Driven Markov Chain Monte Carlo (DD-MCMC) approach which samples the solution space efficiently. The sampling is driven by an informed proposal scheme controlled by a joint probability model combining motion, appearance and model information. Comparative experiments with quantitative evaluations are provided.","PeriodicalId":150666,"journal":{"name":"2008 IEEE Workshop on Motion and video Computing","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133358177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}