Towards real-time 3-D monocular visual tracking of human limbs in unconstrained environments
Dave Bullock, John Zelek
Real-Time Imaging, 11(4), pages 323-353, published 2005-08-01
DOI: 10.1016/j.rti.2005.06.004
https://www.sciencedirect.com/science/article/pii/S1077201405000604
Citations: 22
Abstract
The 3-D visual tracking of human limbs is fundamental to a wide array of computer vision applications, including gesture recognition, interactive entertainment, biomechanical analysis, vehicle driver monitoring, and electronic surveillance. Limb tracking is complicated by occlusion, depth ambiguities, rotational ambiguities, and high levels of noise caused by loose-fitting clothing. We attempt to solve the 3-D limb tracking problem using only monocular imagery (a single 2-D video source) in largely unconstrained environments. The approach presented is a step towards full real-time operation. The system described is a complete visual tracking system that incorporates target detection, target model acquisition/initialization, and target tracking components into a single, cohesive, probabilistic framework. The presence of a target is detected, using visual cues alone, by recognizing an individual performing a simple pre-defined initialization cue. The physical dimensions of the limb are then learned probabilistically until a statistically stable model estimate has been found. The appearance of the limb is learned in a joint spatial-chromatic domain, which combines normalized color data with spatial constraints in order to model complex target appearances. Target tracking is performed within a Monte Carlo particle filtering framework, which is capable of maintaining multiple state-space hypotheses and propagating ambiguity until less ambiguous data is observed. Multiple image cues are combined within this framework in a principled Bayesian manner.
The target detection and model acquisition components perform at near real-time frame rates and are shown to accurately recognize the presence of a target and initialize a target model specific to that user. The target tracking component has demonstrated exceptional resilience to occlusion and temporary target disappearance, and contains a natural mechanism for trading accuracy against speed. At present, the target tracking component performs at sub-real-time frame rates, although several methods to increase the effective operating speed are proposed.
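The tracking stage summarized in the abstract can be illustrated with a minimal bootstrap particle filter. This is a sketch under stated assumptions, not the authors' implementation: the state is reduced to a single hypothetical joint angle, the two image cues are stand-in scalar measurements with Gaussian likelihoods, and the cues are assumed conditionally independent given the state, so that their likelihoods multiply. All function and parameter names here are illustrative.

```python
import math
import random


def gaussian_likelihood(observed, predicted, sigma):
    """Unnormalized Gaussian likelihood of one cue given a hypothesized state."""
    return math.exp(-0.5 * ((observed - predicted) / sigma) ** 2)


def particle_filter_step(particles, cues, motion_sigma, rng):
    """One predict / weight / resample cycle of a bootstrap particle filter.

    particles: list of floats, each a hypothesized state (assumed: a joint angle).
    cues: list of (observed_value, sigma) pairs, one per hypothetical image cue.
    """
    # Predict: diffuse each hypothesis with random-walk motion noise.
    predicted = [p + rng.gauss(0.0, motion_sigma) for p in particles]

    # Weight: assuming the cues are conditionally independent given the state,
    # the joint likelihood is the product of the per-cue likelihoods.
    weights = []
    for p in predicted:
        w = 1.0
        for observed, sigma in cues:
            w *= gaussian_likelihood(observed, p, sigma)
        weights.append(w)

    total = sum(weights)
    if total == 0.0:
        # No hypothesis explains the data (for example during occlusion):
        # keep the diffused particles so the ambiguity is carried forward.
        return predicted
    weights = [w / total for w in weights]

    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Posterior mean of the particle set."""
    return sum(particles) / len(particles)
```

Because the posterior is represented by a set of samples, several competing hypotheses can survive until the data disambiguates them. In a filter of this kind the particle count is one obvious knob for trading accuracy against speed; whether that is the mechanism the authors refer to is an assumption here.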