A Decision-Theoretic Video Conference System Based on Gesture Recognition
J.A. Montero, L. Sucar
7th International Conference on Automatic Face and Gesture Recognition (FGR06)
Published: 2006-04-10 · DOI: 10.1109/FGR.2006.7
Citations: 5
Abstract
This paper presents a new approach that combines computer vision and decision theory for an automatic video conference system. The setting is a video conference room in which a speaker interacts with surrounding objects such as a computer, notes, and books. From a set of cameras, the system selects the most appropriate one to show to the audience according to the speaker's activity. We assume that the speaker's activity can be recognized from hand gestures and their interaction with objects in the environment. The proposed approach combines context-based gesture recognition with a decision-theoretic model to select the best view. Gesture recognition is based on hidden Markov models that combine motion and contextual information, where context refers to the position of the hand relative to other objects. The posterior probability of each gesture is used in a partially observable Markov decision process (POMDP) to select the best view according to a utility function. The POMDP is implemented as a dynamic Bayesian network with a limited lookahead. Preliminary experiments show good results in both gesture recognition and view selection. We also present the effect of different lookahead periods on the performance of the system.
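The abstract's view-selection step can be illustrated with a minimal sketch: the HMMs produce a posterior over gestures, that belief is propagated a few steps forward under an assumed gesture-transition model (a simplification of the paper's DBN lookahead), and the camera with the highest expected utility is chosen. The gesture set, camera names, utility table, and transition matrix below are all illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical gesture set and cameras; the paper's actual sets may differ.
GESTURES = ["typing", "reading", "writing", "pointing"]
CAMERAS = ["computer_cam", "desk_cam", "speaker_cam"]

# Illustrative utility of showing each camera given the speaker's gesture
# (rows follow GESTURES, columns follow CAMERAS).
UTILITY = np.array([
    [1.0, 0.2, 0.3],   # typing   -> computer view preferred
    [0.2, 1.0, 0.4],   # reading  -> desk view preferred
    [0.3, 1.0, 0.4],   # writing  -> desk view preferred
    [0.2, 0.3, 1.0],   # pointing -> speaker view preferred
])

# Assumed gesture dynamics P(g_t | g_{t-1}): mostly self-persistent.
TRANSITION = np.full((4, 4), 0.1) + 0.6 * np.eye(4)
TRANSITION /= TRANSITION.sum(axis=1, keepdims=True)

def select_view(posterior, lookahead=1):
    """Pick the camera maximizing expected utility under the gesture belief,
    averaging the current belief with its predicted evolution over a short
    lookahead horizon (a crude stand-in for a full POMDP policy)."""
    belief = np.asarray(posterior, dtype=float)
    belief /= belief.sum()
    horizon = [belief]
    for _ in range(lookahead):
        belief = belief @ TRANSITION   # predict next-step gesture belief
        horizon.append(belief)
    avg_belief = np.mean(horizon, axis=0)
    expected_utility = avg_belief @ UTILITY
    return CAMERAS[int(np.argmax(expected_utility))]

# Usage: HMM posteriors strongly favoring "typing" select the computer view.
print(select_view([0.7, 0.1, 0.1, 0.1]))  # computer_cam
```

A longer `lookahead` smooths the belief toward the transition model's stationary behavior, which makes camera switches less reactive; the abstract's study of different lookahead periods examines exactly this trade-off.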