Vision-based speaker detection using Bayesian networks

Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149). Pub Date: 1999-06-23. DOI: 10.1109/CVPR.1999.784617

James M. Rehg, Kevin P. Murphy, P. Fieguth


Citations: 79

Abstract



The development of user interfaces based on vision and speech requires the solution of a challenging statistical inference problem: The intentions and actions of multiple individuals must be inferred from noisy and ambiguous data. We argue that Bayesian network models are an attractive statistical framework for cue fusion in these applications. Bayes nets combine a natural mechanism for expressing contextual information with efficient algorithms for learning and inference. We illustrate these points through the development of a Bayes net model for detecting when a user is speaking. The model combines four simple vision sensors: face detection, skin color, skin texture, and mouth motion. We present some promising experimental results.
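The cue-fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes the simplest possible network structure (one hidden "speaking" node whose four children are the sensors, which are conditionally independent given that node), treats each sensor as a binary output, and uses placeholder probabilities chosen only for illustration. The names `PRIOR_SPEAKING`, `LIKELIHOOD`, and `posterior_speaking` are hypothetical.

```python
# Illustrative sketch only: a naive-Bayes-structured Bayes net, NOT the
# network from the paper. All probabilities are made-up placeholders.

PRIOR_SPEAKING = 0.3  # assumed P(speaking = True)

# Assumed (P(sensor fires | speaking), P(sensor fires | not speaking)).
LIKELIHOOD = {
    "face_detected": (0.90, 0.40),
    "skin_color": (0.85, 0.50),
    "skin_texture": (0.80, 0.50),
    "mouth_motion": (0.90, 0.20),
}


def posterior_speaking(observations):
    """Return P(speaking | observations) using Bayes' rule.

    `observations` maps a sensor name to True/False. Sensors that are
    absent from the mapping are unobserved; in this structure they
    marginalize out, so they are simply skipped.
    """
    joint_true = PRIOR_SPEAKING
    joint_false = 1.0 - PRIOR_SPEAKING
    for name, fired in observations.items():
        p_true, p_false = LIKELIHOOD[name]
        joint_true *= p_true if fired else 1.0 - p_true
        joint_false *= p_false if fired else 1.0 - p_false
    # Normalize over the two states of the hidden node.
    return joint_true / (joint_true + joint_false)


if __name__ == "__main__":
    all_on = {name: True for name in LIKELIHOOD}
    print(posterior_speaking(all_on))
    print(posterior_speaking({**all_on, "mouth_motion": False}))
```

Under these assumed numbers, agreement among all four sensors pushes the posterior well above the prior, while a missing mouth-motion cue pulls it back down. This is the basic behavior that makes a probabilistic network convenient for combining weak, noisy cues.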

