基于视觉的贝叶斯网络说话人检测

James M. Rehg, Kevin P. Murphy, P. Fieguth
{"title":"基于视觉的贝叶斯网络说话人检测","authors":"James M. Rehg, Kevin P. Murphy, P. Fieguth","doi":"10.1109/CVPR.1999.784617","DOIUrl":null,"url":null,"abstract":"The development of user interfaces based on vision and speech requires the solution of a challenging statistical inference problem: The intentions and actions of multiple individuals must be inferred from noisy and ambiguous data. We argue that Bayesian network models are an attractive statistical framework for cue fusion in these applications. Bayes nets combine a natural mechanism for expressing contextual information with efficient algorithms for learning and inference. We illustrate these points through the development of a Bayes net model for detecting when a user is speaking. The model combines four simple vision sensors: face detection, skin color, skin texture, and mouth motion. We present some promising experimental results.","PeriodicalId":20644,"journal":{"name":"Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149)","volume":"54 1","pages":"110-116 Vol. 2"},"PeriodicalIF":0.0000,"publicationDate":"1999-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"79","resultStr":"{\"title\":\"Vision-based speaker detection using Bayesian networks\",\"authors\":\"James M. Rehg, Kevin P. Murphy, P. Fieguth\",\"doi\":\"10.1109/CVPR.1999.784617\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The development of user interfaces based on vision and speech requires the solution of a challenging statistical inference problem: The intentions and actions of multiple individuals must be inferred from noisy and ambiguous data. We argue that Bayesian network models are an attractive statistical framework for cue fusion in these applications. Bayes nets combine a natural mechanism for expressing contextual information with efficient algorithms for learning and inference. We illustrate these points through the development of a Bayes net model for detecting when a user is speaking. The model combines four simple vision sensors: face detection, skin color, skin texture, and mouth motion. We present some promising experimental results.\",\"PeriodicalId\":20644,\"journal\":{\"name\":\"Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149)\",\"volume\":\"54 1\",\"pages\":\"110-116 Vol. 2\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1999-06-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"79\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CVPR.1999.784617\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.1999.784617","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 79

摘要

基于视觉和语音的用户界面的开发需要解决一个具有挑战性的统计推断问题:必须从嘈杂和模糊的数据中推断出多个个体的意图和行为。我们认为贝叶斯网络模型是这些应用中线索融合的一个有吸引力的统计框架。贝叶斯网络结合了表达上下文信息的自然机制和高效的学习和推理算法。我们通过开发用于检测用户何时说话的贝叶斯网络模型来说明这些要点。该模型结合了四种简单的视觉传感器:面部检测、肤色、皮肤纹理和口腔运动。我们提出了一些有希望的实验结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Vision-based speaker detection using Bayesian networks
The development of user interfaces based on vision and speech requires the solution of a challenging statistical inference problem: The intentions and actions of multiple individuals must be inferred from noisy and ambiguous data. We argue that Bayesian network models are an attractive statistical framework for cue fusion in these applications. Bayes nets combine a natural mechanism for expressing contextual information with efficient algorithms for learning and inference. We illustrate these points through the development of a Bayes net model for detecting when a user is speaking. The model combines four simple vision sensors: face detection, skin color, skin texture, and mouth motion. We present some promising experimental results.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信