Estimating Focus of Attention Based on Gaze and Sound

R. Stiefelhagen, Jie Yang, A. Waibel
{"title":"根据凝视和声音估计注意力的焦点","authors":"R. Stiefelhagen, Jie Yang, A. Waibel","doi":"10.1145/971478.971505","DOIUrl":null,"url":null,"abstract":"Estimating a person's focus of attention is useful for various human-computer interaction applications, such as smart meeting rooms, where a user's goals and intent have to be monitored. In work presented here, we are interested in modeling focus of attention in a meeting situation. We have developed a system capable of estimating participants' focus of attention from multiple cues. We employ an omnidirectional camera to simultaneously track participants' faces around a meeting table and use neural networks to estimate their head poses. In addition, we use microphones to detect who is speaking. The system predicts participants' focus of attention from acoustic and visual information separately, and then combines the output of the audio- and video-based focus of attention predictors. We have evaluated the system using the data from three recorded meetings. The acoustic information has provided 8% error reduction on average compared to using a single modality.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"148 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"76","resultStr":"{\"title\":\"Estimating focus of attention based on gaze and sound\",\"authors\":\"R. Stiefelhagen, Jie Yang, A. Waibel\",\"doi\":\"10.1145/971478.971505\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Estimating a person's focus of attention is useful for various human-computer interaction applications, such as smart meeting rooms, where a user's goals and intent have to be monitored. In work presented here, we are interested in modeling focus of attention in a meeting situation. We have developed a system capable of estimating participants' focus of attention from multiple cues. We employ an omnidirectional camera to simultaneously track participants' faces around a meeting table and use neural networks to estimate their head poses. In addition, we use microphones to detect who is speaking. The system predicts participants' focus of attention from acoustic and visual information separately, and then combines the output of the audio- and video-based focus of attention predictors. We have evaluated the system using the data from three recorded meetings. 
The acoustic information has provided 8% error reduction on average compared to using a single modality.\",\"PeriodicalId\":416822,\"journal\":{\"name\":\"Workshop on Perceptive User Interfaces\",\"volume\":\"148 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"76\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Perceptive User Interfaces\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/971478.971505\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Perceptive User Interfaces","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/971478.971505","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 76

Abstract

Estimating a person's focus of attention is useful for various human-computer interaction applications, such as smart meeting rooms, where a user's goals and intent have to be monitored. In work presented here, we are interested in modeling focus of attention in a meeting situation. We have developed a system capable of estimating participants' focus of attention from multiple cues. We employ an omnidirectional camera to simultaneously track participants' faces around a meeting table and use neural networks to estimate their head poses. In addition, we use microphones to detect who is speaking. The system predicts participants' focus of attention from acoustic and visual information separately, and then combines the output of the audio- and video-based focus of attention predictors. We have evaluated the system using the data from three recorded meetings. The acoustic information has provided 8% error reduction on average compared to using a single modality.
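The abstract describes a late-fusion scheme: the video-based (head-pose) predictor and the audio-based (speaker-detection) predictor each estimate the focus of attention separately, and their outputs are then combined. Below is a minimal sketch of one way such a combination could look, assuming each predictor emits a probability distribution over the possible focus targets and using a simple weighted mixture; the weight `alpha` and all function names are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal late-fusion sketch (assumed, for illustration): combine per-target
# focus probabilities from a video-based head-pose predictor and an
# audio-based speaker predictor with a weighted mixture.
import numpy as np

def fuse_focus_predictions(p_video: np.ndarray,
                           p_audio: np.ndarray,
                           alpha: float = 0.5) -> np.ndarray:
    """Combine two probability distributions over focus targets.

    p_video, p_audio : arrays of shape (n_targets,), each summing to 1.
    alpha            : mixture weight given to the video modality (assumed).
    """
    fused = alpha * p_video + (1.0 - alpha) * p_audio
    return fused / fused.sum()  # renormalize for numerical safety

# Example: three possible focus targets around the meeting table.
p_video = np.array([0.6, 0.3, 0.1])   # e.g. from a head-pose neural network
p_audio = np.array([0.2, 0.7, 0.1])   # e.g. from "who is speaking" detection
fused = fuse_focus_predictions(p_video, p_audio)
print(fused)                 # combined distribution over targets
print(int(np.argmax(fused))) # index of the predicted focus target
```

In this sketch the audio cue shifts the prediction toward the current speaker when the head-pose evidence is ambiguous, which is consistent with the reported average error reduction from adding the acoustic modality, though the paper itself should be consulted for the actual combination method.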