{"title":"Attentional object spotting by integrating multimodal input","authors":"Chen Yu, D. Ballard, Shenghuo Zhu","doi":"10.1109/ICMI.2002.1167008","DOIUrl":null,"url":null,"abstract":"An intelligent human-computer interface is expected to allow computers to work with users in a cooperative manner. To achieve this goal, computers need to be aware of user attention and provide assistance without explicit user requests. Cognitive studies of eye movements suggest that in accomplishing well-learned tasks, the performer's focus of attention is locked onto ongoing work and more than 90% of eye movements are closely related to the objects being manipulated in the tasks. In light of this, we have developed an attentional object spotting system that integrates multimodal data consisting of eye position, head position and video from the \"first-person\" perspective. To detect the user's focus of attention, we modeled eye gaze and head movements using a hidden Markov model (HMM) representation. For each attentional point in time, the object of user interest is automatically extracted and recognized. We report the results of experiments on finding attentional objects in the natural task of \"making a peanut-butter sandwich\".","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMI.2002.1167008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 11
Abstract
An intelligent human-computer interface is expected to allow computers to work with users in a cooperative manner. To achieve this goal, computers need to be aware of user attention and provide assistance without explicit user requests. Cognitive studies of eye movements suggest that in accomplishing well-learned tasks, the performer's focus of attention is locked onto ongoing work and more than 90% of eye movements are closely related to the objects being manipulated in the tasks. In light of this, we have developed an attentional object spotting system that integrates multimodal data consisting of eye position, head position and video from the "first-person" perspective. To detect the user's focus of attention, we modeled eye gaze and head movements using a hidden Markov model (HMM) representation. For each attentional point in time, the object of user interest is automatically extracted and recognized. We report the results of experiments on finding attentional objects in the natural task of "making a peanut-butter sandwich".
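The abstract states that eye gaze and head movements were modeled with a hidden Markov model to detect the user's focus of attention, but gives no implementation details. Below is a minimal, hypothetical sketch (not the authors' code) of the general idea: a two-state HMM labels each eye-tracker sample as "fixation" or "saccade" from its instantaneous velocity, with the most likely state sequence recovered by Viterbi decoding. The transition probabilities, emission means, and standard deviations are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch: 2-state HMM (fixation vs. saccade) over eye-movement
# velocity, decoded with Viterbi. All parameter values are illustrative guesses.
import numpy as np

def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian, used as the HMM emission model."""
    return -0.5 * np.log(2 * np.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def viterbi_fixation_labels(velocity,
                            trans=np.array([[0.95, 0.05],
                                            [0.10, 0.90]]),
                            means=(20.0, 300.0), stds=(15.0, 150.0)):
    """Return 0 (fixation) or 1 (saccade) per sample of a velocity trace (deg/s)."""
    n, k = len(velocity), 2
    log_trans = np.log(trans)
    # Emission log-likelihoods of each sample under each state.
    log_emit = np.stack([gaussian_log_pdf(velocity, means[s], stds[s])
                         for s in range(k)], axis=1)
    # Viterbi recursion.
    delta = np.zeros((n, k))
    backptr = np.zeros((n, k), dtype=int)
    delta[0] = np.log([0.5, 0.5]) + log_emit[0]
    for t in range(1, n):
        scores = delta[t - 1][:, None] + log_trans      # scores[i, j]: from i to j
        backptr[t] = np.argmax(scores, axis=0)
        delta[t] = scores[backptr[t], np.arange(k)] + log_emit[t]
    # Backtrace the most likely state sequence.
    states = np.zeros(n, dtype=int)
    states[-1] = np.argmax(delta[-1])
    for t in range(n - 2, -1, -1):
        states[t] = backptr[t + 1, states[t + 1]]
    return states

if __name__ == "__main__":
    # Synthetic velocity trace: slow fixation samples with a brief fast saccade.
    vel = np.concatenate([np.full(30, 15.0), np.full(5, 350.0), np.full(30, 18.0)])
    print(viterbi_fixation_labels(vel))
```

In such a setup, contiguous runs of the fixation state would mark the "attentional points in time" during which an object of interest could be segmented from the first-person video; how the paper actually parameterizes the HMM and incorporates head position is not specified in the abstract.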