An audiovisual attention model for natural conversation scenes

2014 IEEE International Conference on Image Processing (ICIP). Pub Date: 2014-10-27. DOI: 10.1109/ICIP.2014.7025219

A. Coutrot, Nathalie Guyader

Citations: 36

Abstract

Classical visual attention models consider neither social cues, such as faces, nor auditory cues, such as speech. However, faces are known to capture visual attention more than any other visual feature, and recent studies have shown that speech turn-taking affects the gaze of non-involved viewers. In this paper, we propose an audiovisual saliency model able to predict the eye movements of observers viewing other people having a conversation. Using a speaker diarization algorithm, our audiovisual saliency model increases the saliency of the speakers relative to the addressees. We evaluated our model against eye-tracking data and found that it significantly outperforms visual attention models that use an equal and constant saliency value for all faces.
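The idea of giving speakers more saliency than addressees can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `face_saliency_map`, the Gaussian face blobs, the `(cx, cy, sigma)` face format, and the weights `2.0` and `1.0` are all assumptions chosen for illustration, and the `speaking` flags stand in for the output of a diarization step that is not shown.

```python
import math

def face_saliency_map(height, width, faces, speaking,
                      speaker_weight=2.0, addressee_weight=1.0):
    """Build a normalized saliency map from face positions.

    faces: list of (cx, cy, sigma) tuples in pixels (hypothetical format).
    speaking: list of bools, one per face (assumed diarization output).
    The weights are illustrative values, not taken from the paper.
    """
    saliency = [[0.0] * width for _ in range(height)]
    for (cx, cy, sigma), is_speaking in zip(faces, speaking):
        # Speakers receive a larger weight than addressees.
        weight = speaker_weight if is_speaking else addressee_weight
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                saliency[y][x] += weight * math.exp(-d2 / (2.0 * sigma ** 2))
    # Normalize so the map sums to 1, like a fixation probability map.
    total = sum(map(sum, saliency))
    if total > 0:
        saliency = [[v / total for v in row] for row in saliency]
    return saliency

# Two faces on a 60x80 frame; the left face is the current speaker.
frame_map = face_saliency_map(60, 80, [(20, 30, 5.0), (60, 30, 5.0)], [True, False])
```

Setting both weights to the same value yields the kind of baseline the abstract compares against, in which every face has an equal and constant saliency.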
