Audio-visual tracking for natural interactivity

MULTIMEDIA '99, Pub Date: 1999-10-30, DOI: 10.1145/319463.319652

G. Pingali, G. Tunali, I. Carlbom

Citations: 30

Abstract

The goal in user interfaces is natural interactivity unencumbered by sensor and display technology. In this paper, we propose that a multi-modal approach using inverse modeling techniques from computer vision, speech recognition, and acoustics can result in such interfaces. In particular, we demonstrate a system for audio-visual tracking, showing that such a system is more robust, more accurate, more compact, and yields more information than using a single modality for tracking. We also demonstrate how such a system can be used to find the talker among a group of individuals, and render 3D scenes to the user.

