{"title":"Signal level fusion for multimodal perceptual user interface","authors":"John W. Fisher III, Trevor Darrell","doi":"10.1145/971478.971482","DOIUrl":null,"url":null,"abstract":"Multi-modal fusion is an important, yet challenging task for perceptual user interfaces. Humans routinely perform complex and simple tasks in which ambiguous auditory and visual data are combined in order to support accurate perception. By contrast, automated approaches for processing multi-modal data sources lag far behind. This is primarily due to the fact that few methods adequately model the complexity of the audio/visual relationship. We present an information theoretic approach for fusion of multiple modalities. Furthermore we discuss a statistical model for which our approach to fusion is justified. We present empirical results demonstrating audio-video localization and consistency measurement. We show examples determining where a speaker is within a scene, and whether they are producing the specified audio stream.","PeriodicalId":416822,"journal":{"name":"Workshop on Perceptive User Interfaces","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Perceptive User Interfaces","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/971478.971482","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 17
Abstract
Multi-modal fusion is an important yet challenging task for perceptual user interfaces. Humans routinely perform both simple and complex tasks in which ambiguous auditory and visual data are combined to support accurate perception. By contrast, automated approaches for processing multi-modal data sources lag far behind, primarily because few methods adequately model the complexity of the audio/visual relationship. We present an information-theoretic approach to fusing multiple modalities, and we discuss a statistical model under which this approach to fusion is justified. We present empirical results demonstrating audio-video localization and consistency measurement, with examples that determine where a speaker is within a scene and whether they are producing the specified audio stream.
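The criterion behind this line of work is to score audio-visual consistency by the mutual information between the two signal streams. As a rough illustration only (not the authors' method, which learns low-dimensional projections and uses nonparametric entropy estimates), the sketch below scores each pixel by the mutual information, under a simplifying joint-Gaussian assumption where I(X;Y) = -0.5 log(1 - rho^2), between its temporal intensity change and the audio energy envelope. The function names `gaussian_mi` and `av_localization_map` are hypothetical.

```python
import numpy as np

def gaussian_mi(x, y):
    """MI of two 1-D signals under a joint-Gaussian assumption:
    I(X;Y) = -0.5 * log(1 - rho^2), rho being their correlation.
    (A simplification; the paper uses nonparametric estimates.)"""
    if x.std() < 1e-12 or y.std() < 1e-12:
        return 0.0  # a constant signal carries no information
    rho = np.corrcoef(x, y)[0, 1]
    rho = np.clip(rho, -0.999999, 0.999999)
    return -0.5 * np.log(1.0 - rho ** 2)

def av_localization_map(frames, audio_energy):
    """frames: (T, H, W) grayscale video; audio_energy: (T,) per-frame
    audio energy. Returns an (H, W) map of per-pixel MI between
    temporal intensity change and the audio energy change."""
    diffs = np.abs(np.diff(frames, axis=0))  # (T-1, H, W) motion proxy
    a = np.diff(audio_energy)                # (T-1,) aligned audio change
    _, H, W = diffs.shape
    mi_map = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            mi_map[i, j] = gaussian_mi(diffs[:, i, j], a)
    return mi_map
```

In this toy setting, thresholding `mi_map` localizes the image region most consistent with the audio (e.g., a speaker's face), and the mean of the top-scoring values can serve as a crude scene-level consistency score for deciding whether the visible speaker produced the audio stream.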