Signal level fusion for multimodal perceptual user interface
John W. Fisher III, Trevor Darrell
Workshop on Perceptive User Interfaces, November 15, 2001. doi:10.1145/971478.971482
Multi-modal fusion is an important yet challenging task for perceptual user interfaces. Humans routinely perform tasks, both simple and complex, in which ambiguous auditory and visual data are combined to support accurate perception. Automated approaches to processing multi-modal data sources lag far behind, primarily because few methods adequately model the complexity of the audio/visual relationship. We present an information-theoretic approach to fusing multiple modalities, and discuss a statistical model under which this approach to fusion is justified. We present empirical results demonstrating audio-video localization and consistency measurement, with examples determining where a speaker is within a scene and whether they are producing the specified audio stream.
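As a rough illustration of the information-theoretic criterion described above: under a joint Gaussian assumption, the mutual information between two scalar signals reduces to I = -0.5 ln(1 - rho^2), where rho is their correlation. The toy sketch below scores each pixel of a synthetic video by the mutual information between its intensity signal and an audio-energy signal, and localizes the "speaker" at the highest-scoring pixel. All data, function names, and the Gaussian simplification are assumptions for illustration only; this is not the authors' implementation, which uses a more general nonparametric estimate.

```python
# Hedged sketch of MI-based audio-visual localization (synthetic data).
# Assumes a Gaussian model so that MI = -0.5 * ln(1 - rho^2).
import numpy as np

def gaussian_mi(x, y):
    """Mutual information of two 1-D signals under a joint Gaussian model."""
    rho = np.corrcoef(x, y)[0, 1]
    rho = np.clip(rho, -0.999999, 0.999999)  # guard against log(0)
    return -0.5 * np.log(1.0 - rho ** 2)

def localize_speaker(video, audio):
    """video: (T, H, W) per-frame pixel signals; audio: (T,) energy per frame.
    Returns (row, col) of the pixel sharing the most MI with the audio."""
    T, H, W = video.shape
    scores = np.array([[gaussian_mi(video[:, i, j], audio)
                        for j in range(W)] for i in range(H)])
    return np.unravel_index(np.argmax(scores), scores.shape)

# Synthetic scene: pixel (2, 3) varies in sync with the audio energy,
# so it should emerge as the audio-consistent ("speaking") location.
rng = np.random.default_rng(0)
T, H, W = 200, 5, 6
audio = rng.normal(size=T)
video = rng.normal(scale=0.5, size=(T, H, W))
video[:, 2, 3] += audio  # inject audio-correlated variation

print(localize_speaker(video, audio))
```

The same pixelwise MI score doubles as a consistency measure: a low maximum over the scene suggests the audio stream does not originate from anything visible, which mirrors the paper's second demonstrated use.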