Audio-visual sensor fusion system for intelligent sound sensing

Kota Takahashi, Hiro Yamasaki
DOI: 10.1109/MFI.1994.398413
Published in: Proceedings of 1994 IEEE International Conference on MFI '94. Multisensor Fusion and Integration for Intelligent Systems
Publication date: 1994-10-02
Citations: 10

Abstract

An intelligent sensing system is proposed that autonomously extracts a target sound signal from multi-microphone signals corrupted by ambient interference noise. Although many types of intelligent signal receivers with multiple sensors have been proposed recently, the use of audio-visual sensor fusion techniques is a distinctive feature of the system described here. The sensor fusion system can be divided into two subsystems: an audio subsystem and a visual subsystem. The audio subsystem extracts the target signal with a digital filter composed of tapped delay lines and adjustable weights. These weights are updated by a special adaptive algorithm called the "cue signal method". For adaptation, the cue signal method needs only a narrow-bandwidth signal that correlates with the power level of the target signal; this narrow-bandwidth signal is called the "cue signal". The role of the visual subsystem is therefore to generate a cue signal. The authors have previously proposed methods for generating a cue signal from video images, in which sensor fusion of audio and visual information was accomplished by simple means. In this paper, two new sensor fusion techniques are proposed: one generates a cue signal using not only video images but also microphone signals, and the other generates a cue signal using microphone signals, video images, and internal knowledge. Both perform hierarchical fusion of audio and visual information. To evaluate and demonstrate the sensor fusion algorithm, a real-time processing system comprising seventy DSPs was constructed. The architecture of this system is also described.
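The audio subsystem described in the abstract can be illustrated with a small sketch. This is a schematic interpretation, not the paper's actual algorithm (whose update rule is not given in the abstract): it runs a tapped-delay-line filter over multiple microphone channels and nudges the weights by gradient ascent on the covariance between the filter's instantaneous output power and a centered cue signal. All names here (`cue_adapt`, `n_taps`, `mu`) are hypothetical.

```python
import numpy as np

def cue_adapt(mics, cue, n_taps=8, mu=1e-3):
    """Schematic cue-signal-driven adaptation of a multi-microphone
    tapped-delay-line filter.

    mics : (n_channels, n_samples) array of microphone signals
    cue  : (n_samples,) narrow-bandwidth signal assumed to correlate
           with the target's power level
    Returns the filter output y and the final weight vector w.
    """
    n_ch, n_samp = mics.shape
    w = np.zeros(n_ch * n_taps)
    w[0] = 1.0                      # start by passing mic 0 straight through
    cue_c = cue - cue.mean()        # center the cue so only its modulation matters
    y = np.zeros(n_samp)
    for t in range(n_taps, n_samp):
        # stack the last n_taps samples from every channel, newest first
        x = mics[:, t - n_taps:t][:, ::-1].ravel()
        y[t] = w @ x
        # d(y^2)/dw = 2*y*x; weight that gradient by the centered cue,
        # so output power is pushed up when the cue says the target is loud
        w += mu * cue_c[t] * y[t] * x
    return y, w
```

The sketch captures the key property stated in the abstract: adaptation consumes only the low-rate cue signal, not a clean reference of the target waveform, so any subsystem (visual or otherwise) that can estimate the target's power envelope can drive the filter.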