Fotios Talantzis, Aristodemos Pnevmatikakis, Anthony G Constantinides
{"title":"在杂乱的室内环境中的视听主动扬声器跟踪。","authors":"Fotios Talantzis, Aristodemos Pnevmatikakis, Anthony G Constantinides","doi":"10.1109/TSMCB.2008.2009558","DOIUrl":null,"url":null,"abstract":"<p><p>We propose a system for detecting the active speaker in cluttered and reverberant environments where more than one person speaks and moves. Rather than using only audio information, the system utilizes audiovisual information from multiple acoustic and video sensors that feed separate audio and video tracking modules. The audio module operates using a particle filter (PF) and an information-theoretic framework to provide accurate acoustic source location under reverberant conditions. The video subsystem combines in 3-D a number of 2-D trackers based on a variation of Stauffer's adaptive background algorithm with spatiotemporal adaptation of the learning parameters and a Kalman tracker in a feedback configuration. Extensive experiments show that gains are to be expected when fusion of the separate modalities is performed to detect the active speaker.</p>","PeriodicalId":55006,"journal":{"name":"IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics","volume":" ","pages":"7-15"},"PeriodicalIF":0.0000,"publicationDate":"2009-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TSMCB.2008.2009558","citationCount":"3","resultStr":"{\"title\":\"Audio-visual active speaker tracking in cluttered indoors environments.\",\"authors\":\"Fotios Talantzis, Aristodemos Pnevmatikakis, Anthony G Constantinides\",\"doi\":\"10.1109/TSMCB.2008.2009558\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>We propose a system for detecting the active speaker in cluttered and reverberant environments where more than one person speaks and moves. Rather than using only audio information, the system utilizes audiovisual information from multiple acoustic and video sensors that feed separate audio and video tracking modules. The audio module operates using a particle filter (PF) and an information-theoretic framework to provide accurate acoustic source location under reverberant conditions. The video subsystem combines in 3-D a number of 2-D trackers based on a variation of Stauffer's adaptive background algorithm with spatiotemporal adaptation of the learning parameters and a Kalman tracker in a feedback configuration. 
Extensive experiments show that gains are to be expected when fusion of the separate modalities is performed to detect the active speaker.</p>\",\"PeriodicalId\":55006,\"journal\":{\"name\":\"IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics\",\"volume\":\" \",\"pages\":\"7-15\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/TSMCB.2008.2009558\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TSMCB.2008.2009558\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TSMCB.2008.2009558","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Audio-visual active speaker tracking in cluttered indoors environments.
We propose a system for detecting the active speaker in cluttered and reverberant environments where more than one person speaks and moves. Rather than relying on audio alone, the system uses audiovisual information from multiple acoustic and video sensors that feed separate audio and video tracking modules. The audio module uses a particle filter (PF) within an information-theoretic framework to provide accurate acoustic source localization under reverberant conditions. The video subsystem combines, in 3-D, a number of 2-D trackers, each based on a variation of Stauffer's adaptive background algorithm with spatiotemporal adaptation of the learning parameters, with a Kalman tracker in a feedback configuration. Extensive experiments show that fusing the two modalities yields gains in detecting the active speaker.
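To make the audio tracking step concrete, the sketch below shows a generic bootstrap particle filter for a 3-D source position. It is a minimal illustration only: the paper's actual likelihood is information-theoretic and computed from the microphone signals, whereas here a stand-in Gaussian likelihood around a hypothetical acoustic observation `z`, together with assumed noise levels and room dimensions, takes its place.

```python
import numpy as np

# Minimal bootstrap particle filter for tracking a 3-D source position.
# NOTE: the Gaussian likelihood, noise levels, and room size below are
# illustrative assumptions, not the paper's information-theoretic model.

rng = np.random.default_rng(0)

N_PARTICLES = 500
PROCESS_STD = 0.05   # assumed random-walk motion noise (metres)
OBS_STD = 0.20       # assumed observation noise (metres)

def predict(particles):
    """Propagate particles with an assumed random-walk motion model."""
    return particles + rng.normal(0.0, PROCESS_STD, particles.shape)

def update(particles, z):
    """Weight particles with a stand-in Gaussian likelihood around z."""
    d2 = np.sum((particles - z) ** 2, axis=1)
    weights = np.exp(-0.5 * d2 / OBS_STD ** 2)
    return weights / weights.sum()

def resample(particles, weights):
    """Multinomial resampling to counter weight degeneracy."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

# Initialise particles uniformly inside a 5 m x 5 m x 3 m room (assumed size).
particles = rng.uniform([0.0, 0.0, 0.0], [5.0, 5.0, 3.0], size=(N_PARTICLES, 3))

# One filtering step given a hypothetical acoustic observation of the source.
z = np.array([2.0, 3.0, 1.7])
particles = predict(particles)
weights = update(particles, z)
estimate = np.average(particles, axis=0, weights=weights)
particles = resample(particles, weights)
print("estimated source position:", estimate)
```

In the full system described in the abstract, an estimate of this kind would be fused with the 3-D position produced by the video subsystem's background-subtraction and Kalman trackers before deciding which person is the active speaker.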