Fotios Talantzis, Aristodemos Pnevmatikakis, Anthony G Constantinides
{"title":"在杂乱的室内环境中的视听主动扬声器跟踪。","authors":"Fotios Talantzis, Aristodemos Pnevmatikakis, Anthony G Constantinides","doi":"10.1109/TSMCB.2008.2009558","DOIUrl":null,"url":null,"abstract":"<p><p>We propose a system for detecting the active speaker in cluttered and reverberant environments where more than one person speaks and moves. Rather than using only audio information, the system utilizes audiovisual information from multiple acoustic and video sensors that feed separate audio and video tracking modules. The audio module operates using a particle filter (PF) and an information-theoretic framework to provide accurate acoustic source location under reverberant conditions. The video subsystem combines in 3-D a number of 2-D trackers based on a variation of Stauffer's adaptive background algorithm with spatiotemporal adaptation of the learning parameters and a Kalman tracker in a feedback configuration. Extensive experiments show that gains are to be expected when fusion of the separate modalities is performed to detect the active speaker.</p>","PeriodicalId":55006,"journal":{"name":"IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics","volume":" ","pages":"7-15"},"PeriodicalIF":0.0000,"publicationDate":"2009-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TSMCB.2008.2009558","citationCount":"3","resultStr":"{\"title\":\"Audio-visual active speaker tracking in cluttered indoors environments.\",\"authors\":\"Fotios Talantzis, Aristodemos Pnevmatikakis, Anthony G Constantinides\",\"doi\":\"10.1109/TSMCB.2008.2009558\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>We propose a system for detecting the active speaker in cluttered and reverberant environments where more than one person speaks and moves. Rather than using only audio information, the system utilizes audiovisual information from multiple acoustic and video sensors that feed separate audio and video tracking modules. The audio module operates using a particle filter (PF) and an information-theoretic framework to provide accurate acoustic source location under reverberant conditions. The video subsystem combines in 3-D a number of 2-D trackers based on a variation of Stauffer's adaptive background algorithm with spatiotemporal adaptation of the learning parameters and a Kalman tracker in a feedback configuration. 
Extensive experiments show that gains are to be expected when fusion of the separate modalities is performed to detect the active speaker.</p>\",\"PeriodicalId\":55006,\"journal\":{\"name\":\"IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics\",\"volume\":\" \",\"pages\":\"7-15\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/TSMCB.2008.2009558\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TSMCB.2008.2009558\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TSMCB.2008.2009558","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Audio-visual active speaker tracking in cluttered indoors environments.
We propose a system for detecting the active speaker in cluttered and reverberant environments where more than one person speaks and moves. Rather than relying on audio alone, the system uses audiovisual information from multiple acoustic and video sensors that feed separate audio and video tracking modules. The audio module uses a particle filter (PF) within an information-theoretic framework to provide accurate acoustic source localization under reverberant conditions. The video subsystem combines, in 3-D, a number of 2-D trackers, each based on a variation of Stauffer's adaptive background algorithm with spatiotemporal adaptation of the learning parameters, with a Kalman tracker in a feedback configuration. Extensive experiments show that fusing the two modalities yields gains in detecting the active speaker.
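To make the audio tracking step concrete, the sketch below shows a generic bootstrap particle filter for a 3-D source position. It is a minimal illustration only: the paper's actual likelihood is information-theoretic and computed from the microphone signals, whereas here a stand-in Gaussian likelihood around a hypothetical acoustic observation `z`, together with assumed noise levels and room dimensions, takes its place.

```python
import numpy as np

# Minimal bootstrap particle filter for tracking a 3-D source position.
# NOTE: the Gaussian likelihood, noise levels, and room size below are
# illustrative assumptions, not the paper's information-theoretic model.

rng = np.random.default_rng(0)

N_PARTICLES = 500
PROCESS_STD = 0.05   # assumed random-walk motion noise (metres)
OBS_STD = 0.20       # assumed observation noise (metres)

def predict(particles):
    """Propagate particles with an assumed random-walk motion model."""
    return particles + rng.normal(0.0, PROCESS_STD, particles.shape)

def update(particles, z):
    """Weight particles with a stand-in Gaussian likelihood around z."""
    d2 = np.sum((particles - z) ** 2, axis=1)
    weights = np.exp(-0.5 * d2 / OBS_STD ** 2)
    return weights / weights.sum()

def resample(particles, weights):
    """Multinomial resampling to counter weight degeneracy."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

# Initialise particles uniformly inside a 5 m x 5 m x 3 m room (assumed size).
particles = rng.uniform([0.0, 0.0, 0.0], [5.0, 5.0, 3.0], size=(N_PARTICLES, 3))

# One filtering step given a hypothetical acoustic observation of the source.
z = np.array([2.0, 3.0, 1.7])
particles = predict(particles)
weights = update(particles, z)
estimate = np.average(particles, axis=0, weights=weights)
particles = resample(particles, weights)
print("estimated source position:", estimate)
```

In the full system described in the abstract, an estimate of this kind would be fused with the 3-D position produced by the video subsystem's background-subtraction and Kalman trackers before deciding which person is the active speaker.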