Sequential Monte Carlo fusion of sound and vision for speaker tracking
J. Vermaak, Michel Gangnet, A. Blake, P. Pérez
Proceedings Eighth IEEE International Conference on Computer Vision (ICCV 2001), published 2001-07-07. DOI: 10.1109/ICCV.2001.937600
Video telephony could be considerably enhanced by a tracking system that lets the speaker move freely while maintaining a well-framed image for transmission over limited bandwidth. Commercial multi-microphone systems that track speaker direction in order to reject background noise already exist. Stereo sound and vision are complementary modalities: sound is good for initialisation (where vision is expensive), whereas vision is good for localisation (where sound is less precise). Using generative probabilistic models and particle filtering, we show that stereo sound and vision can indeed be fused effectively, yielding a system more capable than one using either modality on its own.
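The abstract gives no model details, so what follows is only a minimal sketch of the general idea and not the authors' method. It is a bootstrap particle filter over a single hypothetical horizontal speaker coordinate `x`, with a random-walk motion model. Gaussian likelihoods stand in for the audio cue (broad, `audio_std`) and the visual cue (narrow, `vision_std`). Every name and parameter value here is an illustrative assumption. If the two cues are treated as conditionally independent given `x`, fusing them amounts to multiplying their likelihoods into each particle's weight. A missing cue contributes no factor, so the same step also runs on audio alone.

```python
import math
import random


def gaussian_likelihood(x, mean, std):
    # Unnormalised Gaussian; constant factors cancel when weights are normalised.
    return math.exp(-0.5 * ((x - mean) / std) ** 2)


def systematic_resample(particles, weights, rng):
    # Systematic resampling: one uniform offset, n evenly spaced pointers.
    n = len(particles)
    u0 = rng.random() / n
    resampled = []
    j = 0
    cumulative = weights[0]
    for i in range(n):
        u = u0 + i / n
        while u > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j]
        resampled.append(particles[j])
    return resampled


def fused_particle_filter_step(particles, audio_obs, vision_obs, rng,
                               process_std=0.05, audio_std=0.3, vision_std=0.02):
    """One predict-weight-resample step; either observation may be None.

    Hypothetical model: x is a 1-D horizontal position, audio is a coarse
    Gaussian cue and vision a sharp one, assumed conditionally independent.
    Returns (resampled particles, posterior mean, posterior std).
    """
    # Predict: random-walk motion model.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]

    # Weight: multiply the likelihoods of whichever cues are available.
    weights = []
    for p in predicted:
        w = 1.0
        if audio_obs is not None:
            w *= gaussian_likelihood(p, audio_obs, audio_std)
        if vision_obs is not None:
            w *= gaussian_likelihood(p, vision_obs, vision_std)
        weights.append(w)
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Weighted posterior summary, computed before resampling.
    mean = sum(w * p for w, p in zip(weights, predicted))
    var = sum(w * (p - mean) ** 2 for w, p in zip(weights, predicted))

    return systematic_resample(predicted, weights, rng), mean, math.sqrt(var)


if __name__ == "__main__":
    rng = random.Random(0)
    # No prior knowledge: particles spread uniformly over the field of view.
    particles = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    # Audio alone: cheap but coarse localisation (the "initialisation" role).
    particles, m, s = fused_particle_filter_step(particles, 0.45, None, rng)
    print(f"audio only:     mean={m:.3f} std={s:.3f}")
    # Audio + vision: the sharp visual cue dominates the fused estimate.
    particles, m, s = fused_particle_filter_step(particles, 0.45, 0.40, rng)
    print(f"audio + vision: mean={m:.3f} std={s:.3f}")
```

With these assumed values, the audio-only step leaves a wide particle cloud near the audio reading. Adding the visual likelihood collapses the cloud around the visual reading. This loosely mirrors the complementary roles the abstract assigns to the two modalities.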