Particle filtering for bearing-only audio-visual speaker detection and tracking
A. Rae, A. Khamis, O. Basir, M. Kamel
2009 3rd International Conference on Signals, Circuits and Systems (SCS), November 2009
DOI: 10.1109/ICSCS.2009.5412478
We present a method for audio-visual speaker detection and tracking in a smart meeting room, based on bearing measurements and particle filtering. Bearings are obtained from the Time Difference of Arrival (TDOA) of the acoustic signal at a pair of microphones and from facial regions tracked in images from monocular cameras. A particle filter samples the space of possible speaker locations within the room and fuses the auditory and visual bearing measurements. The system was tested in a video messaging scenario with a single participant seated in front of a screen on which a camera and a microphone pair are mounted. The experimental results show that the accuracy of bearing-only speaker tracking depends on the speaker's position relative to the camera and microphones, a dependence that can be quantified by the Dilution of Precision (DOP).
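The abstract does not give the paper's measurement models, motion model, or fusion details, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a 2-D room frame; a far-field two-microphone model, bearing = arcsin(c·τ/d); a pinhole camera, bearing = atan((u − c_x)/f); independent Gaussian bearing noise; a random-walk motion model; and multinomial resampling. Estimating τ itself (for example by cross-correlation) is not shown. The DOP here is defined by analogy with satellite navigation as sqrt(trace((HᵀH)⁻¹)), which for bearings has units of metres per radian. All function names, the sensor layout, and the noise levels are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def wrap(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def tdoa_to_bearing(tau, mic_spacing, c=SPEED_OF_SOUND):
    """Far-field bearing from broadside for a two-microphone pair (assumed model).

    tau = arrival time at left mic minus arrival time at right mic, so a
    source to the right gives tau > 0 and a positive bearing.
    """
    s = max(-1.0, min(1.0, c * tau / mic_spacing))  # clamp noisy values
    return math.asin(s)


def pixel_to_bearing(u, cx, focal_px):
    """Pinhole-camera bearing of a face centred at column u (positive = right)."""
    return math.atan2(u - cx, focal_px)


def predicted_bearing(sensor, x, y):
    """Bearing of room point (x, y) from the sensor boresight, positive to the right."""
    sx, sy, heading = sensor
    return wrap(heading - math.atan2(y - sy, x - sx))


def pf_step(particles, measurements, rng, motion_std=0.05):
    """One predict/update/resample cycle of a bootstrap particle filter.

    measurements: list of (sensor, measured_bearing, bearing_std) tuples,
    one per audio or visual bearing available at this time step.
    """
    # Predict: random-walk motion model (an assumption of this sketch).
    moved = [(x + rng.gauss(0.0, motion_std), y + rng.gauss(0.0, motion_std))
             for x, y in particles]
    # Update: fuse all bearings as independent Gaussian likelihoods (log domain).
    logw = []
    for x, y in moved:
        lw = 0.0
        for sensor, z, std in measurements:
            err = wrap(z - predicted_bearing(sensor, x, y))
            lw -= 0.5 * (err / std) ** 2
        logw.append(lw)
    peak = max(logw)
    w = [math.exp(v - peak) for v in logw]
    total = sum(w)
    w = [v / total for v in w]
    estimate = (sum(wi * p[0] for wi, p in zip(w, moved)),
                sum(wi * p[1] for wi, p in zip(w, moved)))
    # Resample (multinomial) to counter weight degeneracy.
    return rng.choices(moved, weights=w, k=len(moved)), estimate


def bearing_dop(sensors, x, y):
    """Geometric DOP for bearing-only sensing: sqrt(trace((H^T H)^-1)), in m/rad."""
    a = b = c = 0.0
    for sx, sy, _ in sensors:
        dx, dy = x - sx, y - sy
        r2 = dx * dx + dy * dy
        hx, hy = -dy / r2, dx / r2  # gradient of atan2(dy, dx) w.r.t. (x, y)
        a, b, c = a + hx * hx, b + hx * hy, c + hy * hy
    det = a * c - b * b
    if det <= 1e-12 * (a + c) ** 2:
        return math.inf  # bearings (nearly) parallel: position unobservable
    return math.sqrt((a + c) / det)


def demo(seed=0, steps=20):
    """Track a static speaker with one camera and one mic pair (hypothetical layout)."""
    rng = random.Random(seed)
    camera = (0.0, 0.0, math.pi / 2)  # at the screen centre, facing +y
    mics = (0.6, 0.0, math.pi / 2)    # mic-pair centre 0.6 m to the right
    mic_spacing, focal_px, cx = 0.2, 800.0, 320.0  # hypothetical hardware values
    truth = (0.3, 1.5)
    particles = [(rng.uniform(-1.5, 1.5), rng.uniform(0.5, 3.0)) for _ in range(500)]
    est = None
    for _ in range(steps):
        # Synthesise noisy raw observations, then convert them to bearings.
        u = cx + focal_px * math.tan(predicted_bearing(camera, *truth)) + rng.gauss(0.0, 5.0)
        tau = mic_spacing * math.sin(predicted_bearing(mics, *truth)) / SPEED_OF_SOUND
        tau += rng.gauss(0.0, 1e-5)
        meas = [(camera, pixel_to_bearing(u, cx, focal_px), 0.03),
                (mics, tdoa_to_bearing(tau, mic_spacing), 0.05)]
        particles, est = pf_step(particles, meas, rng)
    return truth, est, bearing_dop([camera, mics], *truth)


if __name__ == "__main__":
    truth, est, dop = demo()
    print(f"truth={truth} estimate=({est[0]:.2f}, {est[1]:.2f}) DOP={dop:.2f} m/rad")
```

Moving `truth` farther from the sensors makes `bearing_dop` grow, and placing it on the line through both sensors makes it infinite because the two bearings become parallel. This illustrates the kind of geometry dependence that the abstract attributes to Dilution of Precision.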