Hyun-Don Kim, Kazunori Komatani, T. Ogata, HIroshi G. Okuno
{"title":"基于双通道的类人机器人在嘈杂家庭环境中的语音活动检测","authors":"Hyun-Don Kim, Kazunori Komatani, T. Ogata, HIroshi G. Okuno","doi":"10.1109/ROBOT.2008.4543745","DOIUrl":null,"url":null,"abstract":"The purpose of this research is to accurately classify the speech signals originating from the front even in noisy home environments. This ability can help robots to improve speech recognition and to spot keywords. We therefore developed a new voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method. It can classify the speech signals that are received at the front of two microphones by comparing the spectral energy of observed signals with that of target signals estimated by CSCC. Also, it can work in real time without training filter coefficients beforehand even in noisy environments (SNR > 0 dB) and can cope with speech noises generated by audio-visual equipments such as televisions and audio devices. Since the CSCC method requires the directions of the noise signals, we also developed a sound source localization system integrated with cross-power spectrum phase (CSP) analysis and an expectation-maximization (EM) algorithm. This system was demonstrated to enable a robot to cope with multiple sound sources using two microphones.","PeriodicalId":351230,"journal":{"name":"2008 IEEE International Conference on Robotics and Automation","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Two-channel-based voice activity detection for humanoid robots in noisy home environments\",\"authors\":\"Hyun-Don Kim, Kazunori Komatani, T. Ogata, HIroshi G. Okuno\",\"doi\":\"10.1109/ROBOT.2008.4543745\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The purpose of this research is to accurately classify the speech signals originating from the front even in noisy home environments. This ability can help robots to improve speech recognition and to spot keywords. We therefore developed a new voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method. It can classify the speech signals that are received at the front of two microphones by comparing the spectral energy of observed signals with that of target signals estimated by CSCC. Also, it can work in real time without training filter coefficients beforehand even in noisy environments (SNR > 0 dB) and can cope with speech noises generated by audio-visual equipments such as televisions and audio devices. Since the CSCC method requires the directions of the noise signals, we also developed a sound source localization system integrated with cross-power spectrum phase (CSP) analysis and an expectation-maximization (EM) algorithm. This system was demonstrated to enable a robot to cope with multiple sound sources using two microphones.\",\"PeriodicalId\":351230,\"journal\":{\"name\":\"2008 IEEE International Conference on Robotics and Automation\",\"volume\":\"97 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE International Conference on Robotics and Automation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ROBOT.2008.4543745\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Conference on Robotics and Automation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROBOT.2008.4543745","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Two-channel-based voice activity detection for humanoid robots in noisy home environments
The purpose of this research is to accurately classify the speech signals originating from the front even in noisy home environments. This ability can help robots to improve speech recognition and to spot keywords. We therefore developed a new voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method. It can classify the speech signals that are received at the front of two microphones by comparing the spectral energy of observed signals with that of target signals estimated by CSCC. Also, it can work in real time without training filter coefficients beforehand even in noisy environments (SNR > 0 dB) and can cope with speech noises generated by audio-visual equipments such as televisions and audio devices. Since the CSCC method requires the directions of the noise signals, we also developed a sound source localization system integrated with cross-power spectrum phase (CSP) analysis and an expectation-maximization (EM) algorithm. This system was demonstrated to enable a robot to cope with multiple sound sources using two microphones.