Enhanced Robot Speech Recognition Based on Microphone Array Source Separation and Missing Feature Theory

Proceedings of the 2005 IEEE International Conference on Robotics and Automation Pub Date : 2005-04-18 DOI:10.1109/ROBOT.2005.1570323

S. Yamamoto, J. Valin, K. Nakadai, J. Rouat, F. Michaud, T. Ogata, Hiroshi G. Okuno


Citations: 79

Abstract


Enhanced Robot Speech Recognition Based on Microphone Array Source Separation and Missing Feature Theory

A humanoid robot in a real-world environment usually hears mixtures of sounds, so three capabilities are essential for robot audition: sound source localization, separation, and recognition of the separated sounds. While the first two are frequently addressed, the last has received comparatively little study. We present a system that gives a humanoid robot the ability to localize, separate, and recognize simultaneous sound sources. A microphone array is used along with a dedicated real-time implementation of Geometric Source Separation (GSS) and a multi-channel post-filter that further reduces interference from other sources. An automatic speech recognizer (ASR) based on Missing Feature Theory (MFT) recognizes the separated sounds in real time, using missing feature masks generated automatically from the post-filtering step. The main advantage of this approach for humanoid robots is that an ASR with a clean acoustic model can adapt to the distortion of the separated sound by consulting the post-filter feature masks. Recognition rates are presented for three simultaneous speakers located 2 m from the robot. Using both the post-filter and the missing feature mask yields an average relative reduction in error rate of 42%.
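To make the idea of a missing feature mask more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the masks are computed, so the energy-ratio rule, the `threshold` value of 0.25, and the function names `reliability_mask` and `masked_log_likelihood` are all assumptions made for illustration. The sketch marks a spectral band as reliable when the post-filter kept most of its energy, then scores a feature vector against a diagonal Gaussian while skipping the bands marked missing, which is one common way of applying a mask in a recognizer trained on clean speech.

```python
import math


def reliability_mask(pre_energy, post_energy, threshold=0.25):
    """Return 1 (reliable) or 0 (missing) for each spectral band.

    pre_energy and post_energy are hypothetical per-band energies measured
    before and after the post-filter. The threshold is an arbitrary
    illustrative value, not one taken from the paper.
    """
    mask = []
    for pre, post in zip(pre_energy, post_energy):
        # Fraction of the band's energy that survived the post-filter.
        ratio = post / pre if pre > 0.0 else 0.0
        mask.append(1 if ratio >= threshold else 0)
    return mask


def masked_log_likelihood(features, mask, means, variances):
    """Diagonal-Gaussian log-likelihood that ignores bands marked missing."""
    total = 0.0
    for x, m, mu, var in zip(features, mask, means, variances):
        if m:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Bands 2 and 4 lose most of their energy in the post-filter, so they are
# treated as dominated by interference and marked missing.
example_mask = reliability_mask([4.0, 2.0, 1.0, 8.0], [3.6, 0.2, 0.9, 1.0])
```

With this rule, a heavily distorted band contributes nothing to the score, so a large mismatch in that band cannot penalize the correct acoustic model.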

