H. Saruwatari, N. Hirata, Toshiyuki Hatta, Ryo Wakisaka, K. Shikano, T. Takatani
2011 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), published 2011-12-14. DOI: 10.1109/ISSPIT.2011.6151571
Semi-blind speech extraction for robot using visual information and noise statistics
This paper addresses the improvement of speech recognition accuracy for ICA-based multichannel noise reduction in a spoken-dialogue robot. First, a new permutation-solving method using a probability statistics model is proposed for realistic sound mixtures consisting of point-source speech and diffuse noise. Next, to achieve high recognition accuracy for the target speaker's early utterance, we introduce a new rapid ICA initialization method that combines robot video information with a prestored bank of initial separation filters. Using this image information, an initial ICA filter fitted to the user's direction can be selected, so that the user's first utterance is not lost. The experimental results show that the proposed approaches can markedly improve word recognition accuracy.
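The initialization idea in the abstract (use the direction seen by the camera to pick a prestored initial separation filter) can be illustrated with a minimal sketch. This is not the authors' implementation: the pinhole-camera mapping from pixel position to azimuth, the 20-degree spacing of the bank, the function names `pixel_to_azimuth_deg` and `select_initial_filter`, and the string placeholders standing in for real filter coefficients are all assumptions made for illustration.

```python
import math


def pixel_to_azimuth_deg(x_pixel, image_width, horizontal_fov_deg):
    """Map a horizontal pixel coordinate to an azimuth angle in degrees.

    Assumes an ideal pinhole camera whose optical axis points at 0 degrees
    (an assumption; the abstract does not describe the camera model).
    """
    half_width = image_width / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((x_pixel - half_width) / focal_px))


def select_initial_filter(filter_bank, azimuth_deg):
    """Return (direction, filter) for the bank entry nearest to azimuth_deg."""
    nearest = min(filter_bank, key=lambda d: abs(d - azimuth_deg))
    return nearest, filter_bank[nearest]


# Hypothetical bank: one prestored initial filter per direction, every
# 20 degrees. Strings stand in for the actual separation-filter matrices.
bank = {d: f"W_init_{d:+d}" for d in range(-60, 61, 20)}

# Hypothetical face detection at pixel column 500 of a 640-pixel-wide image
# captured with a 60-degree horizontal field of view.
azimuth = pixel_to_azimuth_deg(500, 640, 60.0)
direction, initial_filter = select_initial_filter(bank, azimuth)
```

In such a scheme the selected entry would only seed the ICA iterations, which then adapt the filter to the actual acoustic conditions; a nearest-neighbour lookup is simply the cheapest way to choose a starting point.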