{"title":"通过语音到语音的翻译,将麦克风阵列和摄像机协同导向多语种电话会议","authors":"T. Nishiura, R. Gruhn, S. Nakamura","doi":"10.1109/ASRU.2001.1034602","DOIUrl":null,"url":null,"abstract":"It is very important for multilingual teleconferencing through speech-to-speech translation to capture distant-talking speech with high quality. In addition, the speaker image is also needed to realize a natural communication in such a conference. A microphone array is an ideal candidate for capturing distant-talking speech. Uttered speech can be enhanced and speaker images can be captured by steering a microphone array and a video camera in the speaker direction. However, to realize automatic steering, it is necessary to localize the talker. To overcome this problem, we propose collaborative steering of the microphone array and the video camera in real-time for a multilingual teleconference through speech-to-speech translation. We conducted experiments in a real room environment. The speaker localization rate (i.e., speaker image capturing rate) was 97.7%, speech recognition rate was 90.0%, and TOEIC score was 530/spl sim/540 points, subject to locating the speaker at a 2.0 meter distance from the microphone array.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Collaborative steering of microphone array and video camera toward multi-lingual tele-conference through speech-to-speech translation\",\"authors\":\"T. Nishiura, R. Gruhn, S. Nakamura\",\"doi\":\"10.1109/ASRU.2001.1034602\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"It is very important for multilingual teleconferencing through speech-to-speech translation to capture distant-talking speech with high quality. In addition, the speaker image is also needed to realize a natural communication in such a conference. A microphone array is an ideal candidate for capturing distant-talking speech. Uttered speech can be enhanced and speaker images can be captured by steering a microphone array and a video camera in the speaker direction. However, to realize automatic steering, it is necessary to localize the talker. To overcome this problem, we propose collaborative steering of the microphone array and the video camera in real-time for a multilingual teleconference through speech-to-speech translation. We conducted experiments in a real room environment. The speaker localization rate (i.e., speaker image capturing rate) was 97.7%, speech recognition rate was 90.0%, and TOEIC score was 530/spl sim/540 points, subject to locating the speaker at a 2.0 meter distance from the microphone array.\",\"PeriodicalId\":118671,\"journal\":{\"name\":\"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2001.1034602\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2001.1034602","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Collaborative steering of microphone array and video camera toward multi-lingual tele-conference through speech-to-speech translation
It is very important for multilingual teleconferencing through speech-to-speech translation to capture distant-talking speech with high quality. In addition, the speaker image is also needed to realize a natural communication in such a conference. A microphone array is an ideal candidate for capturing distant-talking speech. Uttered speech can be enhanced and speaker images can be captured by steering a microphone array and a video camera in the speaker direction. However, to realize automatic steering, it is necessary to localize the talker. To overcome this problem, we propose collaborative steering of the microphone array and the video camera in real-time for a multilingual teleconference through speech-to-speech translation. We conducted experiments in a real room environment. The speaker localization rate (i.e., speaker image capturing rate) was 97.7%, speech recognition rate was 90.0%, and TOEIC score was 530/spl sim/540 points, subject to locating the speaker at a 2.0 meter distance from the microphone array.