Authors: P. Heracleous, S. Nakamura, K. Shikano
Venue: IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01)
Published: 2001-12-09
DOI: 10.1109/ASRU.2001.1034600 (https://doi.org/10.1109/ASRU.2001.1034600)
Simultaneous recognition of distant talking speech of multiple sound sources based on 3-D N-best search algorithm
This paper deals with the simultaneous recognition of distant-talking speech from multiple talkers using the 3D N-best search algorithm. We describe the basic idea of the 3D N-best search and address two additional techniques implemented in the baseline system: path distance-based clustering and likelihood normalization, both of which turned out to be necessary to build an efficient system for our purpose. In previous work we reported the results of experiments carried out on simulated data. In this paper we report the results of experiments carried out on reverberant data, both simulated by the image method and recorded in a real room. The image method was used to determine the relationship between accuracy and reverberation time, and the real data were used to evaluate the actual performance of our algorithm. With the image method and a reverberation time of 162 ms, the Top 3 simultaneous word accuracy was 73.02%.
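The abstract does not say how the image method was configured. As an illustration only, the following minimal sketch shows how that method can generate a room impulse response for a rectangular room. The function name `image_method_rir`, the single reflection coefficient `beta` shared by all walls, the speed of sound, and every room dimension and position below are assumptions chosen for the example, not values from the paper.

```python
import math
from itertools import product

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def image_method_rir(room, source, mic, beta, max_order, fs, length):
    """Return a room impulse response (list of floats) via the image method.

    room, source, mic: (x, y, z) tuples in metres. beta is one reflection
    coefficient shared by all six walls (a simplifying assumption).
    Source and microphone must not coincide.
    """
    rir = [0.0] * length
    # Largest lattice index that can still yield <= max_order reflections.
    n_max = (max_order + 1) // 2

    axis_choices = []
    for size, s, m in zip(room, source, mic):
        choices = []
        for n in range(-n_max, n_max + 1):
            for p in (0, 1):
                # Coordinate of the mirrored source along this axis and the
                # number of wall reflections that this image represents.
                image_pos = (1 - 2 * p) * s + 2 * n * size
                choices.append((image_pos - m, abs(n - p) + abs(n)))
        axis_choices.append(choices)

    for (dx, rx), (dy, ry), (dz, rz) in product(*axis_choices):
        reflections = rx + ry + rz
        if reflections > max_order:
            continue
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        tap = round(dist / SPEED_OF_SOUND * fs)  # delay in samples
        if tap < length:
            # Each reflection scales by beta; 1/(4*pi*d) is spherical spreading.
            rir[tap] += beta ** reflections / (4 * math.pi * dist)
    return rir


# Hypothetical 5 m x 4 m x 3 m room, 16 kHz sampling rate.
rir = image_method_rir(room=(5.0, 4.0, 3.0), source=(1.0, 1.0, 1.0),
                       mic=(4.0, 1.0, 1.0), beta=0.8, max_order=3,
                       fs=16000, length=4000)
```

Convolving clean speech with such a response gives simulated reverberant speech. Raising `beta` lengthens the reverberation tail, which is one way a sweep over reverberation times could be set up for an accuracy study of the kind described above.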