Meeting recognition with asynchronous distributed microphone array
S. Araki, Nobutaka Ono, K. Kinoshita, Marc Delcroix
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2017. DOI: 10.1109/ASRU.2017.8268913
Recognition of conversational speech, such as meetings, has recently been studied widely. However, most existing approaches rely on either a single close-talking microphone or a distant microphone array in which all microphones are synchronized. In contrast, this paper addresses the recognition of conversational speech recorded with asynchronous distributed microphones, to which conventional array processing cannot be applied directly. We show that recognition performance can be improved significantly even with asynchronous microphones by combining blind synchronization with state-of-the-art microphone array speech enhancement techniques such as independent vector analysis (IVA) and a time-frequency-mask-based minimum variance distortionless response (MVDR) beamformer. With such a front-end, we reduced the word error rate on real meeting recordings from 42.2% to 29.9%.
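The abstract names a time-frequency-mask-based MVDR beamformer but does not give its formulation. Below is a minimal sketch for a single frequency bin. It makes several assumptions that do not come from the paper. First, the channels have already been aligned, which is the job of the blind synchronization step. Second, a speech mask is already available; the mask estimator is not shown. Third, it uses the Souden-style MVDR solution w = R_n^{-1} R_s u_ref / tr(R_n^{-1} R_s), which is one common formulation, and the paper may use a different variant. All function names (`mvdr_weights`, `masked_covariance`, and so on) are hypothetical. The code is written in pure Python so that it has no dependencies. A real front-end would vectorize the computation over all STFT bins, for example with NumPy.

```python
# Minimal sketch of a time-frequency-mask-based MVDR beamformer for ONE
# frequency bin. Assumptions (not from the paper): channels are already
# synchronized, the speech mask is given, and the Souden-style solution
# w = Rn^-1 Rs u_ref / tr(Rn^-1 Rs) is used. All names are hypothetical.


def masked_covariance(frames, mask):
    """Mask-weighted spatial covariance: sum_t m_t x_t x_t^H / sum_t m_t."""
    m = len(frames[0])
    cov = [[0j] * m for _ in range(m)]
    total = 0.0
    for x, weight in zip(frames, mask):
        total += weight
        for i in range(m):
            for j in range(m):
                cov[i][j] += weight * x[i] * x[j].conjugate()
    return [[c / total for c in row] for row in cov]


def solve(a, b):
    """Return X with A X = B via Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented [A | B]
    for k in range(n):
        pivot_row = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[pivot_row] = aug[pivot_row], aug[k]
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        for r in range(n):
            if r != k:
                factor = aug[r][k]
                aug[r] = [vr - factor * vk for vr, vk in zip(aug[r], aug[k])]
    return [row[n:] for row in aug]


def mvdr_weights(frames, speech_mask, ref=0, diag_load=1e-6):
    """Souden-style MVDR weights from mask-weighted speech/noise covariances."""
    noise_mask = [1.0 - s for s in speech_mask]  # assumes masks in [0, 1]
    r_s = masked_covariance(frames, speech_mask)
    r_n = masked_covariance(frames, noise_mask)
    m = len(r_n)
    for i in range(m):
        r_n[i][i] += diag_load  # diagonal loading keeps Rn invertible
    g = solve(r_n, r_s)  # Rn^-1 Rs
    trace = sum(g[i][i] for i in range(m))
    return [g[i][ref] / trace for i in range(m)]  # reference column, normalized


def apply_beamformer(w, x):
    """Beamformer output y = w^H x for one multichannel STFT frame."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))
```

Consider a speech-only frame x_t = s_t d, where d is the speech steering vector. When R_s is rank one, the output w^H x_t equals s_t d_ref. This is the distortionless constraint: the speech passes through exactly as it is observed at the reference microphone, while the noise power w^H R_n w is minimized. The Souden form has the convenient property that d never has to be estimated explicitly. To obtain the enhanced signal for the recognizer, one would apply `apply_beamformer` to every frame and every frequency bin and then invert the STFT.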