Improving separation of overlapped speech for meeting conversations using uncalibrated microphone array
Authors: Keisuke Nakamura, R. Gomez
Venue: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Published: 2017-12-01
DOI: 10.1109/ASRU.2017.8268916 (https://doi.org/10.1109/ASRU.2017.8268916)
In this paper, we propose a novel approach to sound source separation for meeting conversations that works even with an uncalibrated microphone array. Our method blindly estimates three parameters needed for separation: steering vectors (SVs), speaker indices, and the activity periods of each speaker. First, we estimate the number of speakers and the SVs by clustering the time delay of arrival (TDOA) of the observed signal and selecting the major clusters, from which TDOA-based SVs are computed. Next, speaker indices and activity periods are estimated by thresholding the spatial spectrum computed with the estimated SVs; the threshold itself is obtained blindly. Finally, we separate overlapping speech and noise with a minimum variance distortionless response (MVDR) beamformer whose noise correlation matrices are designed dynamically from the blindly estimated parameters. The proposed algorithm was evaluated with both an objective separation measure and the speech recognition correct rate, and it improved both in single-speaker and simultaneous-speech scenarios in a reverberant meeting room. Moreover, the blindly estimated parameters yielded better separation and recognition than geometrically obtained parameters.
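To make the last two steps of the abstract concrete, the following is a minimal sketch of a TDOA-based steering vector and an MVDR beamformer for a single frequency bin. It is not the authors' implementation. It assumes a pure-delay phase model for the SV, and it assumes the noise correlation matrix is built as a sum of outer products of the currently active interferers' SVs plus a small diagonal loading term, which is only one plausible reading of "dynamic design". All function names, the TDOA values, and the loading value are hypothetical.

```python
import cmath
import math


def tdoa_steering_vector(tdoas, freq_hz):
    """SV for one frequency bin from per-microphone TDOAs in seconds
    (relative to a reference microphone), assuming a pure-delay model."""
    return [cmath.exp(-2j * math.pi * freq_hz * tau) for tau in tdoas]


def noise_correlation(n_mics, interferer_svs, loading=1e-2):
    """Assumed design: sum of a_k a_k^H over active interferers,
    plus diagonal loading so the matrix stays invertible."""
    corr = [[(loading if i == j else 0.0) + 0j for j in range(n_mics)]
            for i in range(n_mics)]
    for sv in interferer_svs:
        for i in range(n_mics):
            for j in range(n_mics):
                corr[i][j] += sv[i] * sv[j].conjugate()
    return corr


def solve(matrix, rhs):
    """Solve matrix @ x = rhs (complex entries) by Gaussian elimination
    with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0j] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n] - sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = acc / aug[row][row]
    return x


def mvdr_weights(noise_corr, steering):
    """Standard MVDR solution w = R^{-1} a / (a^H R^{-1} a)."""
    r_inv_a = solve(noise_corr, steering)
    denom = sum(a.conjugate() * v for a, v in zip(steering, r_inv_a))
    return [v / denom for v in r_inv_a]


def beamform(weights, observation):
    """Beamformer output y = w^H x for one time-frequency bin."""
    return sum(w.conjugate() * x for w, x in zip(weights, observation))


# Hypothetical three-microphone example at 1 kHz with made-up TDOAs.
freq = 1000.0
target_sv = tdoa_steering_vector([0.0, 1.0e-4, 2.0e-4], freq)
interferer_sv = tdoa_steering_vector([0.0, -1.5e-4, -3.0e-4], freq)
weights = mvdr_weights(noise_correlation(3, [interferer_sv]), target_sv)
print(abs(beamform(weights, target_sv)))      # gain toward the target
print(abs(beamform(weights, interferer_sv)))  # gain toward the interferer
```

In this sketch the weights pass the target direction with unit gain (the distortionless constraint) while strongly attenuating the interferer direction. Recomputing `noise_correlation` whenever the set of active interferers changes is the hypothetical stand-in for the dynamic design described above.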