Speaker indexing and speech enhancement in real meetings / conversations

2008 IEEE International Conference on Acoustics, Speech and Signal Processing. Pub Date: 2008-05-12. DOI: 10.1109/ICASSP.2008.4517554

S. Araki, M. Fujimoto, K. Ishizuka, H. Sawada, S. Makino

Citations: 24

Abstract

This paper presents a speaker indexing method that uses a small number of microphones to estimate who spoke when. The proposed speaker indexing combines a noise-robust voice activity detector (VAD), a GCC-PHAT-based direction-of-arrival (DOA) estimator, and a DOA classifier. Using the estimated speaker indexing information, we can also enhance each speaker's utterances with a maximum signal-to-noise ratio (MaxSNR) beamformer. We apply the system to real meetings/conversations recorded in a room with a reverberation time of 350 ms and evaluate its performance with a standard measure, the diarization error rate (DER). Even for real conversations with frequent speaker turn-taking and overlapping speech, the proposed system achieved a very small speaker error time. We plan to demonstrate a real-time speaker indexing system at ICASSP 2008.
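The DOA estimator rests on PHAT-weighted generalized cross-correlation (GCC-PHAT): the cross-spectrum of a microphone pair is whitened so that only its phase remains, and the peak of its inverse transform gives the time difference of arrival (TDOA). A minimal Python/NumPy sketch for a single microphone pair follows. It assumes far-field propagation, a speed of sound of 343 m/s, and an angle measured from the axis pointing from microphone 2 toward microphone 1. The names `gcc_phat` and `tdoa_to_doa` are hypothetical; the abstract does not give the authors' frame length, VAD gating, or how multiple pairs are combined, so this illustrates the technique rather than their implementation.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay (in seconds) of x2 relative to x1 with GCC-PHAT.

    A positive result means the sound reached microphone 1 first.
    """
    n = len(x1) + len(x2)                # zero-pad so the correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:              # optionally restrict to physically possible lags
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # reorder so that lags -max_shift..+max_shift are contiguous
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


def tdoa_to_doa(tau, mic_distance, c=343.0):
    """Far-field DOA in degrees, measured from the axis pointing from mic 2 to mic 1."""
    # clipping guards against |c * tau| slightly exceeding the spacing due to estimation error
    cos_theta = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))
```

In a complete indexing system, one such DOA per VAD-active frame would be handed to a classifier that groups angles into speakers (for example, by clustering); that classifier design is an assumption here and is not taken from the paper.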
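A MaxSNR beamformer chooses, in each frequency bin, the weight vector w that maximizes the output power ratio (w^H Rs w) / (w^H Rn w), where Rs and Rn are spatial covariance matrices; the maximizer is the principal generalized eigenvector of the pair (Rs, Rn). The sketch below assumes NumPy and complex STFT values for one frequency bin. It also assumes Rs is estimated from frames where the speaker index marks the target speaker as active and Rn from the remaining frames. `spatial_covariance`, `maxsnr_weights`, `apply_beamformer`, and the diagonal-loading parameter `reg` are hypothetical names, not the authors' code.

```python
import numpy as np


def spatial_covariance(frames):
    """frames: complex STFT values at one frequency bin, shape (num_frames, num_mics)."""
    frames = np.asarray(frames)
    # R[i, j] = mean over frames of x_i * conj(x_j), i.e. E[x x^H]
    return frames.T @ frames.conj() / frames.shape[0]


def maxsnr_weights(target_frames, noise_frames, reg=1e-6):
    """Unit-norm weights w maximizing (w^H Rs w) / (w^H Rn w) for one frequency bin."""
    Rs = spatial_covariance(target_frames)
    Rn = spatial_covariance(noise_frames)
    m = Rn.shape[0]
    Rn = Rn + reg * np.trace(Rn).real / m * np.eye(m)  # diagonal loading keeps Rn invertible
    # generalized eigenproblem Rs w = lambda Rn w, solved as an ordinary one on Rn^{-1} Rs
    vals, vecs = np.linalg.eig(np.linalg.solve(Rn, Rs))
    w = vecs[:, np.argmax(vals.real)]
    return w / np.linalg.norm(w)


def apply_beamformer(w, frames):
    """Beamformer output y = w^H x for every frame, shape (num_frames,)."""
    return np.asarray(frames) @ w.conj()
```

Because a generalized eigenvector is defined only up to a complex scale, the output w^H x has an arbitrary gain and phase in each bin; a working system needs a per-bin normalization before resynthesis, which this sketch omits.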
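DER adds missed speech, false-alarm speech, and speaker-error time and divides the sum by the total reference speech time; speaker error is the component the abstract reports as very small. The frame-level Python sketch below assumes each frame is labeled with the set of active speakers, so overlapped speech counts once per speaker. It finds the best one-to-one mapping of hypothesis labels to reference labels by brute force, which is practical only for a handful of speakers. `frame_der` is a hypothetical helper; official NIST scoring works on time segments and can apply a forgiveness collar around reference boundaries, which this sketch omits.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Frame-level diarization error rate.

    ref, hyp: equal-length lists with one set of speaker labels per frame.
    Returns DER and its three components, each as a fraction of reference speech.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    total = sum(len(r) for r in ref)  # reference speaker-time (overlaps count per speaker)
    if total == 0:
        raise ValueError("reference contains no speech")

    ref_spk = sorted({s for r in ref for s in r})
    hyp_spk = sorted({s for h in hyp for s in h})
    # None lets a hypothesis speaker stay unmapped (e.g. more hypothesis than reference speakers)
    candidates = ref_spk + [None] * len(hyp_spk)

    best = None
    for assigned in permutations(candidates, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, assigned))
        miss = fa = spk_err = 0
        for r, h in zip(ref, hyp):
            mapped = {mapping[s] for s in h} - {None}
            n_correct = len(r & mapped)
            miss += max(0, len(r) - len(h))              # reference speakers with no hypothesis
            fa += max(0, len(h) - len(r))                # hypothesis speakers with no reference
            spk_err += min(len(r), len(h)) - n_correct   # matched in count but wrong identity
        if best is None or miss + fa + spk_err < sum(best):
            best = (miss, fa, spk_err)

    miss, fa, spk_err = best
    return {
        "DER": (miss + fa + spk_err) / total,
        "missed": miss / total,
        "false_alarm": fa / total,
        "speaker_error": spk_err / total,
    }
```

For example, with reference frames `[{"A"}, {"A"}, {"B"}, {"A", "B"}, set()]` and hypothesis frames `[{"1"}, {"1"}, {"2"}, {"1"}, {"2"}]`, the best mapping sends 1 to A and 2 to B. That leaves one missed speaker in the overlapped frame and one false alarm in the silent frame, so DER = 2/5 = 0.4 with zero speaker error.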

