Contour modeling of prosodic and acoustic features for speaker recognition

2008 IEEE Spoken Language Technology Workshop | Pub Date: 2008-12-01 | DOI: 10.1109/SLT.2008.4777836

M. Kockmann, L. Burget

Citations: 12

Abstract

In this paper we use acoustic and prosodic features jointly in a long-temporal lexical context for automatic speaker recognition from speech. The contours of pitch, energy and cepstral coefficients are continuously modeled over the time span of a syllable to capture the speaking style at the phonetic level. As these features are affected by session variability, established channel compensation techniques are examined. Results for the combination of different features at the syllable level, as well as for channel compensation, are presented for the NIST SRE 2006 speaker identification task. To show the complementary character of the features, the proposed system is fused with an acoustic short-time system, leading to a relative improvement of 10.4%.
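As a rough illustration of what modeling a contour over the span of a syllable can look like, the sketch below summarizes a variable-length sequence of per-frame values (for example pitch) by a fixed number of coefficients. The abstract does not state which parameterization is used; the discrete cosine transform here is an assumption chosen for illustration, and the function name `dct_contour_features` and the parameter `num_coeffs` are hypothetical.

```python
import math


def dct_contour_features(contour, num_coeffs=3):
    """Summarize one syllable's contour with leading orthonormal DCT-II coefficients.

    Assumption for illustration only: the abstract does not say that a DCT
    is the parameterization used by the authors.
    """
    n = len(contour)
    if n == 0:
        raise ValueError("contour must contain at least one frame")

    feats = []
    for k in range(min(num_coeffs, n)):
        # Orthonormal scaling: coefficient 0 tracks the mean level,
        # coefficient 1 the overall slope, coefficient 2 the curvature.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(contour)
        )
        feats.append(scale * total)

    # Syllables shorter than num_coeffs frames are zero-padded so that
    # every syllable yields a vector of the same length.
    feats.extend([0.0] * (num_coeffs - len(feats)))
    return feats
```

Because every syllable maps to a vector of the same length regardless of its duration, the vectors for pitch, energy and each cepstral coefficient could be concatenated into one syllable-level feature vector. This is one possible design, not a description of the system in the paper.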



