自组织单元的MLLR变换作为特征在说话人识别中

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2012-03-25 DOI:10.1109/ICASSP.2012.6288891

M. Siu, Omer Lang, H. Gish, S. Lowe, Arthur Chan, O. Kimball

{"title":"自组织单元的MLLR变换作为特征在说话人识别中","authors":"M. Siu, Omer Lang, H. Gish, S. Lowe, Arthur Chan, O. Kimball","doi":"10.1109/ICASSP.2012.6288891","DOIUrl":null,"url":null,"abstract":"Using speaker adaptation parameters, such as maximum likelihood linear regression (MLLR) adaptation matrices, as features for speaker recognition (SR) has been shown to perform well and can also provide complementary information for fusion with other acoustic-based SR systems, such as GMM-based systems. In order to estimate the adaptation parameters, a speech recognizer in the SR domain is required which in turn requires transcribed training data for recognizer training. This limits the approach only to domains where training transcriptions are available. To generalize the adaptation parameter approach to domains without transcriptions, we propose the use of self-organized unit recognizers that can be trained without supervision (or transcribed data). We report results on the 2002 NIST speaker recognition evaluation (SRE2002) extended data set and show that using MLLR parameters estimated from SOU recognizers give comparable performance to systems using a matched recognizers. SOU recognizers also outperform those using cross-lingual recognizers. When we fused the SOU- and word recognizers, SR equal error rate (EER) can be reduced by another 15%. This suggests SOU recognizers can be useful whether or not transcribed data for recognition training are available.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"267 1","pages":"4385-4388"},"PeriodicalIF":0.0000,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"MLLR transforms of self-organized units as features in speaker recognition\",\"authors\":\"M. Siu, Omer Lang, H. Gish, S. Lowe, Arthur Chan, O. Kimball\",\"doi\":\"10.1109/ICASSP.2012.6288891\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Using speaker adaptation parameters, such as maximum likelihood linear regression (MLLR) adaptation matrices, as features for speaker recognition (SR) has been shown to perform well and can also provide complementary information for fusion with other acoustic-based SR systems, such as GMM-based systems. In order to estimate the adaptation parameters, a speech recognizer in the SR domain is required which in turn requires transcribed training data for recognizer training. This limits the approach only to domains where training transcriptions are available. To generalize the adaptation parameter approach to domains without transcriptions, we propose the use of self-organized unit recognizers that can be trained without supervision (or transcribed data). We report results on the 2002 NIST speaker recognition evaluation (SRE2002) extended data set and show that using MLLR parameters estimated from SOU recognizers give comparable performance to systems using a matched recognizers. SOU recognizers also outperform those using cross-lingual recognizers. When we fused the SOU- and word recognizers, SR equal error rate (EER) can be reduced by another 15%. This suggests SOU recognizers can be useful whether or not transcribed data for recognition training are available.\",\"PeriodicalId\":6443,\"journal\":{\"name\":\"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"volume\":\"267 1\",\"pages\":\"4385-4388\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-03-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP.2012.6288891\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2012.6288891","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

使用说话人自适应参数，如最大似然线性回归(MLLR)自适应矩阵，作为说话人识别(SR)的特征已被证明表现良好，并且还可以为与其他基于声学的SR系统(如基于gmm的系统)的融合提供补充信息。为了估计自适应参数，需要一个SR域中的语音识别器，这反过来又需要转录的训练数据用于识别器的训练。这限制了该方法仅适用于训练转录可用的领域。为了将自适应参数方法推广到没有转录的域，我们建议使用可以在没有监督(或转录数据)的情况下训练的自组织单元识别器。我们报告了2002年NIST说话人识别评估(SRE2002)扩展数据集的结果，并表明使用从SOU识别器估计的MLLR参数与使用匹配识别器的系统具有相当的性能。SOU识别器的表现也优于那些使用跨语言识别器的识别器。将词识别器与词识别器融合后，平均误差率(EER)又降低了15%。这表明无论是否有识别训练的转录数据可用，SOU识别器都是有用的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

MLLR transforms of self-organized units as features in speaker recognition

Using speaker adaptation parameters, such as maximum likelihood linear regression (MLLR) adaptation matrices, as features for speaker recognition (SR) has been shown to perform well and can also provide complementary information for fusion with other acoustic-based SR systems, such as GMM-based systems. In order to estimate the adaptation parameters, a speech recognizer in the SR domain is required which in turn requires transcribed training data for recognizer training. This limits the approach only to domains where training transcriptions are available. To generalize the adaptation parameter approach to domains without transcriptions, we propose the use of self-organized unit recognizers that can be trained without supervision (or transcribed data). We report results on the 2002 NIST speaker recognition evaluation (SRE2002) extended data set and show that using MLLR parameters estimated from SOU recognizers give comparable performance to systems using a matched recognizers. SOU recognizers also outperform those using cross-lingual recognizers. When we fused the SOU- and word recognizers, SR equal error rate (EER) can be reduced by another 15%. This suggests SOU recognizers can be useful whether or not transcribed data for recognition training are available.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

自引率

0.00%

发文量