Comparison of the automatic speaker recognition performance over standard features

2012 IEEE 10th Jubilee International Symposium on Intelligent Systems and Informatics Pub Date : 2012-10-25 DOI:10.1109/SISY.2012.6339541

Milan M. Dobrovic, V. Delić, N. Jakovljević, I. Jokic


Citations: 5

Abstract


Comparison of the automatic speaker recognition performance over standard features

This paper presents a study of speaker recognition accuracy depending on the choice of features, window width and model complexity. The standard features were considered, such as linear and perceptual prediction coefficients (LPC and PLP) and mel-frequency cepstral coefficients (MFCC). Gaussian mixture model (GMM), with the use of HTK tools, was chosen for speaker modelling. Speech database S70W100s120, recorded at the Electrical Engineering Department of Belgrade University, was used for purposes of system training and testing. Ten speaker models and the universal background model (UBM) were trained.
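The abstract names GMM speaker models and a universal background model but does not say how test utterances were scored. A common convention, assumed here, is to score each speaker by the average per-frame log-likelihood ratio between that speaker's GMM and the UBM. The sketch below illustrates that idea in plain Python rather than HTK. The 2-dimensional feature vectors, the model parameters, and the names `spk_a`, `spk_b`, `llr_score` and `identify` are all hypothetical stand-ins and are not taken from the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) triples."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)
    # Log-sum-exp over components, shifted by the maximum for numerical stability.
    return top + math.log(sum(math.exp(l - top) for l in logs))


def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of an utterance."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)


def llr_score(frames, speaker_gmm, ubm):
    """Average log-likelihood ratio of the speaker model against the UBM."""
    return avg_loglik(frames, speaker_gmm) - avg_loglik(frames, ubm)


def identify(frames, speaker_models, ubm):
    """Return the best-scoring speaker name and the score of every speaker."""
    scores = {name: llr_score(frames, g, ubm) for name, g in speaker_models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy models: one Gaussian per speaker, a broader two-component UBM.
speaker_models = {
    "spk_a": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "spk_b": [(1.0, [3.0, 3.0], [1.0, 1.0])],
}
ubm = [
    (0.5, [0.0, 0.0], [4.0, 4.0]),
    (0.5, [3.0, 3.0], [4.0, 4.0]),
]

# Hypothetical feature frames (stand-ins for MFCC, LPC or PLP vectors).
frames = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.0]]

best, scores = identify(frames, speaker_models, ubm)
print(best, scores)
```

In a real system the frames would be MFCC, LPC or PLP vectors computed with a chosen window width, and the number of mixture components would correspond to the model complexity that the study varies.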

