Fusion features for robust speaker identification

Impact factor 0.6, JCR Q3 (Engineering)
I. Fredj, Youssef Zouhir, K. Ouni
Journal: International Journal of Signal and Imaging Systems Engineering, vol. 11, no. 1, p. 65
Published: 2018-05-22 (Journal Article)
DOI: 10.1504/IJSISE.2018.10013027 (https://doi.org/10.1504/IJSISE.2018.10013027)
Citations: 0

Abstract

Speaker identification systems aim to determine a speaker's identity from a set of speech parameters, so a relevant speech representation is required. To this end, we propose combining spectral parameters, namely the Mel-frequency cepstral coefficients (MFCC) and the perceptual linear predictive (PLP) coefficients, with a prosodic parameter, the fundamental frequency (F0) of the signal. F0 estimation methods fall into two main classes: temporal and spectral. We use the sawtooth waveform inspired pitch estimator (SWIPE) algorithm, which estimates pitch in the frequency domain. For modelling, we evaluate the Gaussian mixture model-universal background model (GMM-UBM) approach. Experiments are carried out on the TIMIT database. The identification rates are promising and show the benefit of combining MFCC and PLP rather than using each feature separately, mainly for noisy data.
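The abstract does not say how the features are fused or how the GMM-UBM scores are computed, so the following is only a minimal sketch under stated assumptions: fusion is taken to be frame-wise concatenation of the MFCC, PLP and F0 streams, and identification picks the speaker whose diagonal-covariance GMM gives the highest average log-likelihood ratio against the UBM. The function names (fuse_frames, diag_gmm_loglik, llr_score, identify) and the toy model parameters are hypothetical and are not taken from the paper; feature extraction (including SWIPE) and model training are left out.

```python
import math

def fuse_frames(mfcc, plp, f0):
    """Concatenate per-frame MFCC, PLP and F0 values into one vector per frame.
    Assumption: the three streams are already aligned frame by frame."""
    if not (len(mfcc) == len(plp) == len(f0)):
        raise ValueError("all feature streams must have the same number of frames")
    return [list(m) + list(p) + [f] for m, p, f in zip(mfcc, plp, f0)]

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one frame x under a diagonal-covariance GMM."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp.append(ll)
    top = max(comp)  # log-sum-exp over components for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comp))

def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    total = 0.0
    for x in frames:
        total += diag_gmm_loglik(x, *speaker_gmm) - diag_gmm_loglik(x, *ubm)
    return total / len(frames)

def identify(frames, speaker_models, ubm):
    """Return the name of the speaker model with the highest score."""
    return max(speaker_models,
               key=lambda name: llr_score(frames, speaker_models[name], ubm))

# Toy data (made up for illustration): 1 "MFCC" value, 1 "PLP" value, 1 F0 value.
frames = fuse_frames([[0.9], [1.2]], [[1.1], [0.8]], [1.0, 1.1])
ubm = ([1.0], [[0.0, 0.0, 0.0]], [[4.0, 4.0, 4.0]])  # (weights, means, variances)
speaker_models = {
    "a": ([1.0], [[1.0, 1.0, 1.0]], [[1.0, 1.0, 1.0]]),
    "b": ([1.0], [[-1.0, -1.0, -1.0]], [[1.0, 1.0, 1.0]]),
}
best = identify(frames, speaker_models, ubm)
```

In a real system the F0 stream is on a very different scale from cepstral coefficients, so some form of per-dimension normalization before concatenation would likely be needed; the sketch omits it.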
Source journal metrics: CiteScore 2.10; self-citation rate 0.00%; articles published 0