Multi-dimensional I-vector closed set speaker identification based on an extreme learning machine with and without fusion technologies

Musab T. S. Al-Kaltakchi, W. L. Woo, S. Dlay, J. Chambers
2017 Intelligent Systems Conference (IntelliSys), published 2017-09-01
DOI: 10.1109/INTELLISYS.2017.8324273 (https://doi.org/10.1109/INTELLISYS.2017.8324273)
Citations: 5

Abstract

In this article, I-vector speaker identification (SID) is exploited as a compact, low-dimensional, fixed-length, state-of-the-art approach. The main structures for this study consist of four combinations of features based on Power-Normalized Cepstral Coefficient (PNCC) and Mel Frequency Cepstral Coefficient (MFCC) features, together with two previously proposed compensation approaches. The main system is modelled by low-dimensional I-vectors, and we also propose fusion strategies that use different, higher I-vector dimensions to improve the recognition rate. In addition, cumulative, concatenated, and interleaved fusion techniques are investigated to improve on the conventional late fusion presented in our previous work. Moreover, the proposed system employs an Extreme Learning Machine (ELM) for classification, which is efficient, less complex, and less time-consuming than traditional neural-network-based approaches. The system is evaluated on the TIMIT database in clean and additive white Gaussian noise (AWGN) environments and achieves recognition rates of 96.67% and 80.83%, respectively. Compared with the Gaussian Mixture Model-Universal Background Model (GMM-UBM) in our previously proposed scheme, the system improves by 1.76% for clean speech and 2.1% for 30 dB AWGN, with the largest improvement, 43.81%, at 10 dB.
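
The ELM classifier mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only the generic ELM recipe (random, fixed hidden-layer weights and output weights solved in closed form by regularized least squares). The class name ELMClassifier, the helper make_toy_ivectors, the hidden-layer size, the regularization constant, and the synthetic "I-vectors" are all assumptions made for illustration; no TIMIT data or real I-vector extraction is involved.

```python
import numpy as np


class ELMClassifier:
    """Generic single-hidden-layer ELM (illustrative; not the paper's code)."""

    def __init__(self, n_hidden=64, reg=1e-3, seed=0):
        self.n_hidden = n_hidden   # number of random hidden units (assumed value)
        self.reg = reg             # ridge term for a stable solve (assumed value)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot targets, one column per speaker label.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Hidden weights are drawn once and never trained.
        dim = X.shape[1]
        self.W = self.rng.standard_normal((dim, self.n_hidden)) / np.sqrt(dim)
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Output weights: regularized least squares, solved in closed form.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        H = self._hidden(np.asarray(X, dtype=float))
        return self.classes_[np.argmax(H @ self.beta, axis=1)]


def make_toy_ivectors(n_speakers, per_speaker, dim, seed=0):
    """Hypothetical stand-in for I-vectors: one tight Gaussian cluster per speaker."""
    rng = np.random.default_rng(seed)
    centers = rng.standard_normal((n_speakers, dim))
    X = np.repeat(centers, per_speaker, axis=0)
    X = X + 0.1 * rng.standard_normal(X.shape)
    y = np.repeat(np.arange(n_speakers), per_speaker)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_ivectors(n_speakers=5, per_speaker=20, dim=20, seed=1)
    clf = ELMClassifier(n_hidden=64).fit(X, y)
    print("training accuracy:", (clf.predict(X) == y).mean())
```

Because only the output weights are fitted, and by a single linear solve, training needs no iterative backpropagation, which is the usual basis for the efficiency claim made for ELMs. Under the same assumptions, a concatenated fusion of two I-vector sets could be passed to the same classifier by stacking them feature-wise, for example with np.hstack.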