Mismatched feature detection with finer granularity for emotional speaker recognition

Li Chen, Yingchun Yang, Zhaohui Wu
{"title":"Mismatched feature detection with finer granularity for emotional speaker recognition","authors":"Li Chen, Yingchun Yang, Zhaohui Wu","doi":"10.1631/jzus.C1400002","DOIUrl":null,"url":null,"abstract":"The shapes of speakers’ vocal organs change under their different emotional states, which leads to the deviation of the emotional acoustic space of short-time features from the neutral acoustic space and thereby the degradation of the speaker recognition performance. Features deviating greatly from the neutral acoustic space are considered as mismatched features, and they negatively affect speaker recognition systems. Emotion variation produces different feature deformations for different phonemes, so it is reasonable to build a finer model to detect mismatched features under each phoneme. However, given the difficulty of phoneme recognition, three sorts of acoustic class recognition—phoneme classes, Gaussian mixture model (GMM) tokenizer, and probabilistic GMM tokenizer—are proposed to replace phoneme recognition. We propose feature pruning and feature regulation methods to process the mismatched features to improve speaker recognition performance. As for the feature regulation method, a strategy of maximizing the between-class distance and minimizing the within-class distance is adopted to train the transformation matrix to regulate the mismatched features. Experiments conducted on the Mandarin affective speech corpus (MASC) show that our feature pruning and feature regulation methods increase the identification rate (IR) by 3.64% and 6.77%, compared with the baseline GMM-UBM (universal background model) algorithm. Also, corresponding IR increases of 2.09% and 3.32% can be obtained with our methods when applied to the state-of-the-art algorithm i-vector.","PeriodicalId":49947,"journal":{"name":"Journal of Zhejiang University-Science C-Computers & Electronics","volume":"15 1","pages":"903 - 916"},"PeriodicalIF":0.0000,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1631/jzus.C1400002","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Zhejiang University-Science C-Computers & Electronics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1631/jzus.C1400002","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The shapes of speakers’ vocal organs change under their different emotional states, which leads to the deviation of the emotional acoustic space of short-time features from the neutral acoustic space and thereby the degradation of the speaker recognition performance. Features deviating greatly from the neutral acoustic space are considered as mismatched features, and they negatively affect speaker recognition systems. Emotion variation produces different feature deformations for different phonemes, so it is reasonable to build a finer model to detect mismatched features under each phoneme. However, given the difficulty of phoneme recognition, three sorts of acoustic class recognition—phoneme classes, Gaussian mixture model (GMM) tokenizer, and probabilistic GMM tokenizer—are proposed to replace phoneme recognition. We propose feature pruning and feature regulation methods to process the mismatched features to improve speaker recognition performance. As for the feature regulation method, a strategy of maximizing the between-class distance and minimizing the within-class distance is adopted to train the transformation matrix to regulate the mismatched features. Experiments conducted on the Mandarin affective speech corpus (MASC) show that our feature pruning and feature regulation methods increase the identification rate (IR) by 3.64% and 6.77%, compared with the baseline GMM-UBM (universal background model) algorithm. Also, corresponding IR increases of 2.09% and 3.32% can be obtained with our methods when applied to the state-of-the-art algorithm i-vector.
更细粒度的不匹配特征检测用于情感说话人识别
说话人在不同的情绪状态下,其发声器官的形状会发生变化,导致短时特征的情绪声空间偏离中性声空间,从而导致说话人识别性能的下降。大大偏离中性声空间的特征被认为是不匹配的特征,它们对说话人识别系统有负面影响。情绪变化对不同的音素会产生不同的特征变形,因此建立更精细的模型来检测每个音素下的不匹配特征是合理的。然而,考虑到音素识别的困难,提出了三种声学类识别方法——音素类、高斯混合模型(GMM)分词器和概率GMM分词器来代替音素识别。我们提出了特征修剪和特征调节的方法来处理不匹配的特征,以提高说话人识别的性能。特征调节方法采用类间距离最大化、类内距离最小化的策略对变换矩阵进行训练,以调节不匹配的特征。在普通话情感语音语料库(MASC)上进行的实验表明,与通用背景模型(GMM-UBM)算法相比,我们的特征修剪和特征调节方法的识别率分别提高了3.64%和6.77%。当应用于最先进的i-vector算法时,我们的方法可以获得相应的2.09%和3.32%的红外增益。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
审稿时长
2.66667 months
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信