Spectral Features for Emotional Speaker Recognition

P. Sandhya, V. Spoorthy, S. Koolagudi, N. Sobhana
Published in: 2020 Third International Conference on Advances in Electronics, Computers and Communications (ICAECC)
Publication date: 2020-12-11
DOI: 10.1109/ICAECC50550.2020.9339502 (https://doi.org/10.1109/ICAECC50550.2020.9339502)
Citations: 11

Abstract

Speaker recognition in an emotional environment is challenging because emotion alters the speech signal. A speaker can be identified by analyzing features of the speech signal. Under neutral conditions this is comparatively straightforward, but when the speaker is happy, sad, angry, surprised, sarcastic, or afraid, the task becomes much harder, since speech changes under emotion and noise. The spectral features of a speech signal include Mel Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral Coefficients (SDCC), spectral centroid, spectral roll-off, spectral flatness, spectral contrast, spectral bandwidth, chroma-STFT, zero crossing rate, root mean square energy, Linear Prediction Cepstral Coefficients (LPCC), spectral subband centroid, Teager-energy-based MFCC, line spectral frequencies, single frequency cepstral coefficients, formant frequencies, and Power Normalized Cepstral Coefficients (PNCC), among others. The features extracted from the speech signal are passed to a classifier: Support Vector Machine (SVM), Gaussian Mixture Model, Gaussian Naive Bayes, K-Nearest Neighbour, Random Forest, and a simple neural network built with Keras are used. An important application is security systems in which a person is identified biometrically by voice. The work aims to identify the speaker in an emotional environment using spectral features and any of these classifiers, and to achieve a high speaker recognition rate. Feature combinations can also be used to improve accuracy. The proposed model performed better than most state-of-the-art methods.
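As a rough illustration of a few of the descriptors named in the abstract, the sketch below computes zero crossing rate, root mean square energy, spectral centroid, spectral bandwidth, spectral roll-off, and spectral flatness with NumPy alone. It is not the authors' pipeline. The function names (`frame_features`, `utterance_features`), the frame length of 512 samples, the hop of 256, the 85% roll-off threshold, and the exact feature definitions are all assumptions chosen for illustration, and the cepstral features (MFCC, LPCC, PNCC, and so on) are left out.

```python
import numpy as np

# Order of the values returned by frame_features (names are illustrative).
FEATURE_NAMES = ("zcr", "rms", "centroid", "bandwidth", "rolloff", "flatness")


def frame_features(frame, sr, roll_percent=0.85):
    """Return six descriptors for one frame of samples (assumed definitions)."""
    frame = np.asarray(frame, dtype=float)
    eps = 1e-12  # guards against division by zero and log(0)

    # Zero crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frame)
    zcr = np.mean(signs[1:] != signs[:-1])

    # Root mean square energy of the raw frame.
    rms = np.sqrt(np.mean(frame ** 2))

    # Magnitude spectrum of the Hann-windowed frame and its bin frequencies.
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)

    # Centroid and bandwidth: magnitude-weighted mean and spread of frequency.
    weights = mag / (mag.sum() + eps)
    centroid = np.sum(freqs * weights)
    bandwidth = np.sqrt(np.sum(weights * (freqs - centroid) ** 2))

    # Roll-off: lowest bin frequency at which the cumulative magnitude
    # reaches roll_percent of the total (0.85 is an assumed threshold).
    cum = np.cumsum(mag)
    rolloff = freqs[np.searchsorted(cum, roll_percent * cum[-1])]

    # Flatness: geometric mean over arithmetic mean of the power spectrum.
    power = mag ** 2
    flatness = np.exp(np.mean(np.log(power + eps))) / (np.mean(power) + eps)

    return np.array([zcr, rms, centroid, bandwidth, rolloff, flatness])


def utterance_features(signal, sr, frame_len=512, hop=256):
    """Average the frame-level descriptors over an utterance.

    Assumes the signal holds at least one full frame of frame_len samples.
    """
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = [frame_features(signal[s:s + frame_len], sr) for s in starts]
    return np.mean(frames, axis=0)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    # Synthetic stand-ins for speech, used only to exercise the functions.
    tone = np.sin(2 * np.pi * 1000 * t)
    noise = np.random.default_rng(0).standard_normal(sr)
    for name, sig in (("tone", tone), ("noise", noise)):
        values = utterance_features(sig, sr)
        print(name, dict(zip(FEATURE_NAMES, np.round(values, 3))))
```

The resulting fixed-length vector per utterance is the kind of input that classifiers such as those listed in the abstract (SVM, K-Nearest Neighbour, Random Forest) accept. Because the features have very different scales (Hz versus unitless ratios), they would normally be standardized before classification.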