Mr. M China, Pentu Saheb, P. S. Srujana, P. Lalitha, Siva Jyothi
Speech Emotion Recognition
International Journal of Food and Nutritional Sciences, published 2023-04-05. DOI: 10.48047/ijfans/v11/i12/203
Citations: 0
Abstract
Emotions are one of the most important ways people communicate their thoughts and intentions to others. Recognizing emotions from a single speaker's voice is therefore a valuable technology, because it offers insight into a person's state of mind. The process of identifying emotions from human speech is called Speech Emotion Recognition (SER). We used the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset and recognized emotions from speech features such as Mel-Frequency Cepstral Coefficients (MFCCs) and the Mel spectrogram. A Multilayer Perceptron (MLP) classifier achieved an accuracy of 68.33%, and a Convolutional Neural Network combined with Long Short-Term Memory (CNN-LSTM) achieved an accuracy of 80.64%.
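The MFCC and Mel spectrogram features mentioned above can be illustrated with a minimal NumPy sketch. The abstract does not give the frame length, hop size, number of Mel bands, or number of coefficients, so the values used here (512-sample frames, a 160-sample hop, 40 Mel bands, 13 MFCCs, 16 kHz audio in the example) are assumptions, as is the clip-level vector formed by averaging over time, which is one common way to feed a fixed-length input to an MLP. All function names are hypothetical; in practice a library such as librosa provides equivalent feature extractors.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii_ortho(x, n_out):
    # Orthonormal DCT-II along the last axis, keeping the first n_out coefficients
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def log_mel_and_mfcc(signal, sr, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    # Zero-pad short signals so that at least one full frame exists
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice overlapping frames and apply a Hann window
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum per frame: (n_frames, n_fft // 2 + 1)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel spectrogram (n_frames, n_mels), then log compression
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # MFCCs are the DCT of the log-mel energies
    mfcc = dct_ii_ortho(log_mel, n_mfcc)
    return log_mel, mfcc


def clip_features(signal, sr):
    # Hypothetical fixed-length vector: time-averaged MFCCs and log-mel energies
    log_mel, mfcc = log_mel_and_mfcc(signal, sr)
    return np.concatenate([mfcc.mean(axis=0), log_mel.mean(axis=0)])


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic test tone
    log_mel, mfcc = log_mel_and_mfcc(tone, sr)
    print(log_mel.shape, mfcc.shape, clip_features(tone, sr).shape)
```

For one second of 16 kHz audio these settings yield 97 frames, so the log-mel spectrogram is 97 x 40, the MFCC matrix is 97 x 13, and the averaged clip vector has 53 values. A frame-level matrix like `log_mel` suits a sequence model such as a CNN-LSTM, while the averaged vector suits an MLP; whether the paper used exactly this split is not stated in the abstract.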