Reservoir Computing with Truncated Normal Distribution for Speech Emotion Recognition

Impact factor 1.1 · Zone 4 (Computer Science) · JCR Q4 (Computer Science, Artificial Intelligence)
Hemin Ibrahim, C. Loo
DOI: 10.22452/mjcs.vol35no2.3 (https://doi.org/10.22452/mjcs.vol35no2.3)
Journal: Malaysian Journal of Computer Science
Published: 2022-04-29
Publication type: Journal Article
Citations: 1

Abstract

RESERVOIR COMPUTING WITH TRUNCATED NORMAL DISTRIBUTION FOR SPEECH EMOTION RECOGNITION
Speech is an effective, fast, and important way for humans to communicate and exchange complex information. Emotion has always been part of ordinary human conversation and makes speech more engaging. Because both speech and emotion play such a major role, many researchers have been drawn to Speech Emotion Recognition (SER), which still poses many challenges. In this study, we propose a novel reservoir computing approach in which the random input connection weights are initialized from a truncated normal distribution. Furthermore, Population-Based Training (PBT) is adopted to optimize the hyperparameters of the whole Echo State Network (ESN) model, which have a significant impact on model performance. The proposed model uses bidirectional reservoir input to increase memorization capability, and Sparse Random Projection (SRP) is applied for dimensionality reduction as a simple, unsupervised, low-complexity approach. A speaker-independent strategy was employed on the EMODB and SAVEE datasets as acted speech emotion datasets and on Aibo as a non-acted dataset. The model achieved unweighted average recalls of 84.8%, 65.95%, and 45.99% on the EMODB, SAVEE, and Aibo datasets, respectively. The results show that the proposed model outperforms recent state-of-the-art studies at a lower computational cost.
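As a rough illustration of the components the abstract names, the following Python sketch builds an echo state network whose input weights are drawn from a truncated normal distribution, runs it in both time directions, reduces the state dimension with a sparse random projection, and computes unweighted average recall. It is a minimal sketch under stated assumptions, not the authors' implementation: all function names, the truncation bounds (two standard deviations), the leak rate, the spectral radius, and the projection density are hypothetical choices made for illustration. Population-Based Training and the readout classifier are omitted.

```python
import numpy as np

def truncated_normal(rng, shape, std=1.0, bound=2.0):
    """Zero-mean normal samples, redrawn until all lie within +/- bound*std (assumed bound)."""
    out = rng.normal(0.0, std, size=shape)
    bad = np.abs(out) > bound * std
    while bad.any():
        out[bad] = rng.normal(0.0, std, size=int(bad.sum()))
        bad = np.abs(out) > bound * std
    return out

def make_reservoir(rng, n_in, n_res, spectral_radius=0.9, input_std=0.1):
    """Input weights from a truncated normal; recurrent weights rescaled to a target spectral radius."""
    w_in = truncated_normal(rng, (n_res, n_in), std=input_std)
    w = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(w_in, w, x, leak=0.3):
    """Leaky-integrator tanh update over a sequence x of shape (T, n_in)."""
    h = np.zeros(w.shape[0])
    states = np.empty((x.shape[0], w.shape[0]))
    for t, u in enumerate(x):
        h = (1.0 - leak) * h + leak * np.tanh(w_in @ u + w @ h)
        states[t] = h
    return states

def bidirectional_states(w_in, w, x, leak=0.3):
    """Concatenate states from a forward pass and a time-reversed pass."""
    fwd = run_reservoir(w_in, w, x, leak)
    bwd = run_reservoir(w_in, w, x[::-1], leak)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

def sparse_projection_matrix(rng, n_features, n_components, density=0.1):
    """Generic sparse random projection: mostly zeros, a few scaled +/-1 entries."""
    signs = rng.choice([-1.0, 0.0, 1.0], size=(n_features, n_components),
                       p=[density / 2, 1.0 - density, density / 2])
    return signs * np.sqrt(1.0 / (density * n_components))

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of size."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]))

rng = np.random.default_rng(0)
w_in, w = make_reservoir(rng, n_in=13, n_res=50)
x = rng.normal(size=(40, 13))                      # stand-in for 40 frames of acoustic features
states = bidirectional_states(w_in, w, x)          # shape (40, 100)
reduced = states @ sparse_projection_matrix(rng, states.shape[1], 20)  # shape (40, 20)
```

Unweighted average recall averages the recall of each class, which is why it is commonly reported for emotion datasets with uneven class sizes: a model cannot score well by predicting only the majority class.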
Source journal
Malaysian Journal of Computer Science
Categories: Computer Science, Artificial Intelligence; Computer Science, Theory & Methods
CiteScore: 2.20
Self-citation rate: 33.30%
Articles published: 35
Review time: 7.5 months
About the journal: The Malaysian Journal of Computer Science (ISSN 0127-9084) has been published four times a year, in January, April, July, and October, by the Faculty of Computer Science and Information Technology, University of Malaya, since 1985. Over the years, the journal has gained popularity and the number of submissions has increased steadily. Rigorous reviews from the referees have helped maintain the journal's high standard. Its objectives are to promote the exchange of information and knowledge in research work and new inventions and developments in Computer Science, and on the use of Information Technology towards the structuring of an information-rich society; and to assist academic staff from local and foreign universities, business and industrial sectors, government departments, and academic institutions in publishing research results and studies in Computer Science and Information Technology through a scholarly publication. The journal is indexed and abstracted by Clarivate Analytics' Web of Science and Elsevier's Scopus.