Speech Emotion Recognition Based on Acoustic Segment Model
Siyuan Zheng, Jun Du, Hengshun Zhou, Xue Bai, Chin-Hui Lee, Shipeng Li
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Published: 2021-01-24
DOI: 10.1109/ISCSLP49672.2021.9362119
Citations: 3
Abstract
Accurate detection of emotion from speech is a challenging task due to the variability in both speech and emotion. In this paper, we propose a speech emotion recognition (SER) method based on an acoustic segment model (ASM) to address this issue. Specifically, speech with different emotions is segmented more finely by the ASM. Each acoustic segment is modeled by a hidden Markov model (HMM), and utterances are decoded into ASM sequences in an unsupervised way. Feature vectors are then obtained from these sequences by latent semantic analysis (LSA) and fed to a classifier. Validated on the IEMOCAP corpus, the proposed method outperforms state-of-the-art methods, achieving a weighted accuracy of 73.9% and an unweighted accuracy of 70.8%.
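The back end of the pipeline described above (ASM token sequences → LSA feature vectors → classifier) can be sketched as follows. This is a minimal, illustrative reconstruction, not the authors' implementation: the HMM-based ASM decoding step is assumed to have already produced a token sequence per utterance, the toy sequences and labels are invented, and LSA is realized here as TF-IDF weighting followed by a truncated SVD, with a simple nearest-centroid rule standing in for the paper's classifier.

```python
import numpy as np

def build_vocab(sequences):
    # Map each distinct ASM token to a column index.
    vocab = sorted({tok for seq in sequences for tok in seq})
    return {tok: i for i, tok in enumerate(vocab)}

def tfidf_matrix(sequences, vocab):
    # Term-document matrix: one row per utterance, one column per ASM token.
    n_docs, n_terms = len(sequences), len(vocab)
    tf = np.zeros((n_docs, n_terms))
    for d, seq in enumerate(sequences):
        for tok in seq:
            tf[d, vocab[tok]] += 1.0
        tf[d] /= max(len(seq), 1)          # term frequency
    df = (tf > 0).sum(axis=0)              # document frequency
    idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0
    return tf * idf

def lsa(X, k):
    # Truncated SVD: keep the top-k latent dimensions,
    # scaling left singular vectors by their singular values.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

# Toy ASM sequences for two hypothetical emotion classes.
train_seqs = [["a1", "a2", "a1"], ["a1", "a1", "a3"],
              ["a4", "a5", "a4"], ["a5", "a5", "a4"]]
labels = ["happy", "happy", "sad", "sad"]

vocab = build_vocab(train_seqs)
Z = lsa(tfidf_matrix(train_seqs, vocab), k=2)

# Nearest-centroid classification in the latent space.
centroids = {c: Z[[i for i, l in enumerate(labels) if l == c]].mean(axis=0)
             for c in set(labels)}
pred = min(centroids, key=lambda c: np.linalg.norm(Z[0] - centroids[c]))
```

With these toy sequences, utterances of the same class share ASM tokens and therefore end up close together in the latent space, which is the property LSA is meant to expose for the downstream classifier.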