Title: Significance of Word and Syllable Level Information for Expressive Speech Processing
Authors: K. S. Rao, S. Prasanna, T. V. Sagar
Venue: 2009 Seventh International Conference on Advances in Pattern Recognition
Publication date: 2009-02-04
DOI: 10.1109/ICAPR.2009.47 (https://doi.org/10.1109/ICAPR.2009.47)
Citations: 1
Abstract
Human beings convey crucial information through expressions (emotions) in speech, facial movements, and gestures. Expressions in speech can mostly be attributed to longer segments, i.e., to suprasegmental features, also known as prosodic features. In this paper we analyze the expressions in speech using prosodic features at the utterance, word, and syllable levels. The emotions considered for the analysis are anger, compassion, happiness, and neutral. The prosodic features used in the analysis are duration, intonation (pitch), and energy. The analysis is performed on the SUSE (Speech Under Simulated Emotion) database. The results of the analysis are used for synthesizing the expressions in neutral speech. Synthesis experiments using features from the utterance level down to the syllable level showed a steady improvement in the quality of the speech for the desired expressions.
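As a rough illustration of what measuring duration, pitch, and energy at several levels can look like, here is a minimal Python sketch. It is not the authors' procedure: it assumes segment boundaries (in seconds) and a per-frame F0 track are already available from some other tool, and every name in it (`Segment`, `Prosody`, `segment_prosody`, `prosody_by_level`) and all of the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Segment:
    label: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Prosody:
    label: str
    duration: float           # seconds
    mean_energy: float        # mean squared amplitude
    mean_f0: Optional[float]  # Hz over voiced frames; None if no voiced frame


def segment_prosody(seg: Segment, samples: Sequence[float], sample_rate: int,
                    f0_track: Sequence[float], frame_hop: float) -> Prosody:
    """Duration, mean energy, and mean F0 for one segment."""
    lo = max(0, int(round(seg.start * sample_rate)))
    hi = min(len(samples), int(round(seg.end * sample_rate)))
    chunk = samples[lo:hi]
    energy = sum(x * x for x in chunk) / len(chunk) if len(chunk) else 0.0
    # Assumption: frame i of the F0 track starts at i * frame_hop seconds,
    # and a value of 0 marks an unvoiced frame.
    voiced = [f for i, f in enumerate(f0_track)
              if seg.start <= i * frame_hop < seg.end and f > 0]
    mean_f0 = sum(voiced) / len(voiced) if voiced else None
    return Prosody(seg.label, seg.end - seg.start, energy, mean_f0)


def prosody_by_level(levels: Dict[str, List[Segment]],
                     samples: Sequence[float], sample_rate: int,
                     f0_track: Sequence[float],
                     frame_hop: float) -> Dict[str, List[Prosody]]:
    """Apply the same measurements at each level (utterance, word, syllable)."""
    return {
        name: [segment_prosody(s, samples, sample_rate, f0_track, frame_hop)
               for s in segs]
        for name, segs in levels.items()
    }


# Toy data (made up): 2 seconds of "audio" at 8 samples per second.
samples = [1.0] * 8 + [2.0] * 8
sample_rate = 8
f0_track = [100.0, 0.0, 200.0, 220.0]  # one value every 0.5 s
frame_hop = 0.5

levels = {
    "utterance": [Segment("a b", 0.0, 2.0)],
    "word": [Segment("a", 0.0, 1.0), Segment("b", 1.0, 2.0)],
    "syllable": [Segment("a", 0.0, 1.0), Segment("b1", 1.0, 1.5),
                 Segment("b2", 1.5, 2.0)],
}

result = prosody_by_level(levels, samples, sample_rate, f0_track, frame_hop)
for level, rows in result.items():
    for row in rows:
        print(level, row)
```

The finer levels expose variation that the utterance-level averages smooth over, which is one plausible reading of why the abstract reports better synthesis quality when moving from utterance-level to syllable-level features.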