Applying PAD three dimensional emotion model to convert prosody of emotional speech
Xiaoyong Lu, Hongwu Yang, Aibao Zhou
2014 International Conference on Orange Technologies, 2014-11-20
DOI: 10.1109/ICOT.2014.6956606 (https://doi.org/10.1109/ICOT.2014.6956606)
Happiness has attracted considerable attention from researchers in various fields. This paper realizes prosodic conversion of emotional speech for happiness computing in speech communication. An emotional speech corpus comprising 11 kinds of typical emotional utterances is designed, in which each utterance is labeled with a PAD (pleasure-arousal-dominance) value that describes its emotion in a psychological sense. A five-scale tone model is employed to model the pitch contour of emotional utterances at the syllable level. A prosody conversion model based on a generalized regression neural network (GRNN) is built to transform the pitch contour, duration, and pause duration of an emotional utterance; the PAD values of the emotion and context parameters serve as inputs for predicting the prosodic features. The emotional utterance is then re-synthesized with the STRAIGHT algorithm by modifying its pitch contour, duration, and pause duration. Experimental results based on the Emotional Mean Opinion Score (EMOS) demonstrate that speech converted with the proposed method can express the corresponding emotions.
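To make the GRNN step concrete, the following is a minimal sketch of how a generalized regression neural network maps an input vector to prosodic targets. A GRNN predicts the output as a Gaussian-kernel-weighted average of the training targets. Everything specific here is an assumption for illustration and is not taken from the paper: the function name grnn_predict, the smoothing parameter sigma, the use of PAD values alone as input (the paper also uses context parameters), the choice of three scale factors as outputs, and all numeric values.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=0.5):
    """Return the kernel-weighted average of train_y for one query vector.

    Each training input gets a Gaussian weight based on its squared
    Euclidean distance to the query; sigma controls the smoothing.
    """
    weights = []
    for x in train_x:
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    n_out = len(train_y[0])
    return [
        sum(w * y[k] for w, y in zip(weights, train_y)) / total
        for k in range(n_out)
    ]


# Hypothetical training data (made up for this sketch).
# Inputs: (pleasure, arousal, dominance), each assumed to lie in [-1, 1].
train_pad = [
    (0.0, 0.0, 0.0),
    (0.8, 0.6, 0.4),
    (-0.6, -0.4, -0.5),
]
# Targets: assumed scale factors for [pitch, syllable duration, pause duration].
train_prosody = [
    [1.00, 1.00, 1.00],
    [1.30, 0.90, 0.80],
    [0.85, 1.20, 1.30],
]

# Predict prosodic scale factors for an unseen PAD value.
predicted = grnn_predict(train_pad, train_prosody, (0.5, 0.4, 0.2), sigma=0.3)
```

A small sigma makes the prediction follow the nearest training example closely, while a large sigma pulls it toward the mean of all targets, so sigma is the one parameter such a model needs tuned.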