J. Kessens, Mark Antonius Neerincx, R. Looije, M. Kroes, G. Bloothooft
{"title":"语音中综合情感表达的感知:范畴和维度注释","authors":"J. Kessens, Mark Antonius Neerincx, R. Looije, M. Kroes, G. Bloothooft","doi":"10.1109/ACII.2009.5349594","DOIUrl":null,"url":null,"abstract":"In this paper, both categorical and dimensional annotations have been made of neutral and emotional speech synthesis (anger, fear, sad, happy and relaxed). With various prosodic emotion manipulation techniques we found emotion classification rates of 40%, which is significantly above chance level (17%). The classification rates are higher for sentences that have a semantics matching the synthetic emotion. By manipulating the pitch and duration, differences in arousal were perceived whereas differences in valence were hardly perceived. Of the investigated emotion manipulation methods, EmoFilt and EmoSpeak performed very similar, except for the emotion fear. Copy synthesis did not perform well, probably caused by suboptimal alignments and the use of multiple speakers.","PeriodicalId":330737,"journal":{"name":"2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Perception of synthetic emotion expressions in speech: Categorical and dimensional annotations\",\"authors\":\"J. Kessens, Mark Antonius Neerincx, R. Looije, M. Kroes, G. Bloothooft\",\"doi\":\"10.1109/ACII.2009.5349594\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, both categorical and dimensional annotations have been made of neutral and emotional speech synthesis (anger, fear, sad, happy and relaxed). With various prosodic emotion manipulation techniques we found emotion classification rates of 40%, which is significantly above chance level (17%). The classification rates are higher for sentences that have a semantics matching the synthetic emotion. By manipulating the pitch and duration, differences in arousal were perceived whereas differences in valence were hardly perceived. Of the investigated emotion manipulation methods, EmoFilt and EmoSpeak performed very similar, except for the emotion fear. Copy synthesis did not perform well, probably caused by suboptimal alignments and the use of multiple speakers.\",\"PeriodicalId\":330737,\"journal\":{\"name\":\"2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-12-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ACII.2009.5349594\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACII.2009.5349594","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Perception of synthetic emotion expressions in speech: Categorical and dimensional annotations
In this paper, both categorical and dimensional annotations have been made of neutral and emotional speech synthesis (anger, fear, sad, happy and relaxed). With various prosodic emotion manipulation techniques we found emotion classification rates of 40%, which is significantly above chance level (17%). The classification rates are higher for sentences that have a semantics matching the synthetic emotion. By manipulating the pitch and duration, differences in arousal were perceived whereas differences in valence were hardly perceived. Of the investigated emotion manipulation methods, EmoFilt and EmoSpeak performed very similar, except for the emotion fear. Copy synthesis did not perform well, probably caused by suboptimal alignments and the use of multiple speakers.