A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech - a Deep Learning approach

2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW). Pub Date: 2019-07-05. DOI: 10.1109/ACIIW.2019.8925241

Noé Tits


Citations: 10

Abstract



In this project, we aim to build a Text-to-Speech (TTS) system able to produce speech with controllable emotional expressiveness. We propose a methodology for solving this problem in three main steps. The first is the collection of emotional speech data: we discuss the various formats of existing datasets and their usability in speech generation. The second step is the development of a system that automatically annotates data with emotion/expressiveness features: we compare several techniques that use transfer learning to extract such a representation through other tasks, and propose a method to visualize and interpret the correlation between vocal and emotional features. The third step is the development of a deep learning-based system that takes text and emotion/expressiveness as input and produces speech as output: we study the impact of fine-tuning from a neutral TTS towards an emotional TTS in terms of intelligibility and perception of the emotion.
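The abstract does not say how the emotion/expressiveness input is combined with the text inside the third-step system. As a minimal sketch under that caveat, one common conditioning scheme is to append the same emotion vector to every text-encoder frame before decoding. The function name `condition_on_emotion`, the frame sizes, and the two-dimensional emotion vector are all illustrative assumptions, not details from the paper.

```python
from typing import List


def condition_on_emotion(
    text_states: List[List[float]], emotion: List[float]
) -> List[List[float]]:
    """Append the same emotion vector to every text-encoder frame.

    This is broadcast-concatenation conditioning, shown here only as an
    illustration; it is not claimed to be the method used in the paper.
    """
    # frame + emotion builds a new list, so the inputs are not mutated
    return [frame + emotion for frame in text_states]


# Hypothetical inputs: 3 encoder frames of size 2 and a 2-d emotion vector
text_states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
emotion = [0.8, -0.1]  # assumed layout, e.g. (arousal, valence)

conditioned = condition_on_emotion(text_states, emotion)
```

Each conditioned frame now carries both the text features and the emotion features, so a decoder reading these frames could in principle vary its output with the emotion vector while the text stays fixed.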

