Application of Personalized Emotion Speech Synthesis Technology in Human-Computer Interaction
Yuhao Liu, Huan Wang, Zhikai Huang, Jia-Liang Chen, Chaofan Yu, Jiasi Sun, HaoXuan Yan, Bin Hu
2023 6th International Conference on Communication Engineering and Technology (ICCET), February 2023. DOI: 10.1109/iccet58756.2023.00033
Citations: 0
Abstract
To address the problem that current machine speech synthesis lacks emotion in human-computer interaction scenarios, we propose a framework for personalized emotional speech synthesis in human-computer interaction. First, the emotion the machine needs to convey is inferred from the response text generated during the interaction. Next, the FastSpeech 2 speech synthesis model is used to train a personalized voice with emotion. The customized emotional speech is then synthesized according to the emotion inferred from the text. This technology has performed well in real-world scenarios involving emotional human-computer interaction.
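The abstract outlines a two-stage pipeline: infer the emotion to convey from the interaction text, then condition a personalized FastSpeech 2 voice on that emotion. The Python sketch below only illustrates this flow under stated assumptions; the keyword lexicon and the `EmotionalTTS.synthesize` interface are hypothetical stand-ins, not the paper's actual emotion model or synthesis code.

```python
# Sketch of the described pipeline: infer the emotion the reply should convey,
# then synthesize a personalized voice conditioned on that emotion.
# The lexicon and the TTS wrapper are illustrative placeholders.

from dataclasses import dataclass

# Toy emotion inference: a keyword lexicon standing in for a trained text-emotion model.
EMOTION_KEYWORDS = {
    "happy": {"great", "glad", "congratulations", "wonderful"},
    "sad": {"sorry", "unfortunately", "regret"},
    "angry": {"unacceptable", "complaint", "furious"},
}


def infer_emotion(response_text: str) -> str:
    """Return the emotion label the reply should convey (default: neutral)."""
    words = set(response_text.lower().split())
    for emotion, keywords in EMOTION_KEYWORDS.items():
        if words & keywords:
            return emotion
    return "neutral"


@dataclass
class EmotionalTTS:
    """Hypothetical wrapper around a FastSpeech 2 model fine-tuned on one
    speaker's emotional recordings; `synthesize` is an assumed interface."""
    speaker_id: str

    def synthesize(self, text: str, emotion: str) -> bytes:
        # A real system would run acoustic-model inference plus a vocoder;
        # here we return a placeholder payload to keep the sketch self-contained.
        return f"<audio speaker={self.speaker_id} emotion={emotion}>{text}</audio>".encode()


if __name__ == "__main__":
    reply = "Congratulations, your order has shipped!"
    emotion = infer_emotion(reply)            # e.g. "happy"
    tts = EmotionalTTS(speaker_id="user_042")
    audio = tts.synthesize(reply, emotion)    # emotion-conditioned output (placeholder)
    print(emotion, len(audio))
```

In a full system the keyword lookup would be replaced by a text-emotion classifier, and the synthesis step by FastSpeech 2 inference followed by a vocoder, but the control flow from inferred emotion to conditioned synthesis stays the same.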