Situating Speech Synthesis: Investigating Contextual Factors in the Evaluation of Conversational TTS

12th ISCA Speech Synthesis Workshop (SSW2023) | Pub Date: 2023-08-26 | DOI: 10.21437/ssw.2023-11

Harm Lameris, Ambika Kirkland, Joakim Gustafson, Éva Székely


Citations: 0

Abstract




Speech synthesis evaluation methods have lagged behind the development of TTS systems, with single-sentence read-speech MOS naturalness evaluation on crowdsourcing platforms being the industry standard. For TTS to be applied successfully in social contexts, evaluation methods need to be socially embedded in the situation where the systems will be deployed. Due to the time and cost constraints of conducting an in-person interaction evaluation for TTS, we examine the effect of introducing situational context and preceding-sentence context to participants in a subjective listening experiment. We conduct a suitability evaluation for a robot game guide that explains game rules to participants using two synthesized spontaneous voices: an instruction-specific voice and a general spontaneous voice. Results indicate that the inclusion of context influences user ratings, highlighting the need for context-aware evaluations. However, the type of context did not significantly affect the results.
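
The design described in the abstract crosses context conditions with two voices. As a rough illustration only, the sketch below shows how suitability ratings from such a listening test might be averaged per (context, voice) cell. The condition labels, the 1 to 5 rating scale, the function name `mean_by_condition`, and every rating value are hypothetical placeholders, not taken from the paper, and the paper's actual statistical analysis is not reproduced here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder ratings (NOT data from the paper):
# (context condition, voice, suitability rating on an assumed 1-5 scale)
ratings = [
    ("no_context", "instruction", 3),
    ("no_context", "instruction", 4),
    ("no_context", "general", 4),
    ("situational", "instruction", 4),
    ("situational", "instruction", 5),
    ("situational", "general", 3),
    ("preceding_sentence", "instruction", 5),
    ("preceding_sentence", "general", 3),
]


def mean_by_condition(rows):
    """Group ratings by (context, voice) and return the mean of each cell."""
    cells = defaultdict(list)
    for context, voice, score in rows:
        cells[(context, voice)].append(score)
    return {cell: mean(scores) for cell, scores in cells.items()}


summary = mean_by_condition(ratings)

# Print one line per (context, voice) cell, sorted for stable output.
for (context, voice), avg in sorted(summary.items()):
    print(f"{context:20s} {voice:12s} {avg:.2f}")
```

A comparison of cell means like this is only a first look; deciding whether context or voice has a significant effect would require an appropriate statistical test on the real ratings.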

