Towards a Neuro-Inspired No-Reference Instrumental Quality Measure for Text-to-Speech Systems

Rishabh Gupta, Anderson R. Avila, T. Falk
Published in: 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1-6. Publication date: 2018-05-01. DOI: 10.1109/QoMEX.2018.8463392 (https://doi.org/10.1109/QoMEX.2018.8463392)
Citations: 1

Abstract

Subjective evaluation of synthesized speech is not an easy task, as various quality dimensions can be affected, including naturalness, prosody, pronunciation, and continuity, to name a few. Evaluations typically rely on naive listeners, who more closely represent the consumers of commercial products. As such, while the results of these costly and time-consuming tests may provide text-to-speech (TTS) system developers with feedback on the perceived quality and acceptability of their devices, they provide little information on what the sources of the problems are and what can be done about them. In this paper, we propose the use of neuroimaging to probe the unconscious cognitive processing of naive listeners as they listen to synthesized speech generated by different systems of varying quality. The obtained neural insights have allowed us to extract a small subset of highly relevant features from the speech signals and to use these features to build a simple, no-reference instrumental quality metric specifically tailored to TTS speech. The metric is tested on an unseen dataset and shown to significantly outperform a benchmark algorithm.
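
The abstract does not say which features, which model, or which evaluation criterion the authors used. The following is only a minimal sketch of the general workflow it describes: fit a simple metric on a few signal-derived features, then check it against listener ratings on data not used for fitting. The two features, the linear least-squares model, the Pearson correlation criterion, and all numbers are assumptions made for illustration, not the paper's method or results.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def fit_linear(features, scores):
    """Least-squares weights (bias first) via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * s for r, s in zip(rows, scores))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [u - factor * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(weights, feature_vec):
    """No-reference score: uses only features of the synthesized signal."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature_vec))


# Toy data (made up): two hypothetical features per utterance and
# the mean listener rating for that utterance.
train_features = [(0.0, 1.0), (1.0, 0.0), (1.0, 2.0), (2.0, 1.0)]
train_ratings = [1.5, 2.0, 3.0, 3.5]

# Held-out utterances, not used when fitting the weights.
test_features = [(0.5, 0.5), (1.5, 1.0), (2.0, 2.0)]
test_ratings = [1.8, 3.1, 3.9]

weights = fit_linear(train_features, train_ratings)
predicted = [predict(weights, f) for f in test_features]
r = pearson(predicted, test_ratings)
print("weights:", weights)
print("correlation with held-out ratings:", r)
```

The same correlation could be computed for a benchmark metric's scores on the held-out utterances, which gives one plausible way to compare two no-reference metrics.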