用量化方法改进肌电-语音转换中的基频生成

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) Pub Date : 2019-12-01 DOI:10.1109/ASRU46091.2019.9003804

Lorenz Diener, Tejas Umesh, Tanja Schultz

{"title":"用量化方法改进肌电-语音转换中的基频生成","authors":"Lorenz Diener, Tejas Umesh, Tanja Schultz","doi":"10.1109/ASRU46091.2019.9003804","DOIUrl":null,"url":null,"abstract":"We present a novel approach to generating fundamental frequency (intonation and voicing) trajectories in an EMG-to-Speech conversion Silent Speech Interface, based on quantizing the EMG-to-F0 mappings target values and thus turning a regression problem into a recognition problem. We present this method and evaluate its performance with regard to the accuracy of the voicing information obtained as well as the performance in generating plausible intonation trajectories within voiced sections of the signal. To this end, we also present a new measure for overall F0 trajectory plausibility, the trajectory-label accuracy (TLAcc), and compare it with human evaluations. Our new F0 generation method achieves a significantly better performance than a baseline approach in terms of voicing accuracy, correlation of voiced sections, trajectory-label accuracy and, most importantly, human evaluations.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Improving Fundamental Frequency Generation in EMG-to-Speech Conversion Using a Quantization Approach\",\"authors\":\"Lorenz Diener, Tejas Umesh, Tanja Schultz\",\"doi\":\"10.1109/ASRU46091.2019.9003804\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a novel approach to generating fundamental frequency (intonation and voicing) trajectories in an EMG-to-Speech conversion Silent Speech Interface, based on quantizing the EMG-to-F0 mappings target values and thus turning a regression problem into a recognition problem. We present this method and evaluate its performance with regard to the accuracy of the voicing information obtained as well as the performance in generating plausible intonation trajectories within voiced sections of the signal. To this end, we also present a new measure for overall F0 trajectory plausibility, the trajectory-label accuracy (TLAcc), and compare it with human evaluations. Our new F0 generation method achieves a significantly better performance than a baseline approach in terms of voicing accuracy, correlation of voiced sections, trajectory-label accuracy and, most importantly, human evaluations.\",\"PeriodicalId\":150913,\"journal\":{\"name\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU46091.2019.9003804\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003804","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 10

摘要

我们提出了一种在无声语音转换界面中生成基频(语调和发声)轨迹的新方法，该方法基于量化肌电到f0映射的目标值，从而将回归问题转化为识别问题。我们提出了这种方法，并评估了它在获得的语音信息的准确性方面的性能，以及在信号的浊音部分产生可信的语调轨迹的性能。为此，我们还提出了一种新的衡量整体F0轨迹合理性的方法，轨迹标签精度(TLAcc)，并将其与人类评估进行比较。我们的新F0生成方法在语音准确性、浊音部分的相关性、轨迹标签准确性以及最重要的人类评估方面取得了比基线方法更好的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Improving Fundamental Frequency Generation in EMG-to-Speech Conversion Using a Quantization Approach

We present a novel approach to generating fundamental frequency (intonation and voicing) trajectories in an EMG-to-Speech conversion Silent Speech Interface, based on quantizing the EMG-to-F0 mappings target values and thus turning a regression problem into a recognition problem. We present this method and evaluate its performance with regard to the accuracy of the voicing information obtained as well as the performance in generating plausible intonation trajectories within voiced sections of the signal. To this end, we also present a new measure for overall F0 trajectory plausibility, the trajectory-label accuracy (TLAcc), and compare it with human evaluations. Our new F0 generation method achieves a significantly better performance than a baseline approach in terms of voicing accuracy, correlation of voiced sections, trajectory-label accuracy and, most importantly, human evaluations.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

自引率

0.00%

发文量