Data Augmentation Based on Vowel Stretch for Improving Children's Speech Recognition

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Published: 2019-12-01. DOI: 10.1109/ASRU46091.2019.9003741

Tohru Nagano, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata

Citations: 10

Abstract

Prolongation is a speech disfluency that lengthens some portions of an utterance. It is frequently observed in children's spontaneous speech but is rare in read speech. Making acoustic models more robust to children's spontaneous speech usually requires collecting a large amount of children's speech data containing prolongation, which is impractical in many cases. To tackle this problem, we propose a novel data augmentation method that artificially generates additional data by simulating prolongation. The method inserts pseudo frames at specific positions in an utterance, and the acoustic features of each inserted frame are calculated from the original frames on both sides. This design is based on our analysis showing that many vowels are actually stretched in children's spontaneous speech. The proposed procedure generates partially stretched utterances at low computational cost, unlike conventional speed or tempo perturbation, which extends or shrinks entire utterances at a uniform rate. The effectiveness of the proposed method was confirmed in acoustic model adaptation experiments, in which the proposed vowel-stretch method showed consistent improvement over conventional speed and tempo perturbation approaches.
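The frame-insertion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract says only that inserted frames are "calculated from the original frames on both sides", so linear interpolation between the two neighbouring frames is an assumption made here. The function name `stretch_vowels`, the `n_insert` parameter, and the per-frame `is_vowel` flags (assumed to come from some forced alignment) are likewise hypothetical.

```python
from typing import List, Sequence

Frame = List[float]  # one acoustic feature vector per frame


def stretch_vowels(
    frames: Sequence[Frame],
    is_vowel: Sequence[bool],
    n_insert: int = 1,
) -> List[Frame]:
    """Insert pseudo frames between consecutive vowel frames.

    Assumption: each pseudo frame is a linear interpolation of the
    original frames on both sides; the source does not give the formula.
    """
    if len(frames) != len(is_vowel):
        raise ValueError("frames and is_vowel must have the same length")

    out: List[Frame] = []
    for t, frame in enumerate(frames):
        out.append(list(frame))
        nxt = t + 1
        # Only stretch inside a vowel: both neighbours must be vowel frames.
        if nxt < len(frames) and is_vowel[t] and is_vowel[nxt]:
            for k in range(1, n_insert + 1):
                w = k / (n_insert + 1)  # interpolation weight in (0, 1)
                out.append(
                    [(1 - w) * a + w * b for a, b in zip(frame, frames[nxt])]
                )
    return out


# Toy example: 1-dimensional features, frames 1 and 2 belong to a vowel.
toy = [[0.0], [2.0], [4.0], [9.0]]
flags = [False, True, True, False]
print(stretch_vowels(toy, flags))  # → [[0.0], [2.0], [3.0], [4.0], [9.0]]
```

Only the vowel region grows, while the non-vowel frames are copied unchanged. That is the contrast the abstract draws with speed or tempo perturbation, which rescales the whole utterance at a uniform rate.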
