Acoustic model training using self-attention for low-resource speech recognition

Impact factor: 0.2 (JCR Q4, Acoustics)

Journal of the Acoustical Society of Korea, pp. 483-489. Pub Date: 2020-09-01. DOI: 10.7776/ASK.2020.39.5.483

Hosung Kim


Citations: 0

Abstract

This paper proposes acoustic model training using self-attention for low-resource speech recognition. In low-resource speech recognition, it is difficult for the acoustic model to distinguish certain phones, for example the plosives /d/ and /t/, the plosives /g/ and /k/, and the affricates /z/ and /ch/. During acoustic model training, self-attention generates attention weights from the deep neural network model. In this study, these weights are used to handle errors between similar pronunciations in low-resource speech recognition. When the proposed method was applied to a Time Delay Neural Network-Output gate Projected Gated Recurrent Unit (TDNN-OPGRU)-based acoustic model, the proposed model achieved a word error rate of 5.98%, an absolute improvement of 0.74% over the TDNN-OPGRU model.
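The abstract does not say how the attention weights are computed, so the following is only a minimal sketch of generic scaled dot-product self-attention, not the paper's TDNN-OPGRU layer. It assumes that the queries, keys, and values are the frame-level hidden vectors themselves (a trained layer would apply learned projections first), and the names `softmax`, `self_attention`, and `frames` are hypothetical, as are the example values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention over a sequence of frame vectors.

    Assumption: queries, keys, and values are the frame vectors themselves;
    no learned projections are applied in this sketch.
    Returns (weights, outputs), where weights[t][s] is how strongly
    frame t attends to frame s.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    weights = []
    outputs = []
    for query in frames:
        # Similarity of this frame to every frame in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in frames
        ]
        row = softmax(scores)
        weights.append(row)
        # Context vector: attention-weighted sum of all frame vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, frames)) for i in range(dim)]
        )
    return weights, outputs


# Hypothetical 2-dimensional hidden activations for four frames.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
weights, outputs = self_attention(frames)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1, and frames with similar activations receive larger weights. In this toy input the first frame attends more to the second frame than to the third, which is the kind of context a model could use when separating acoustically close phones.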


Source journal

Journal of the Acoustical Society of Korea ACOUSTICS-

CiteScore

0.60

Self-citation rate

50.00%

Number of articles published