Accent neutralization for speech recognition of non-native speakers

Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services. Publication date: 2019-12-02. DOI: 10.1145/3366030.3366083

K. Radzikowski, Mateusz Forc, Le Wang, O. Yoshie, R. Nowak

Citations: 0

Abstract

Automatic speech recognition (ASR) systems achieve ever higher accuracy rates, but accuracy drops significantly when the system is used by a non-native speaker of the language being recognized. The main reason is the speaker's specific pronunciation and accent features. The limited volume of labeled non-native speech data makes it difficult to train new ASR systems for non-native speakers. In our research, we tackle this problem and its influence on the accuracy of ASR systems using a style transfer methodology. We designed a pipeline that modifies the speech of a non-native speaker so that it more closely resembles native speech. Our methodology can be used as a wrapper for any existing ASR system, which reduces the need to train new algorithms for non-native speech: the modification is performed before the data is passed to the speech recognition system itself.
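The wrapper arrangement described in the abstract can be illustrated with a minimal Python sketch. It shows only the composition order (modify the speech first, then recognize it), not the authors' style transfer model. The names `wrap_asr`, `neutralize`, and `recognize` are hypothetical, and the two stand-in functions at the bottom are placeholders, not real accent conversion or speech recognition.

```python
from typing import Callable, List

Audio = List[float]  # assumption: a mono waveform as a list of samples


def wrap_asr(
    recognize: Callable[[Audio], str],
    neutralize: Callable[[Audio], Audio],
) -> Callable[[Audio], str]:
    """Return a recognizer that neutralizes the accent before recognition.

    `recognize` stands for any existing ASR system and is left unchanged;
    `neutralize` stands for the speech-modification step (hypothetical).
    """

    def wrapped(audio: Audio) -> str:
        # Step 1: modify the non-native speech (placeholder for style transfer).
        modified = neutralize(audio)
        # Step 2: pass the modified audio forward to the untouched ASR system.
        return recognize(modified)

    return wrapped


# Stand-ins for demonstration only; neither does real signal processing.
def halve_amplitude(audio: Audio) -> Audio:
    return [0.5 * sample for sample in audio]


def dummy_asr(audio: Audio) -> str:
    return f"{len(audio)} samples, peak {max(audio):.2f}"


asr_for_non_native = wrap_asr(dummy_asr, halve_amplitude)
```

Because the existing recognizer is passed in as a plain callable, swapping in a different ASR system requires no change to the modification step, which is the property the abstract attributes to the wrapper design.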
