End-to-End Speech Recognition Using Recurrent Neural Network (RNN)

Rene Avalloni de Morais, B. Saha
{"title":"基于递归神经网络的端到端语音识别","authors":"Rene Avalloni de Morais, B. Saha","doi":"10.21467/proceedings.115.20","DOIUrl":null,"url":null,"abstract":"Deep learning algorithms have received dramatic progress in the area of natural language processing and automatic human speech recognition. However, the accuracy of the deep learning algorithms depends on the amount and quality of the data and training deep models requires high-performance computing resources. In this backdrop, this paper adresses an end-to-end speech recognition system where we finetune Mozilla DeepSpeech architecture using two different datasets: LibriSpeech clean dataset and Harvard speech dataset. We train Long Short Term Memory (LSTM) based deep Recurrent Neural Netowrk (RNN) models in Google Colab platform and use their GPU resources. Extensive experimental results demonstrate that Mozilla DeepSpeech model could be fine-tuned for different audio datasets to recognize speeches successfully.","PeriodicalId":413368,"journal":{"name":"Proceedings of Intelligent Computing and Technologies Conference","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"End-to-End Speech Recognition Using Recurrent Neural Network (RNN)\",\"authors\":\"Rene Avalloni de Morais, B. Saha\",\"doi\":\"10.21467/proceedings.115.20\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning algorithms have received dramatic progress in the area of natural language processing and automatic human speech recognition. However, the accuracy of the deep learning algorithms depends on the amount and quality of the data and training deep models requires high-performance computing resources. In this backdrop, this paper adresses an end-to-end speech recognition system where we finetune Mozilla DeepSpeech architecture using two different datasets: LibriSpeech clean dataset and Harvard speech dataset. We train Long Short Term Memory (LSTM) based deep Recurrent Neural Netowrk (RNN) models in Google Colab platform and use their GPU resources. Extensive experimental results demonstrate that Mozilla DeepSpeech model could be fine-tuned for different audio datasets to recognize speeches successfully.\",\"PeriodicalId\":413368,\"journal\":{\"name\":\"Proceedings of Intelligent Computing and Technologies Conference\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of Intelligent Computing and Technologies Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21467/proceedings.115.20\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of Intelligent Computing and Technologies Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21467/proceedings.115.20","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Deep learning algorithms have made dramatic progress in natural language processing and automatic speech recognition. However, the accuracy of deep learning algorithms depends on the amount and quality of the data, and training deep models requires high-performance computing resources. Against this backdrop, this paper presents an end-to-end speech recognition system in which we fine-tune the Mozilla DeepSpeech architecture using two different datasets: the LibriSpeech clean dataset and the Harvard speech dataset. We train Long Short-Term Memory (LSTM) based deep Recurrent Neural Network (RNN) models on the Google Colab platform using its GPU resources. Extensive experimental results demonstrate that the Mozilla DeepSpeech model can be fine-tuned on different audio datasets to recognize speech successfully.
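The abstract describes training an LSTM-based recurrent acoustic model end-to-end, which in DeepSpeech-style systems means optimizing a CTC objective over character outputs. The sketch below is an illustration of that idea only, not the authors' implementation: it uses PyTorch rather than DeepSpeech's own TensorFlow code base, and the feature dimension, layer sizes, and 29-symbol character alphabet are assumed placeholder values.

```python
import torch
import torch.nn as nn

class LSTMAcousticModel(nn.Module):
    """Stacked LSTMs over spectrogram frames, projected to character logits
    (a simplified DeepSpeech-style acoustic model; sizes are illustrative)."""
    def __init__(self, n_features=161, n_hidden=512, n_layers=3, n_chars=29):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, num_layers=n_layers,
                            batch_first=True)
        self.proj = nn.Linear(n_hidden, n_chars)  # 28 characters + CTC blank

    def forward(self, x):
        # x: (batch, time, features) -> logits: (batch, time, n_chars)
        out, _ = self.lstm(x)
        return self.proj(out)

# One training step with CTC loss, the objective used by end-to-end
# DeepSpeech-style recognizers. Runs on a GPU if one is available (e.g. Colab).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = LSTMAcousticModel().to(device)
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy batch standing in for spectrogram features and character targets.
batch, time, n_features, target_len = 4, 200, 161, 50
features = torch.randn(batch, time, n_features, device=device)
targets = torch.randint(1, 29, (batch, target_len), device=device)
input_lengths = torch.full((batch,), time, dtype=torch.long)
target_lengths = torch.full((batch,), target_len, dtype=torch.long)

logits = model(features)                            # (batch, time, n_chars)
log_probs = logits.log_softmax(-1).transpose(0, 1)  # CTC expects (time, batch, chars)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Fine-tuning, as described in the abstract, would follow the same loop but start from pretrained weights (for example via `load_state_dict` on a saved checkpoint) and train on the new dataset, typically with a reduced learning rate.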