German Speech Recognition System using DeepSpeech
Jiahua Xu, Kaveen Matta, Shaiful Islam, A. Nürnberger
Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, 2020-12-18
DOI: 10.1145/3443279.3443313 (https://doi.org/10.1145/3443279.3443313)
Citation count: 4
Abstract
Speech recognition focuses on converting speech from an audio format into text. Popular open-source models for voice and speech recognition are available for English; however, open models and training schemes for German are rather rare. An end-to-end, real-time German speech-to-text system based on multiple German-language datasets deserves more attention and further investigation. In this paper, we combined multiple available German datasets and optimized DeepSpeech for training a real-time German speech-to-text model. A GUI is also proposed to demonstrate its functionality. Our model performs well compared to other state-of-the-art systems, since we used noisy data to replicate real-life scenarios. We released our fully trained German model along with its parameter configurations to promote the diversification of open-source models for the German language.