German Speech Recognition System using DeepSpeech
Jiahua Xu, Kaveen Matta, Shaiful Islam, A. Nürnberger
Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, published 2020-12-18
DOI: https://doi.org/10.1145/3443279.3443313
Speech recognition focuses on translating speech from an audio format into text. Popular open-source models for voice and speech recognition are available for English; however, open models and training schemes for German are rather rare. An end-to-end, real-time German speech-to-text system based on multiple German language datasets deserves more attention and further investigation. In this paper, we combine multiple available German datasets and optimize DeepSpeech for training a real-time German speech-to-text model. We also propose a GUI to demonstrate its functionality. Our model performs well compared with other state-of-the-art systems, since we used noisy data to replicate real-life scenarios. We release our fully trained German model, along with its parameter configurations, to promote the diversity of open-source models for the German language.
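To illustrate the dataset-combination step, the following is a minimal sketch rather than the authors' pipeline. It assumes each corpus has already been converted to a CSV manifest with the `wav_filename`, `wav_size`, and `transcript` columns that DeepSpeech's training tools read. The corpus names, file paths, and the lowercase normalization are hypothetical and are not taken from the paper.

```python
import csv
import io

# Column names expected in a DeepSpeech training manifest.
FIELDS = ["wav_filename", "wav_size", "transcript"]


def merge_manifests(sources):
    """Merge several manifests (open text streams) into one list of rows.

    Rows with an empty transcript are dropped, and an audio path that has
    already been seen is skipped so the same clip is not counted twice.
    """
    seen = set()
    merged = []
    for stream in sources:
        for row in csv.DictReader(stream):
            path = row["wav_filename"]
            # Assumption: transcripts are normalized to lowercase.
            text = row["transcript"].strip().lower()
            if not text or path in seen:
                continue
            seen.add(path)
            merged.append(
                {
                    "wav_filename": path,
                    "wav_size": int(row["wav_size"]),
                    "transcript": text,
                }
            )
    return merged


def write_manifest(rows, stream):
    """Write merged rows back out as a single CSV manifest."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Two hypothetical corpora, held in memory so the sketch is self-contained.
corpus_a = io.StringIO(
    "wav_filename,wav_size,transcript\n"
    "a/1.wav,32044,Guten Morgen\n"
    "a/2.wav,48044,\n"
)
corpus_b = io.StringIO(
    "wav_filename,wav_size,transcript\n"
    "b/1.wav,64044,Wie geht es dir\n"
    "a/1.wav,32044,Guten Morgen\n"
)

rows = merge_manifests([corpus_a, corpus_b])
out = io.StringIO()
write_manifest(rows, out)
print(out.getvalue())
```

In practice the merged manifest would be split into training, validation, and test files before being passed to the trainer; that split is omitted here.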