Towards a Deep Speech Model for Romanian Language
Marilena Panaite, Stefan Ruseti, M. Dascalu, Stefan Trausan-Matu
2019 22nd International Conference on Control Systems and Computer Science (CSCS), May 2019
DOI: 10.1109/CSCS.2019.00076
Citations: 2
Abstract
Automatic speech recognition systems have gained popularity because of their improved usability and their integration into cross-domain applications. Traditional approaches rely on elaborate pipelines that need language-specific pre-trained components (an acoustic model, a phonetic dictionary, etc.). In contrast, deep learning architectures such as Recurrent Neural Networks have been trained for automatic speech recognition using only large speech corpora (audio files with aligned transcripts). Starting from the DeepSpeech architecture, we report the performance of a model trained for Romanian on the SWARA speech corpus, which contains almost 21 hours of speech from 17 different speakers. The experiments focused on obtaining the lowest Word Error Rate (WER) by tuning the model's parameters on the SWARA dataset. We present preliminary results for this Romanian dataset, together with the limitations encountered when training the model on languages other than English.
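The network is evaluated by Word Error Rate. The following is a minimal sketch, not the authors' evaluation code. WER is commonly computed as the word-level Levenshtein distance between a reference transcript and the model's hypothesis, divided by the number of reference words: (S + D + I) / N, where S, D and I count substitutions, deletions and insertions. The function name `word_error_rate` and the plain whitespace tokenization (no case or diacritic normalization) are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption (illustrative only): transcripts are tokenized by whitespace,
    with no normalization of case, punctuation, or Romanian diacritics.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev starts as row 0 of the DP table: turning zero reference words
    # into the first j hypothesis words costs j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # curr[j] = edit distance between ref[:i] and hyp[:j]
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Two substituted words out of five reference words.
print(word_error_rate("astăzi este o zi frumoasă", "astăzi e o zi frumoasa"))  # 0.4
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference. In this example, a missing diacritic ("frumoasa" vs. "frumoasă") counts as a full substitution, which matters for Romanian transcripts.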