Towards Building an Intelligent Voice System for Kazakh: Acoustic Database and System Design
Zhandos Yessenbayev, Muslima Karabalayeva, Firuza Shamayeva
2013 8th EUROSIM Congress on Modelling and Simulation. Published 2013-09-10.
DOI: 10.1109/EUROSIM.2013.75 (https://doi.org/10.1109/EUROSIM.2013.75)
Abstract
In this paper we describe our initiative to build an intelligent voice system for Kazakh that operates over telephone lines. In particular, we collected the first acoustic database of Kazakh telephone speech, containing common words and phrases uttered by 169 native speakers, to train an acoustic model. The database contains more than 17 hours of speech and is balanced by gender, region, and age group. Training was performed with the CMU Sphinx toolkit using context-dependent, tied-state continuous Hidden Markov Models with 8 Gaussian mixture components per state. The experiments show that the best word error rate (WER) of 4.1% on test data is obtained with 2000 senones and 23-dimensional feature vectors. This model was subsequently used in the system's implementation. In designing the system, we focused on a friendly graphical user interface and all-in-one functionality. The system is intended to support easy and fast deployment of speech-enabled applications in industry, government, and educational institutions.
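The word error rate reported in the abstract is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used by the authors; the function name `word_error_rate` and the placeholder transcripts are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not part of CMU Sphinx.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder transcripts, not taken from the Kazakh database.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 4.1% corresponds to roughly four word errors per hundred reference words.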