Towards Building an Intelligent Voice System for Kazakh: Acoustic Database and System Design

2013 8th EUROSIM Congress on Modelling and Simulation. Pub Date: 2013-09-10. DOI: 10.1109/EUROSIM.2013.75

Zhandos Yessenbayev, Muslima Karabalayeva, Firuza Shamayeva

Citations: 0

Abstract

In this paper we describe our initiative to build an intelligent voice system for Kazakh that operates over telephone lines. In particular, we collected the first acoustic database of Kazakh telephone speech, containing common words and phrases uttered by 169 native speakers, to train an acoustic model. The database has more than 17 hours of speech and is balanced by gender, region, and age group. Training was performed with the CMU Sphinx toolkit using context-dependent, tied-state continuous hidden Markov models with 8 Gaussian mixture components per state. The experiments show that the best word error rate (WER) of 4.1% on test data is obtained with 2000 senones and 23-dimensional feature vectors. This model was then used in the system's implementation. In designing the system, we focused on a friendly graphical user interface and all-in-one functionality. The system is intended to support easy and fast deployment of speech-enabled applications for industry, government, and educational institutions.
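The abstract reports WER but does not say how it was scored. As a rough illustration only, the sketch below computes WER in the conventional way: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the sample Kazakh numerals are assumptions made for this example; they are not taken from the paper or its database.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch; not the scoring tool used by the authors.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterance (Kazakh numerals one to four), made up here.
reference = "бір екі үш төрт"
hypothesis = "бір алты үш"  # one substitution, one deletion
print(f"WER = {word_error_rate(reference, hypothesis):.1%}")
```

Under this definition, a WER of 4.1% corresponds to roughly 4 word errors per 100 reference words. Because insertions count as errors, the value can exceed 100% for very poor hypotheses.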
