Towards Building an Intelligent Voice System for Kazakh: Acoustic Database and System Design

Zhandos Yessenbayev, Muslima Karabalayeva, Firuza Shamayeva
Published in: 2013 8th EUROSIM Congress on Modelling and Simulation
Publication date: 2013-09-10
DOI: 10.1109/EUROSIM.2013.75 (https://doi.org/10.1109/EUROSIM.2013.75)

Abstract

In this paper we describe our initiative to build an intelligent voice system for Kazakh over telephone lines. In particular, we collected the first acoustic database of Kazakh telephone speech, containing common words and phrases uttered by 169 native speakers, to train an acoustic model. The database contains more than 17 hours of speech and is balanced by gender, region, and age group. Training was performed with the CMU Sphinx toolkit and used context-dependent, tied-state continuous Hidden Markov Models with 8 Gaussian mixtures per state. The experiments show that the best WER of 4.1% on test data is obtained with 2000 senones and a feature-vector dimension of 23. This model was later used in the system's implementation. In designing the system, we focused on a friendly graphical user interface and all-in-one functionality. The system is intended to support easy and fast deployment of speech-enabled applications for industry, governmental, and educational institutions.
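The abstract reports its result as a word error rate (WER). As a minimal sketch of how that metric is conventionally computed, the function below takes the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis and divides it by the number of reference words. The function name and the example sentences are hypothetical and are not taken from the paper or from CMU Sphinx; the authors' actual scoring tool is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four words.
example_wer = word_error_rate("one two three four", "one too three")
```

Under this definition, a WER of 4.1% corresponds to roughly four word errors per hundred reference words.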