A statistical speech recognition of Ningbo dialect monosyllables

Qinru Fan, Donghong Wang
2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering, pp. 266-269. Published 2010-11-01.
DOI: 10.1109/ISKE.2010.5680873 (https://doi.org/10.1109/ISKE.2010.5680873)
Citations: 2

Abstract

A statistical speech recognition of Ningbo dialect monosyllables
To date, most speech recognition research has focused on Mandarin Chinese or English. Because that research assumes a given word is pronounced the same way by every speaker, the factors it treats as affecting recognition are primarily environmental. The Ningbo dialect differs markedly from Mandarin Chinese and English: pronunciation and intonation vary by region even within the Ningbo area, so variation in pronunciation and intonation matters more than other factors. Finding a modeling approach that accommodates these variations is therefore a vital prerequisite for speech recognition of the Ningbo dialect. This paper investigates speech recognition of the Ningbo dialect, focusing on Fenghua county, Cixi county, Yinzhou district, and central Ningbo. We study modeling methods for the dialect with respect to pronunciation variation, intonation variation, and recognition running time. We created 64 speech samples of the 10 digits (1–10) as spoken in these four regions and used Mel-frequency cepstral coefficients (MFCC) as the features of each spoken digit. Then, after extensive experiments, we constructed 20 models from training samples of the digits (1–10) according to the variations in their pronunciation and intonation. A simplified Bayes decision rule is used to classify Ningbo dialect digits. Experimental data suggest that the recognition rate exceeded 75%, compared with 52.5% for a general modeling method whose training-sample models do not account for regional variation in pronunciation and intonation. The robustness of Ningbo dialect speech recognition is thereby improved. The modeling and recognition method used in this paper is easy to apply and to popularize.
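The abstract does not spell out what the simplified Bayes decision rule looks like, so the following is only a minimal sketch of one common simplification, not the authors' method. It assumes equal priors for every model, a diagonal-covariance Gaussian per model, and one fixed-length feature vector per utterance (for example, MFCCs averaged over frames). The names `fit_gaussian`, `train`, `classify`, and the variant labels `"a"` and `"b"` are hypothetical, and the toy numbers are made up for illustration. The point it shows is how a digit can own several pronunciation-variant models, with the best-scoring model deciding the digit.

```python
import math
from collections import defaultdict


def fit_gaussian(vectors):
    """Estimate a per-dimension mean and variance from equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # A variance floor keeps the log-likelihood finite for tiny groups.
    var = [
        max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, 1e-3)
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(x, model):
    """Log density of x under a diagonal-covariance Gaussian (assumed form)."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
        for xi, m, s in zip(x, mean, var)
    )


def train(samples):
    """samples: iterable of (digit, variant, feature_vector).

    Returns {(digit, variant): (mean, var)}, one model per variant of a digit.
    """
    groups = defaultdict(list)
    for digit, variant, vec in samples:
        groups[(digit, variant)].append(vec)
    return {key: fit_gaussian(vecs) for key, vecs in groups.items()}


def classify(x, models):
    """With equal priors, the Bayes rule reduces to picking the model with the
    highest likelihood; the digit of that model is the decision."""
    digit, _variant = max(models, key=lambda k: log_likelihood(x, models[k]))
    return digit


# Toy 2-D vectors standing in for MFCC features (not data from the paper).
toy = [
    (1, "a", [0.0, 0.1]), (1, "a", [0.2, -0.1]),
    (1, "b", [5.0, 5.1]), (1, "b", [5.2, 4.9]),
    (2, "a", [10.0, 0.0]), (2, "a", [10.2, 0.2]),
]
models = train(toy)
print(classify([5.1, 5.0], models))
print(classify([9.9, 0.1], models))
```

In this toy setup digit 1 has two variant models, so a sample near either variant is still labeled 1. A single pooled model for digit 1 would instead sit between the two clusters, which is the kind of mismatch the abstract attributes to the general modeling method.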