古吉拉特语和马拉地语声道长度标准化语音引擎的开发

Shubham Sharma, Maulik C. Madhavi, H. Patil
{"title":"古吉拉特语和马拉地语声道长度标准化语音引擎的开发","authors":"Shubham Sharma, Maulik C. Madhavi, H. Patil","doi":"10.1109/ICSDA.2014.7051439","DOIUrl":null,"url":null,"abstract":"Phonetic engine (PE) is a system that converts speech sound units into symbols without any higher-level information (such as semantic or linguistic details). This paper presents the development of PE in two Indian languages, viz., Gujarati and Marathi. To investigate the performance of PE, speech recorded in three different modes, viz., read, spontaneous and lecture is considered. Database consists of a large number of speakers in each mode for these languages. In order to reduce the effects of speaker differences in the databases, Vocal Tract Length Normalization (VTLN) using Lee-Rose method is incorporated. Here, performances of PEs are tested using state-of-the-art Mel frequency cepstral coefficients (MFCC) and vocal tract length normalized features. Hidden Markov model (HMM)-based approach is used for modeling the phonetic units. On an average, improvement of 3.12 % and 1.32 % is achieved using vocal tract length normalized PE over MFCCs for Gujarati and Marathi, respectively.","PeriodicalId":361187,"journal":{"name":"2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA)","volume":"190 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Development of vocal tract length normalized phonetic engine for Gujarati and Marathi languages\",\"authors\":\"Shubham Sharma, Maulik C. Madhavi, H. Patil\",\"doi\":\"10.1109/ICSDA.2014.7051439\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Phonetic engine (PE) is a system that converts speech sound units into symbols without any higher-level information (such as semantic or linguistic details). This paper presents the development of PE in two Indian languages, viz., Gujarati and Marathi. To investigate the performance of PE, speech recorded in three different modes, viz., read, spontaneous and lecture is considered. Database consists of a large number of speakers in each mode for these languages. In order to reduce the effects of speaker differences in the databases, Vocal Tract Length Normalization (VTLN) using Lee-Rose method is incorporated. Here, performances of PEs are tested using state-of-the-art Mel frequency cepstral coefficients (MFCC) and vocal tract length normalized features. Hidden Markov model (HMM)-based approach is used for modeling the phonetic units. On an average, improvement of 3.12 % and 1.32 % is achieved using vocal tract length normalized PE over MFCCs for Gujarati and Marathi, respectively.\",\"PeriodicalId\":361187,\"journal\":{\"name\":\"2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA)\",\"volume\":\"190 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSDA.2014.7051439\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSDA.2014.7051439","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

语音引擎(PE)是一种将语音单位转换为符号的系统,不需要任何高级信息(如语义或语言细节)。本文介绍了两种印度语言,即古吉拉特语和马拉地语的体育发展。为了研究体育的表现,我们考虑了三种不同模式下的语音记录,即阅读、自发和讲课。数据库由这些语言的每种模式的大量发言者组成。为了减少数据库中说话人差异的影响,采用Lee-Rose方法进行声道长度归一化(VTLN)。在这里,使用最先进的Mel频率倒谱系数(MFCC)和声道长度归一化特征来测试PEs的性能。基于隐马尔可夫模型(HMM)的语音单元建模方法。对于古吉拉特语和马拉地语,使用声道长度标准化PE比mfccc平均分别提高3.12%和1.32%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Development of vocal tract length normalized phonetic engine for Gujarati and Marathi languages
Phonetic engine (PE) is a system that converts speech sound units into symbols without any higher-level information (such as semantic or linguistic details). This paper presents the development of PE in two Indian languages, viz., Gujarati and Marathi. To investigate the performance of PE, speech recorded in three different modes, viz., read, spontaneous and lecture is considered. Database consists of a large number of speakers in each mode for these languages. In order to reduce the effects of speaker differences in the databases, Vocal Tract Length Normalization (VTLN) using Lee-Rose method is incorporated. Here, performances of PEs are tested using state-of-the-art Mel frequency cepstral coefficients (MFCC) and vocal tract length normalized features. Hidden Markov model (HMM)-based approach is used for modeling the phonetic units. On an average, improvement of 3.12 % and 1.32 % is achieved using vocal tract length normalized PE over MFCCs for Gujarati and Marathi, respectively.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信