Development of vocal tract length normalized phonetic engine for Gujarati and Marathi languages

2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA). Pub Date: 2014-09-01. DOI: 10.1109/ICSDA.2014.7051439

Shubham Sharma, Maulik C. Madhavi, H. Patil


Citations: 0


Abstract

A phonetic engine (PE) is a system that converts speech sound units into symbols without using any higher-level information (such as semantic or linguistic details). This paper presents the development of PEs for two Indian languages, Gujarati and Marathi. To investigate PE performance, speech recorded in three modes (read, spontaneous, and lecture) is considered. The database contains a large number of speakers in each mode for both languages. To reduce the effect of speaker differences in the databases, vocal tract length normalization (VTLN) using the Lee-Rose method is incorporated. The PEs are evaluated with state-of-the-art Mel frequency cepstral coefficients (MFCCs) and with vocal tract length normalized features. A hidden Markov model (HMM)-based approach is used to model the phonetic units. On average, the vocal tract length normalized PE achieves an improvement of 3.12 % and 1.32 % over MFCCs for Gujarati and Marathi, respectively.
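The abstract gives no implementation details. As a rough illustration only, Lee-Rose-style VTLN is usually described as a grid search over frequency-warping factors, keeping the factor whose warped features score best under the acoustic model. In the sketch below, the grid (0.88 to 1.12 in steps of 0.02), the piecewise-linear warp, the 8 kHz band edge, and the toy formant-matching score that stands in for an HMM log-likelihood are all assumptions for illustration and are not taken from the paper.

```python
from typing import Callable, List, Sequence


def warp_frequency(f_hz: float, alpha: float,
                   f_max: float = 8000.0, knee: float = 0.85) -> float:
    """Piecewise-linear warp (assumed form): scale by alpha below a
    breakpoint, then join linearly so that f_max still maps to f_max."""
    f0 = knee * f_max * min(1.0, 1.0 / alpha)  # breakpoint frequency
    if f_hz <= f0:
        return alpha * f_hz
    # Straight line from (f0, alpha * f0) to (f_max, f_max).
    slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + slope * (f_hz - f0)


def candidate_warps(lo: float = 0.88, hi: float = 1.12,
                    step: float = 0.02) -> List[float]:
    """Grid of candidate warp factors (range and step are assumptions)."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 2) for i in range(n + 1)]


def select_warp(score: Callable[[float], float],
                alphas: Sequence[float]) -> float:
    """Return the warp factor with the highest score (grid search)."""
    return max(alphas, key=score)


# Toy stand-in for the acoustic-model likelihood: how closely the warped
# formants of a hypothetical speaker match hypothetical reference formants.
reference_formants = [500.0, 1500.0, 2500.0]
speaker_formants = [f / 1.06 for f in reference_formants]


def toy_score(alpha: float) -> float:
    # Negative squared error; higher is better, like a log-likelihood.
    return -sum((warp_frequency(f, alpha) - r) ** 2
                for f, r in zip(speaker_formants, reference_formants))


best_alpha = select_warp(toy_score, candidate_warps())
print("selected warp factor:", best_alpha)
```

In a real system the score would come from the trained HMMs and the warp would be applied inside the Mel filterbank before the cepstral features are computed; the toy score here only makes the search loop concrete.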

