A comparative study on feature dependency of the Manipuri language based phonetic engine

2017 2nd International Conference on Communication Systems, Computing and IT Applications (CSCITA). Publication date: 2017-04-01. DOI: 10.1109/CSCITA.2017.8066533

S. K. Dutta, Salam Nandakishor, L. J. Singh


Citations: 1

Abstract

This paper studies how the performance of a phonetic engine (PE) varies with the set of spectral features used to build it. The study is carried out on a PE developed for the Manipuri language. We built the PE from phonetic transcriptions, modeling each phonetic unit with a hidden Markov model (HMM). The collected data were transcribed using the symbols of the International Phonetic Alphabet (IPA), as revised in 2005. Each phonetic unit is represented by a 5-state left-to-right HMM with 32 mixture components per state. Feature extraction is a critical stage in developing such a PE because it has a major influence on the overall accuracy of the system, so choosing an appropriate feature extraction technique is essential. The speech and speaker recognition literature offers many feature extraction techniques, for example Linear Predictive Cepstral Coefficients (LPCC), Mel-frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP) and Linear Predictive Coding (LPC). In this paper, we analyze the performance of the Manipuri PE for three widely used spectral features, MFCC, PLP and LPCC, on three modes of collected data: read, lecture and conversational speech. For each feature, we use 13-, 26- and 39-dimensional coefficient vectors. Our analysis of system accuracy shows that PLP and MFCC outperform LPCC under all conditions.
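A minimal sketch of the acoustic-model structure described above follows. It assumes a strictly left-to-right HMM with self-loops and single-step transitions (no skips), and it uses diagonal-covariance Gaussian mixtures as the state emission densities. The abstract specifies only "5-state left to right" and "32 mixtures". Several details here are hypothetical choices for illustration, not the authors' configuration: whether the 5 states include non-emitting entry/exit states (as in HTK-style prototypes), the initial self-loop probability of 0.6, the diagonal covariances, and the function names `left_to_right_transmat` and `gmm_log_likelihood`.

```python
import numpy as np


def left_to_right_transmat(n_states: int = 5, self_loop: float = 0.6) -> np.ndarray:
    """Hypothetical initial transition matrix for a left-to-right HMM.

    Each state either stays put (probability self_loop) or advances to the
    next state; no backward or skip transitions are allowed, so the matrix
    is upper triangular. The last state only loops on itself.
    """
    if n_states < 1:
        raise ValueError("n_states must be positive")
    if not 0.0 < self_loop < 1.0:
        raise ValueError("self_loop must lie in (0, 1)")
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = self_loop
        A[i, i + 1] = 1.0 - self_loop
    A[-1, -1] = 1.0
    return A


def gmm_log_likelihood(x, weights, means, variances) -> float:
    """Log density of frame x under a diagonal-covariance Gaussian mixture.

    x: (D,), weights: (M,), means and variances: (M, D). With M = 32 this
    would play the role of one HMM state's emission density (an assumption).
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    # Per-component log normalizer and scaled squared-distance term.
    log_norm = -0.5 * (d * np.log(2.0 * np.pi) + np.log(variances).sum(axis=1))
    quad = -0.5 * (((x - means) ** 2) / variances).sum(axis=1)
    comp = np.log(weights) + log_norm + quad
    # Log-sum-exp over mixture components for numerical stability.
    top = comp.max()
    return float(top + np.log(np.exp(comp - top).sum()))
```

In training, such initial values would be re-estimated (for example with Baum-Welch). The sketch only fixes the topology and the form of the per-state density.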

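LPCC, the feature that PLP and MFCC outperformed in this study, is commonly derived from LPC coefficients. The textbook route is the autocorrelation method, then the Levinson-Durbin recursion, then the LPC-to-cepstrum recursion c_n = a_n + Σ_{k=1}^{n-1} (k/n) c_k a_{n-k}. The sketch below follows that route for a single frame. The abstract does not give the authors' analysis settings, so the following are assumptions: the Hamming window, prediction order 12, 13 output cepstra, the omission of the gain term c_0, and all function names.

```python
import numpy as np


def autocorrelation(frame: np.ndarray, order: int) -> np.ndarray:
    """Short-time autocorrelation r[0..order] of a (windowed) frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])


def levinson_durbin(r: np.ndarray, order: int):
    """Predictor coefficients a_1..a_p with s[n] ~ sum_k a_k s[n-k].

    Returns (a, err), where err is the final prediction-error energy.
    """
    a = np.zeros(order + 1)  # a[0] unused so that a[k] matches a_k
    err = float(r[0])
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a: np.ndarray, n_ceps: int) -> np.ndarray:
    """Cepstrum c_1..c_n_ceps of the all-pole model 1 / (1 - sum_k a_k z^-k).

    c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}, taking a_n = 0 for n > p.
    """
    p = len(a)
    a1 = np.concatenate(([0.0], a))  # 1-based view: a1[k] = a_k
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a1[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a1[n - k]
        c[n] = acc
    return c[1:]


def lpcc(frame: np.ndarray, order: int = 12, n_ceps: int = 13) -> np.ndarray:
    """LPCC vector for one frame: Hamming window -> LPC -> cepstrum."""
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    a, _ = levinson_durbin(autocorrelation(windowed, order), order)
    return lpc_to_cepstrum(a, n_ceps)
```

For a first-order model with a_1 = 0.5, the recursion reproduces the known series c_n = 0.5^n / n. This makes it a convenient sanity check before applying the code to real frames.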

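The abstract does not say how the 26- and 39-dimensional vectors are formed. This sketch assumes a common convention: 13 static coefficients, extended with first-order (delta) and second-order (delta-delta) regression coefficients. It uses the regression formula found in the HTK Book, d_t = Σ_{θ=1}^{Θ} θ (c_{t+θ} - c_{t-θ}) / (2 Σ_{θ=1}^{Θ} θ²), with Θ = 2 and the first and last frames replicated at the edges. The window width and the function names `deltas` and `stack_features` are illustrative assumptions.

```python
import numpy as np


def deltas(feats: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression (delta) coefficients over a +/- width frame window.

    d_t = sum_{th=1}^{width} th * (c[t+th] - c[t-th]) / (2 * sum th^2),
    with the first and last frames replicated at the edges.
    """
    feats = np.asarray(feats, dtype=float)
    T = feats.shape[0]
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(th * th for th in range(1, width + 1))
    out = np.zeros_like(feats)
    for th in range(1, width + 1):
        ahead = padded[width + th: width + th + T]    # c[t + th]
        behind = padded[width - th: width - th + T]   # c[t - th]
        out += th * (ahead - behind)
    return out / denom


def stack_features(static: np.ndarray, dim: int) -> np.ndarray:
    """Build 13-, 26- or 39-dim vectors from 13 static coefficients per frame."""
    if static.shape[1] != 13:
        raise ValueError("expected 13 static coefficients per frame")
    if dim == 13:
        return static
    d1 = deltas(static)
    if dim == 26:
        return np.hstack([static, d1])
    if dim == 39:
        return np.hstack([static, d1, deltas(d1)])
    raise ValueError("dim must be 13, 26 or 39")
```

The same stacking applies unchanged to MFCC, PLP or LPCC frames, which keeps the comparison between features limited to the static front end.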

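The abstract reports accuracy without defining it. A common definition for phone recognizers, used for example by the "Accuracy" figure of HTK's HResults tool, is Acc = (N - S - D - I) / N × 100. Here N is the number of reference phones, and S, D and I are the substitutions, deletions and insertions found by aligning the recognized and reference transcriptions. The sketch assumes that definition with unit-cost alignment, so S + D + I equals the Levenshtein distance. HResults itself uses weighted alignment costs, so its counts can occasionally differ. The phone strings below are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference phone r
                cur[j - 1] + 1,          # insertion of recognized phone h
                prev[j - 1] + (r != h),  # match or substitution
            )
        prev = cur
    return prev[-1]


def phone_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Percent accuracy (N - S - D - I) / N * 100; can be negative."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return 100.0 * (len(ref) - edit_distance(ref, hyp)) / len(ref)


ref = ["k", "ə", "m", "a"]       # hypothetical reference phones
hyp = ["k", "a", "m", "a", "t"]  # one substitution, one insertion
print(phone_accuracy(ref, hyp))  # 50.0
```

Because insertions are penalized, accuracy can drop below zero when a recognizer outputs many spurious phones. This is one reason it is usually reported alongside percent correct.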

