Comparative analysis on the use of features and models for validating language identification system

2017 International Conference on Inventive Computing and Informatics (ICICI). Pub Date: 2017-11-01. DOI: 10.1109/ICICI.2017.8365224

A. Revathi, C. Jeyalakshmi


Citations: 4

Abstract

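Identifying the spoken language from a speech signal is an emerging research area. For this language identification task, experiments are carried out with two modelling approaches, vector quantization (VQ) based clustering and Gaussian mixture modelling (GMM), using Mel frequency linear predictive cepstrum (MFPLPC) features, Mel frequency cepstrum (MFCC) features and their shifted delta cepstral (SDC) features. The hypothesized language is identified from the minimum of averages using a minimum-distance classifier, and from the maximum log-likelihood value of the corresponding model using a maximum a posteriori probability (MAP) classifier. Better performance is observed for the basic MFPLP feature with VQ-based clustering. The results indicate that the MFCC feature combined with its SDC component (size 52) provides the better results when GMM is the modelling technique. Similarly, when clustering is the modelling technique, the MFPLP feature combined with its SDC component (size 52) provides the next-highest results compared with the basic MFPLP feature of size 13. The overall performance of the system is 99.81%. The database used in this work contains speech utterances in seven classical, phonetically rich, speaker-specific Indian languages: Bengali, Hindi, Kannada, Malayalam, Marathi, Tamil and Telugu.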
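The feature stacking and the minimum-of-averages decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the SDC parameters `d`, `p` and `k`, and the edge clamping are all assumptions chosen for illustration. In particular, the reading that size 52 arises as 13 base coefficients plus 3 delta blocks of 13 is an assumption; the abstract gives only the sizes 13 and 52. Codebook training (for example by k-means) and the GMM/MAP branch are left out.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]


def stack_sdc(cepstra: Sequence[Frame], d: int = 1, p: int = 3, k: int = 3) -> List[List[float]]:
    """Append k shifted delta blocks to every base cepstral frame.

    d, p and k are hypothetical values, not taken from the paper.
    With 13 base coefficients and k = 3 the output has 13 + 3 * 13 = 52 columns.
    """
    n = len(cepstra)

    def at(index: int) -> Frame:
        # Clamp to the first/last frame at the utterance edges (an assumption).
        return cepstra[min(max(index, 0), n - 1)]

    stacked = []
    for t in range(n):
        row = list(cepstra[t])
        for i in range(k):
            ahead = at(t + i * p + d)
            behind = at(t + i * p - d)
            row.extend(a - b for a, b in zip(ahead, behind))
        stacked.append(row)
    return stacked


def avg_min_distance(frames: Sequence[Frame], codebook: Sequence[Frame]) -> float:
    """Average, over all frames, of the Euclidean distance to the nearest codeword."""
    total = 0.0
    for frame in frames:
        total += min(math.dist(frame, code) for code in codebook)
    return total / len(frames)


def identify_language(frames: Sequence[Frame], codebooks: Dict[str, Sequence[Frame]]) -> str:
    """Minimum-distance decision: pick the language whose codebook gives the smallest average."""
    scores = {lang: avg_min_distance(frames, book) for lang, book in codebooks.items()}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 2-dimensional codebooks, purely illustrative.
    toy_codebooks = {
        "Tamil": [[0.0, 0.0], [1.0, 1.0]],
        "Hindi": [[5.0, 5.0], [6.0, 6.0]],
    }
    print(identify_language([[0.1, 0.0], [0.9, 1.1]], toy_codebooks))
```

In a real system the test frames would first pass through `stack_sdc` (or stay at the basic size 13), and each language's codebook would be trained on features of the same size.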


