Automatic recognition of major language families in India

2012 4th International Conference on Intelligent Human Computer Interaction (IHCI), Pub Date: 2012-12-01, DOI: 10.1109/IHCI.2012.6481844

D. Sengupta, G. Saha


Citations: 3

Abstract

India is a vast country with a large number of languages. Some of these languages descend from a single parent language, and together they form a language family. The major official languages of India fall under two language families, Indo-European and Dravidian. In this paper, we discuss a system that takes a speech file as input and identifies the language family to which it belongs. We also use this system to examine the influence of the Dravidian family on the Indo-European family. The system uses a combination of Mel Frequency Cepstral Coefficients (MFCC) and Shifted Delta Coefficients (SDC) as language-specific features. At present, SDC is the most popular feature for language identification; it captures temporal information in speech over a broad time span. A Gaussian Mixture Model (GMM) based approach is used to model the language families, with the distribution of each class's feature vectors approximated by a weighted sum of Gaussians. The results give interesting insights into certain Indian languages and into the applicability of machine learning in this domain.
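The feature and scoring steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameter values, so the shift, spacing, and block count (`d=1`, `P=3`, `k=7`) are assumed from common practice, the diagonal-covariance form of the mixture is an assumption, and the function names, family labels, and toy model parameters are hypothetical. The sketch computes shifted delta vectors from per-frame cepstra and picks the family whose mixture gives the highest total log-likelihood.

```python
import math


def shifted_delta(cepstra, d=1, P=3, k=7):
    """Return shifted delta vectors for a list of per-frame cepstral vectors.

    For frame t, block i contributes c[t + i*P + d] - c[t + i*P - d], and the
    k blocks are concatenated. Frames whose window would run past either end
    of the utterance are skipped. The defaults are assumed, not from the paper.
    """
    n_frames = len(cepstra)
    out = []
    for t in range(d, n_frames - (k - 1) * P - d):
        vec = []
        for i in range(k):
            ahead = cepstra[t + i * P + d]
            behind = cepstra[t + i * P - d]
            vec.extend(a - b for a, b in zip(ahead, behind))
        out.append(vec)
    return out


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of one vector under a diagonal-covariance Gaussian mixture."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        comp.append(ll)
    # Log-sum-exp over components for numerical stability.
    top = max(comp)
    return top + math.log(sum(math.exp(c - top) for c in comp))


def classify(frames, models):
    """Return the model name with the highest summed frame log-likelihood."""
    scores = {
        name: sum(gmm_log_likelihood(x, *params) for x in frames)
        for name, params in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy, hand-picked mixtures (weights, means, variances); not trained models.
toy_models = {
    "family_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "family_b": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
label, scores = classify([[0.2, 0.1], [0.9, 1.1]], toy_models)
```

In a real system the mixtures would be trained on labelled speech, typically with expectation-maximization, and each frame's MFCC vector would be concatenated with its shifted delta vector before scoring.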


