印度主要语系的自动识别

D. Sengupta, G. Saha
{"title":"印度主要语系的自动识别","authors":"D. Sengupta, G. Saha","doi":"10.1109/IHCI.2012.6481844","DOIUrl":null,"url":null,"abstract":"India is a vast country with a large number of languages. Among these some languages descend from a single mother language giving rise to a language family. The major official languages in India fall under two language families namely Indo-European and Dravidian. In this paper, we have discussed about a system which takes speech file as input and identifies the language family to which it belongs. We also used this system to find out the influence of Dravidian family on Indo-European family. The system uses a combination of Mel Frequency Cepstral Coefficients (MFCC) and Shifted Delta Coefficients (SDC) as language specific features. Presently, SDC is the most popular feature for language identification. It captures temporal information of speech over a broad range of time. Gaussian Mixture Model based approach is used to effectively model the language families where the distribution of feature vector of a class is approximated using sum of Gaussians. The results give interesting insights of certain Indian languages and applicability of machine learning process in this domain.","PeriodicalId":107245,"journal":{"name":"2012 4th International Conference on Intelligent Human Computer Interaction (IHCI)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Automatic recognition of major language families in India\",\"authors\":\"D. Sengupta, G. Saha\",\"doi\":\"10.1109/IHCI.2012.6481844\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"India is a vast country with a large number of languages. Among these some languages descend from a single mother language giving rise to a language family. The major official languages in India fall under two language families namely Indo-European and Dravidian. In this paper, we have discussed about a system which takes speech file as input and identifies the language family to which it belongs. We also used this system to find out the influence of Dravidian family on Indo-European family. The system uses a combination of Mel Frequency Cepstral Coefficients (MFCC) and Shifted Delta Coefficients (SDC) as language specific features. Presently, SDC is the most popular feature for language identification. It captures temporal information of speech over a broad range of time. Gaussian Mixture Model based approach is used to effectively model the language families where the distribution of feature vector of a class is approximated using sum of Gaussians. The results give interesting insights of certain Indian languages and applicability of machine learning process in this domain.\",\"PeriodicalId\":107245,\"journal\":{\"name\":\"2012 4th International Conference on Intelligent Human Computer Interaction (IHCI)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 4th International Conference on Intelligent Human Computer Interaction (IHCI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IHCI.2012.6481844\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 4th International Conference on Intelligent Human Computer Interaction (IHCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IHCI.2012.6481844","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

印度是一个幅员辽阔的国家,有很多语言。在这些语言中,有些是从单一的母语演变而来的,形成了一个语系。印度的主要官方语言分为两个语系,即印欧语和德拉威语。本文讨论了一种以语音文件为输入并识别其所属语族的系统。我们也用这个系统来研究德拉威人家族对印欧人家族的影响。该系统使用Mel频率倒谱系数(MFCC)和移位δ系数(SDC)的组合作为语言特定特征。目前,SDC是最流行的语言识别特性。它在很长一段时间内捕捉语音的时间信息。基于高斯混合模型的方法可以有效地对语言族进行建模,其中一类特征向量的分布近似于高斯分布的和。结果提供了某些印度语言和机器学习过程在该领域的适用性的有趣见解。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Automatic recognition of major language families in India
India is a vast country with a large number of languages. Among these some languages descend from a single mother language giving rise to a language family. The major official languages in India fall under two language families namely Indo-European and Dravidian. In this paper, we have discussed about a system which takes speech file as input and identifies the language family to which it belongs. We also used this system to find out the influence of Dravidian family on Indo-European family. The system uses a combination of Mel Frequency Cepstral Coefficients (MFCC) and Shifted Delta Coefficients (SDC) as language specific features. Presently, SDC is the most popular feature for language identification. It captures temporal information of speech over a broad range of time. Gaussian Mixture Model based approach is used to effectively model the language families where the distribution of feature vector of a class is approximated using sum of Gaussians. The results give interesting insights of certain Indian languages and applicability of machine learning process in this domain.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信