Automatic recognition of major language families in India
D. Sengupta, G. Saha
2012 4th International Conference on Intelligent Human Computer Interaction (IHCI), December 2012
DOI: 10.1109/IHCI.2012.6481844
Citations: 3
Abstract
India is a vast country with a large number of languages. Some of these languages descend from a single parent language and together form a language family. The major official languages of India fall into two language families, namely Indo-European and Dravidian. In this paper, we discuss a system that takes a speech file as input and identifies the language family to which it belongs. We also used this system to study the influence of the Dravidian family on the Indo-European family. The system uses a combination of Mel Frequency Cepstral Coefficients (MFCC) and Shifted Delta Coefficients (SDC) as language-specific features. SDC is currently the most popular feature for language identification. It captures temporal information of speech over a long time span. A Gaussian Mixture Model (GMM) based approach is used to model the language families, with the distribution of each class's feature vectors approximated by a weighted sum of Gaussians. The results give interesting insights into certain Indian languages and into the applicability of machine learning in this domain.
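The pipeline summarized above (MFCC plus SDC frame features, one GMM per language family, decision by likelihood) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the SDC parameters, the mixture size, how MFCC and SDC are combined, or how the models are trained. The sketch assumes the common N-d-P-k = 7-1-3-7 SDC configuration, frame-level concatenation of MFCC and SDC, diagonal-covariance GMMs fitted with plain EM, and classification by highest average per-frame log-likelihood. MFCC extraction is assumed to happen upstream. All function names (`shifted_delta_coefficients`, `frame_features`, `train_gmm`, `classify`) are hypothetical, and the demo data is synthetic, not speech.

```python
import numpy as np


def shifted_delta_coefficients(cep, d=1, P=3, k=7):
    """SDC features from a (T, N) matrix of cepstral frames.

    For frame t and block i = 0..k-1 this stacks c(t + i*P + d) - c(t + i*P - d),
    giving a (T, N*k) matrix. Frames past either end repeat the edge frame
    (an assumption; other padding schemes are also used).
    """
    T = cep.shape[0]
    padded = np.pad(cep, ((d, (k - 1) * P + d), (0, 0)), mode="edge")
    # Original frame t sits at row t + d of `padded`.
    blocks = [
        padded[2 * d + i * P: 2 * d + i * P + T] - padded[i * P: i * P + T]
        for i in range(k)
    ]
    return np.hstack(blocks)


def frame_features(mfcc, n_sdc=7):
    """Assumed MFCC+SDC combination: append SDC (from the first n_sdc coefficients) to each MFCC frame."""
    return np.hstack([mfcc, shifted_delta_coefficients(mfcc[:, :n_sdc])])


def _weighted_log_densities(X, weights, means, variances):
    """log(w_m * N(x_t | mu_m, diag(var_m))) for every frame t and component m."""
    diff = X[:, None, :] - means[None, :, :]                   # (T, M, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, M)
    log_norm = np.sum(np.log(2 * np.pi * variances), axis=1)   # (M,)
    return np.log(weights) - 0.5 * (mahal + log_norm)


def _logsumexp_rows(a):
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))


def train_gmm(X, n_components=4, n_iter=50, seed=0, floor=1e-6):
    """Fit a diagonal-covariance GMM (weighted sum of Gaussians) to the rows of X with EM."""
    rng = np.random.default_rng(seed)
    T = X.shape[0]
    means = X[rng.choice(T, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        log_p = _weighted_log_densities(X, weights, means, variances)
        resp = np.exp(log_p - _logsumexp_rows(log_p)[:, None])
        # M-step: re-estimate mixture weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / T
        means = resp.T @ X / nk[:, None]
        variances = np.maximum(resp.T @ X ** 2 / nk[:, None] - means ** 2, floor)
    return weights, means, variances


def average_log_likelihood(X, gmm):
    """Mean per-frame log-likelihood of the utterance X under one GMM."""
    return _logsumexp_rows(_weighted_log_densities(X, *gmm)).mean()


def classify(X, models):
    """Return the family whose GMM gives X the highest average log-likelihood."""
    return max(models, key=lambda family: average_log_likelihood(X, models[family]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)

    # Synthetic 13-dim "MFCC" frames (NOT real speech): the two classes
    # differ only in their mean vector.
    def fake_utterance(offset, T=300):
        return frame_features(rng.normal(offset, 1.0, size=(T, 13)))

    models = {
        "Indo-European": train_gmm(fake_utterance(0.0, 2000)),
        "Dravidian": train_gmm(fake_utterance(1.5, 2000)),
    }
    print(classify(fake_utterance(1.5), models))
```

Scoring with the average rather than the summed log-likelihood keeps scores comparable across utterances of different lengths; for a single test utterance the two rankings are identical. A real system would use many more mixture components and train each family model on speech pooled from all of that family's languages.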