Non-uniform Region Based Features for Automatic Language Identification
Greeshma Unnikrishnan, A. George, L. Mary
2021 Fourth International Conference on Microelectronics, Signals & Systems (ICMSS), 18 November 2021
DOI: 10.1109/ICMSS53060.2021.9673629
An audio utterance can be identified as being spoken in a particular language by automatic language identification (LID). Each language has its own phoneme set, and the combinations of these phonemes, governed by phonotactics, help distinguish one language from another. In this work, we propose an automatic language identification system that uses features derived from non-uniform speech regions to represent phonotactic differences among four Indian languages: Malayalam, Marathi, Assamese, and Kannada. For this, broad phoneme labels, namely approximant (A), closure (C), fricative (F), nasal (N), plosive/stop (P), voiced stop (B), vowel (V), and silence (S), are obtained automatically by a broad phoneme classifier (BPC), a DNN-based classifier that uses hand-crafted features and Mel-frequency cepstral coefficients (MFCCs). To segment speech automatically into smaller regions, the utterance is first cut at every silence region using the labels obtained from the BPC and then split again at the end of each vowel. This yields small non-uniform regions containing phoneme combinations that may be specific to the language of the utterance. From each region, only a fixed number of frames containing certain phoneme combinations is selected. A DNN classifier is then trained for LID on the 13-dimensional MFCC features of 12 fixed frames from the non-uniform regions. An average accuracy of 97.03% is obtained on 10 s test utterances from the four languages.
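The segmentation steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the BPC emits one single-letter label per frame (using the codes listed above), treats every frame labelled S as a cut point that belongs to no region, and closes a region at the last frame of each vowel run. The abstract does not say which frames are kept from each region, so the selection rule here (the last 12 frames of a region, with shorter regions skipped) is hypothetical, as are the function names segment_non_uniform and region_features.

```python
from typing import List, Sequence, Tuple

# Broad phoneme classes listed in the abstract (one BPC label per frame is assumed).
BROAD_CLASSES = {"A", "C", "F", "N", "P", "B", "V", "S"}


def segment_non_uniform(labels: Sequence[str]) -> List[Tuple[int, int]]:
    """Split a frame-level label sequence into non-uniform regions.

    Regions are (start, end) frame indices with `end` exclusive. Silence
    frames ("S") act as cut points and are excluded from every region; a
    region is also closed at the last frame of each vowel ("V") run.
    """
    unknown = set(labels) - BROAD_CLASSES
    if unknown:
        raise ValueError(f"unknown broad phoneme labels: {sorted(unknown)}")

    regions: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the open region, if any
    n = len(labels)
    for i, lab in enumerate(labels):
        if lab == "S":
            # Step 1: cut at silence and drop the silent frame itself.
            if start is not None:
                regions.append((start, i))
                start = None
            continue
        if start is None:
            start = i
        # Step 2: close the region where a vowel run ends.
        next_lab = labels[i + 1] if i + 1 < n else None
        if lab == "V" and next_lab != "V":
            regions.append((start, i + 1))
            start = None
    if start is not None:  # trailing non-silent frames after the last vowel
        regions.append((start, n))
    return regions


def region_features(
    mfcc: Sequence[Sequence[float]],
    regions: Sequence[Tuple[int, int]],
    n_frames: int = 12,
    n_coeffs: int = 13,
) -> List[List[float]]:
    """Build one flat feature vector per region.

    Hypothetical selection rule: keep the last `n_frames` frames of each
    region (for regions closed at a vowel, the frames leading up to the
    vowel's end) and skip shorter regions. The selected frames' MFCC
    vectors are concatenated into n_frames * n_coeffs values.
    """
    vectors: List[List[float]] = []
    for start, end in regions:
        if end - start < n_frames:
            continue  # region too short for a fixed-size window
        window = mfcc[end - n_frames:end]
        if any(len(frame) != n_coeffs for frame in window):
            raise ValueError(f"expected {n_coeffs} MFCCs per frame")
        vectors.append([c for frame in window for c in frame])
    return vectors


if __name__ == "__main__":
    labels = ["S", "S", "P", "V", "V", "N", "V", "S", "F", "A", "V", "C", "P"]
    print(segment_non_uniform(labels))  # [(2, 5), (5, 7), (8, 11), (11, 13)]
```

If the 12 selected frames are simply concatenated (an assumption; the abstract only states that 12 frames of 13-dimensional MFCCs are used), each region yields a 12 × 13 = 156-dimensional input vector for the LID DNN, regardless of how long the region originally was.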