Non-uniform Region Based Features for Automatic Language Identification

Greeshma Unnikrishnan, A. George, L. Mary
Published in: 2021 Fourth International Conference on Microelectronics, Signals & Systems (ICMSS), 2021-11-18
DOI: 10.1109/ICMSS53060.2021.9673629
URL: https://doi.org/10.1109/ICMSS53060.2021.9673629
Citations: 1

Abstract

An audio utterance can be identified as being spoken in a particular language by using automatic language identification (LID). Each language has its own phoneme set, and the combinations of these phonemes, governed by phonotactics, help in distinguishing languages. In this work, we propose an automatic language identification system that uses features derived from non-uniform speech regions to represent phonotactic differences among four Indian languages, namely Malayalam, Marathi, Assamese, and Kannada. For this, broad phoneme labels, namely approximant (A), closure (C), fricative (F), nasal (N), plosive/stop (P), voiced stop (B), vowel (V), and silence (S), are obtained automatically by a broad phoneme classifier (BPC), a DNN-based classifier that uses hand-crafted features and Mel-frequency cepstral coefficients (MFCC). To automatically segment speech into smaller regions, the speech is first cut at every silence region using the labels obtained from the BPC, and then split again at the end of each vowel. This yields small non-uniform regions containing phoneme combinations that may be specific to the language of the utterance. From each region, only a fixed number of frames containing certain combinations of phonemes are selected. A DNN classifier is trained for LID using 13-dimensional MFCC features from 12 fixed frames of the non-uniform regions. An average accuracy of 97.03% is obtained for test utterances of 10 s duration belonging to the four languages.
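The segmentation described in the abstract (cut at silence, then split at the end of each vowel, then keep a fixed number of frames per region) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `split_regions` and `region_features` are hypothetical, the per-frame labels and MFCC vectors are assumed to be already available, and because the abstract does not say which 12 frames are kept, the sketch assumes the last 12 frames of each region and skips regions that are shorter.

```python
from typing import List, Sequence, Tuple

SILENCE = "S"  # broad phoneme label for silence
VOWEL = "V"    # broad phoneme label for vowels


def split_regions(labels: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, of non-uniform regions.

    A region is closed when a silence frame is reached (silence frames are
    discarded) or right after the last frame of a run of vowel frames.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    n = len(labels)
    for i, lab in enumerate(labels):
        if lab == SILENCE:
            if start is not None:
                regions.append((start, i))
                start = None
            continue
        if start is None:
            start = i
        # A vowel ends when the next frame is not a vowel (or the signal ends).
        vowel_ends = lab == VOWEL and (i + 1 == n or labels[i + 1] != VOWEL)
        if vowel_ends:
            regions.append((start, i + 1))
            start = None
    if start is not None:
        regions.append((start, n))
    return regions


def region_features(
    mfcc: Sequence[Sequence[float]],
    regions: Sequence[Tuple[int, int]],
    n_frames: int = 12,
) -> List[List[float]]:
    """Stack the MFCC vectors of a fixed number of frames from each region.

    Assumption (not from the abstract): the last n_frames frames of a region
    are used, and regions shorter than n_frames are skipped.
    """
    feats: List[List[float]] = []
    for start, end in regions:
        if end - start < n_frames:
            continue
        window = mfcc[end - n_frames:end]
        feats.append([c for frame in window for c in frame])
    return feats
```

With 13-dimensional MFCCs and 12 frames per region, each feature vector produced this way has 12 × 13 = 156 values, which would then be fed to the LID classifier.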