{"title":"基于深度颞叶神经网络和多层次判别线索的口语识别","authors":"Linjia Sun","doi":"10.1109/ICICSP50920.2020.9232093","DOIUrl":null,"url":null,"abstract":"The language cue is an important component in the task of spoken language identification (LID). But it will take a lot of time to align language cue to speech segment by the manual annotation of professional linguists. Instead of annotating the linguistic phonemes, we use the cooccurrence in speech utterances to find the underlying phoneme-like speech units by unsupervised means. Then, we model phonotactic constraint on the set of phoneme-like units for finding the larger speech segments called the suprasegmental phonemes, and extract the multi-levels language cues from them, including phonetic, phonotactic and prosodic. Further, a novel LID system is proposed based on the architecture of TDNN followed by LSTM-RNN. The proposed LID system is built and compared with the acoustic feature based methods and the phonetic feature based methods on the task of NIST LRE07 and Arabic dialect identification. The experimental results show that our LID system helps to capture robust discriminative information for short duration language identification and high accuracy for dialect identification.","PeriodicalId":117760,"journal":{"name":"2020 IEEE 3rd International Conference on Information Communication and Signal Processing (ICICSP)","volume":"430 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Spoken Language Identification with Deep Temporal Neural Network and Multi-levels Discriminative Cues\",\"authors\":\"Linjia Sun\",\"doi\":\"10.1109/ICICSP50920.2020.9232093\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The language cue is an important component in the task of spoken language identification (LID). 
But it will take a lot of time to align language cue to speech segment by the manual annotation of professional linguists. Instead of annotating the linguistic phonemes, we use the cooccurrence in speech utterances to find the underlying phoneme-like speech units by unsupervised means. Then, we model phonotactic constraint on the set of phoneme-like units for finding the larger speech segments called the suprasegmental phonemes, and extract the multi-levels language cues from them, including phonetic, phonotactic and prosodic. Further, a novel LID system is proposed based on the architecture of TDNN followed by LSTM-RNN. The proposed LID system is built and compared with the acoustic feature based methods and the phonetic feature based methods on the task of NIST LRE07 and Arabic dialect identification. The experimental results show that our LID system helps to capture robust discriminative information for short duration language identification and high accuracy for dialect identification.\",\"PeriodicalId\":117760,\"journal\":{\"name\":\"2020 IEEE 3rd International Conference on Information Communication and Signal Processing (ICICSP)\",\"volume\":\"430 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 3rd International Conference on Information Communication and Signal Processing (ICICSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICICSP50920.2020.9232093\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 3rd International Conference on 
Information Communication and Signal Processing (ICICSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICICSP50920.2020.9232093","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Spoken Language Identification with Deep Temporal Neural Network and Multi-levels Discriminative Cues
Language cues are an important component of spoken language identification (LID), but aligning cues to speech segments through manual annotation by professional linguists is very time-consuming. Instead of annotating linguistic phonemes, we exploit co-occurrence statistics in speech utterances to discover the underlying phoneme-like speech units by unsupervised means. We then model phonotactic constraints on the set of phoneme-like units to find larger speech segments, called suprasegmental phonemes, and extract multi-level language cues from them, including phonetic, phonotactic, and prosodic cues. Furthermore, a novel LID system is proposed based on a TDNN architecture followed by an LSTM-RNN. The proposed LID system is built and compared with acoustic-feature-based and phonetic-feature-based methods on the NIST LRE07 task and an Arabic dialect identification task. The experimental results show that our LID system captures robust discriminative information for short-duration language identification and achieves high accuracy for dialect identification.
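The paper does not publish its implementation, but the "TDNN followed by LSTM-RNN" architecture it names can be sketched structurally: a TDNN layer splices each frame with its temporal context and applies an affine transform, and an LSTM then summarizes the resulting sequence into a fixed-length state for language classification. The toy NumPy forward pass below is only an illustration of that data flow; all dimensions, context offsets, and weights are made up, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tdnn_layer(x, w, b, context):
    """One TDNN layer: splice frames at the given temporal offsets,
    then apply an affine transform with ReLU.  x has shape (T, D_in)."""
    T = x.shape[0]
    pad = max(abs(c) for c in context)
    xp = np.pad(x, ((pad, pad), (0, 0)))
    spliced = np.concatenate(
        [xp[pad + c : pad + c + T] for c in context], axis=1
    )
    return np.maximum(spliced @ w + b, 0.0)          # (T, D_out)

def lstm_final_state(x, Wx, Wh, b):
    """Minimal single-layer LSTM over x: (T, D); returns the last hidden state."""
    H = Wh.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    for t in range(x.shape[0]):
        z = x[t] @ Wx + h @ Wh + b                   # gates, shape (4H,)
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

# Hypothetical sizes: 50 frames, 40-dim features, 4 candidate languages.
rng = np.random.default_rng(0)
T, D_feat, D_tdnn, H, n_langs = 50, 40, 32, 16, 4
context = (-2, -1, 0, 1, 2)                          # symmetric temporal context

# Randomly initialised weights stand in for trained parameters.
w1 = rng.standard_normal((D_feat * len(context), D_tdnn)) * 0.05
b1 = np.zeros(D_tdnn)
Wx = rng.standard_normal((D_tdnn, 4 * H)) * 0.05
Wh = rng.standard_normal((H, 4 * H)) * 0.05
bl = np.zeros(4 * H)
Wout = rng.standard_normal((H, n_langs)) * 0.05

feats = rng.standard_normal((T, D_feat))             # one utterance's frame features
hidden = tdnn_layer(feats, w1, b1, context)          # local temporal modelling
h_last = lstm_final_state(hidden, Wx, Wh, bl)        # long-range sequence summary
logits = h_last @ Wout
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                 # per-language posterior
print(probs.shape)
```

In a real system the TDNN stack would be deeper, the weights trained with a language-classification loss, and the inputs would be the multi-level phonetic/phonotactic/prosodic cue features described in the abstract rather than random frames.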