Language Identification with Unsupervised Phoneme-like Sequence and TDNN-LSTM-RNN
Linjia Sun
2020 15th IEEE International Conference on Signal Processing (ICSP), published 2020-12-06
DOI: 10.1109/ICSP48669.2020.9320919 (https://doi.org/10.1109/ICSP48669.2020.9320919)
A novel language identification (LID) method is proposed that adopts a time delay neural network (TDNN) followed by a long short-term memory (LSTM) recurrent neural network (RNN) to learn long-term phonetic patterns and model the phonetic dynamics of different languages. Instead of linguistic phonemes, the TDNN-LSTM-RNN is trained on phoneme-like speech units, which can be discovered without prior linguistic knowledge or manual transcriptions. In a comparison with PPRLM (parallel phone recognition followed by language modeling), the experimental results show that phoneme-like speech units found by unsupervised discovery and linguistic phonemes obtained by manual annotation have the same effect in the LID task. Furthermore, the proposed LID method is evaluated on NIST LRE07 and on a dialect identification task. We compare the proposed LID method with other state-of-the-art methods, including acoustic-feature-based and phonetic-feature-based LID methods. The experimental results show that our method is competitive with existing methods in the LID task. In particular, our method captures robust discriminative information for short-duration language identification and achieves high accuracy for dialect identification.
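To make the "TDNN followed by LSTM" idea more concrete, the following minimal sketch illustrates only the time-delay part: how a TDNN layer splices each frame with neighbours at fixed offsets, and how stacking such layers widens the temporal context handed to a recurrent layer. The abstract does not give the layer configuration, so the offsets in `HYPOTHETICAL_LAYERS`, the function names, and the edge-clamping rule are all assumptions made for illustration, not the author's setup.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-layer time offsets (in frames); NOT taken from the paper.
HYPOTHETICAL_LAYERS = [(-2, -1, 0, 1, 2), (-1, 0, 1), (-3, 0, 3)]


def splice(frames: Sequence[Sequence[float]],
           offsets: Sequence[int]) -> List[List[float]]:
    """Concatenate each frame with its neighbours at the given time offsets.

    Indices outside the utterance are clamped to the first or last frame
    (an assumed edge-padding choice).
    """
    last = len(frames) - 1
    spliced: List[List[float]] = []
    for t in range(len(frames)):
        row: List[float] = []
        for d in offsets:
            src = min(max(t + d, 0), last)  # clamp to a valid frame index
            row.extend(frames[src])
        spliced.append(row)
    return spliced


def receptive_field(layer_offsets: Sequence[Sequence[int]]) -> Tuple[int, int]:
    """Return the (left, right) input context, in frames, seen by the top layer."""
    left = sum(-min(offsets) for offsets in layer_offsets)
    right = sum(max(offsets) for offsets in layer_offsets)
    return left, right


if __name__ == "__main__":
    toy_frames = [[1.0], [2.0], [3.0]]  # three one-dimensional toy frames
    print(splice(toy_frames, (-1, 0, 1)))
    print(receptive_field(HYPOTHETICAL_LAYERS))
```

In a real system each spliced vector would pass through a learned affine transform and nonlinearity before the next layer, and the top-layer outputs would feed the LSTM, which carries context beyond this fixed window. The sketch omits those learned parts.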