Language Identification Research Based on Dual Attention Mechanism
Mijit Ablimit, Ma Xueli, A. Hamdulla
2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML), published 2021-07-16
DOI: https://doi.org/10.1109/PRML52754.2021.9520699
Language identification (LID) is an important branch of speech technology. A key problem in LID is how to extract an effective segment-level representation from a given utterance and thereby improve model performance. In recent years, deep learning has made significant progress in language identification: neural networks can extract relevant features and effectively improve system performance. To address poor feature extraction ability and low recognition rates, this paper considers both features and models. It compares input features such as MFCC and Fbank, determines that the spectrogram is the best input feature, and proposes a language identification method based on a dual attention mechanism. The method first computes the spectrogram of the speech signal and converts it into a gray-scale spectrogram as input. It then uses a multi-level convolutional neural network to capture local features, applies dual attention along the channel and spatial dimensions of the feature map through the CBAM module, and captures temporal characteristics with bidirectional gated recurrent units. Finally, the local and temporal features are passed jointly to a fully connected layer, which outputs the language class. Experiments on the Common Voice and AP17-OLR datasets demonstrate that the proposed dual-attention method achieves good results, strengthening feature extraction and improving language identification performance.
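The dual attention step can be illustrated with a small forward-pass sketch. This is not the authors' implementation: it follows the general CBAM formulation (channel attention from average- and max-pooled descriptors passed through a shared MLP, then spatial attention from channel-wise average and max maps passed through a convolution). The feature-map size, the reduction ratio `r`, the 7x7 kernel, and the random weights are all assumptions chosen only so the example runs with NumPy alone.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """feat: (C, H, W). Returns per-channel weights of shape (C, 1, 1)."""
    avg_desc = feat.mean(axis=(1, 2))  # (C,) average-pooled descriptor
    max_desc = feat.max(axis=(1, 2))   # (C,) max-pooled descriptor

    def shared_mlp(v):
        # Bottleneck MLP with ReLU, shared by both descriptors.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))[:, None, None]


def spatial_attention(feat, kernel):
    """feat: (C, H, W); kernel: (2, k, k). Returns weights of shape (1, H, W)."""
    # Pool across channels, giving a 2-channel map (average and max).
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Naive "same" convolution over the 2-channel pooled map.
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None, :, :]


def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    feat = feat * channel_attention(feat, w1, w2)
    return feat * spatial_attention(feat, kernel)


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 20, 4
feat = rng.standard_normal((C, H, W))        # stand-in for a CNN feature map
w1 = rng.standard_normal((C // r, C)) * 0.1  # MLP squeeze weights
w2 = rng.standard_normal((C, C // r)) * 0.1  # MLP expand weights
kernel = rng.standard_normal((2, 7, 7)) * 0.1
refined = cbam(feat, w1, w2, kernel)
print(refined.shape)
```

In the pipeline described above, a refined feature map of this kind would then be reshaped into a sequence along the time axis and fed to the bidirectional GRU. In a trained model the weights would be learned rather than drawn at random.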