Multiclass Language Identification Using CNN-Bigru-Attention Model on Spectrogram of Audio Signals
Ma Xueli, Mijit Ablimit, A. Hamdulla
2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML), 2021-07-16
DOI: 10.1109/PRML52754.2021.9520702
Citations: 1
Abstract
To address the low recognition rates and the uneven distribution of language information in language identification tasks, this paper proposes a language identification method based on the CNN-Bigru-Attention model. The method first extracts the spectrogram of the audio signal and converts it into a gray-scale spectrogram, which serves as the model input. A CNN (convolutional neural network) then captures local features, and a BiGRU (bidirectional gated recurrent unit) extracts temporal features. Both the local and temporal features are passed to an attention layer, which focuses on information related to the language features and suppresses irrelevant information. Finally, a fully connected layer outputs the language class. Experiments on the Common Voice dataset show that the method improves language identification performance.
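The attention step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function, so this example assumes simple dot-product scoring against a learned query vector; the names `attention_pool`, `frames`, and `query` are hypothetical and are not taken from the paper. Each frame stands for one time step of recurrent output, and the softmax weights decide how much each time step contributes to the pooled vector that would feed the fully connected layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Pool T feature vectors into one context vector.

    frames: list of T feature vectors, each of length D
            (assumed here to be per-time-step recurrent outputs).
    query:  vector of length D; an assumed learned parameter.
    Returns (context, weights), where weights sum to 1 over time steps.
    """
    # Score each time step by its dot product with the query.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of the frames: high-weight steps dominate the result.
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return context, weights


if __name__ == "__main__":
    frames = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    query = [2.0, 0.0]
    context, weights = attention_pool(frames, query)
    print("weights:", weights)
    print("context:", context)
```

With this query, the second frame receives the largest weight because it aligns most closely with the query direction, which is the sense in which attention emphasizes some time steps and suppresses others. A zero query gives uniform weights, so the pooling reduces to a plain average over time.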