CNN-Based End-To-End Language Identification
Yutian Wang, Huan Zhou, Zheng Wang, Jingling Wang, Hui Wang
2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), published 2019-03-15
DOI: 10.1109/ITNEC.2019.8729388 (https://doi.org/10.1109/ITNEC.2019.8729388)
Citations: 3
Abstract
Recently, language identification (LID) on long utterances has achieved very low error rates; however, it remains a challenging task under short-duration conditions. In this paper, we propose an end-to-end short-duration language identification system based on a deep convolutional neural network (DCNN), where the whole network is trained with a multi-class cross-entropy loss. We also compare three kinds of input features: Mel-Frequency Cepstral Coefficients (MFCC), log Mel-scale Filter Bank energies (FBANK), and spectrogram energies. The experimental results indicate that spectrogram energies achieve the best performance among them. To enhance the robustness of the system, the databases are augmented at the training stage by applying the time-scale modification (TSM) method. On the AP17-OLR databases, under the 1-second condition, the proposed system improves on the traditional i-vector system by 32.7%, and it performs as well as or better than other neural network systems.
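To make two of the ingredients named in the abstract concrete, the following is a minimal NumPy sketch of a log-power spectrogram front end and a multi-class cross-entropy loss. It is an illustration only, not the authors' implementation: the 16 kHz sampling rate, the 25 ms window, the 10 ms hop, the 512-point FFT, the Hamming window, and the function names `log_spectrogram` and `cross_entropy` are all assumptions, since the abstract does not state these details.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160, n_fft=512, eps=1e-10):
    """Log-power spectrogram of a 1-D signal.

    Frame sizes are illustrative assumptions (25 ms window and 10 ms hop
    at an assumed 16 kHz sampling rate); the paper's values may differ.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power + eps)  # shape: (n_frames, n_fft // 2 + 1)

def cross_entropy(logits, labels):
    """Mean multi-class cross-entropy computed from unnormalized logits."""
    # Subtract the row maximum for a numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Usage: a 1-second utterance at the assumed 16 kHz rate.
rng = np.random.default_rng(0)
utterance = rng.standard_normal(16000)
features = log_spectrogram(utterance)
print(features.shape)  # (98, 257)
```

In a system of the kind described, a feature matrix like `features` would be the network input, and the loss would be evaluated on the network's per-language output scores. The network itself and the TSM augmentation are omitted here because the abstract gives no architectural or parameter details.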