Towards Emotion Independent Language Identification System
P. Jain, K. Gurugubelli, A. Vuppala
2020 International Conference on Signal Processing and Communications (SPCOM), 2020-07-01
DOI: 10.1109/SPCOM50965.2020.9179550 (https://doi.org/10.1109/SPCOM50965.2020.9179550)
Language Identification (LID) is an integral part of multilingual speech systems. The performance of LID systems is sub-optimal under various conditions, such as short utterance duration, noise, and channel variation. There have been efforts to improve performance under these conditions, but the impact of speaker emotion variation on LID performance has not been studied. We observe that the performance of LID systems degrades when there is an emotional mismatch between training and test conditions. To address this, we investigated adaptation approaches that improve LID performance by incorporating emotional utterances in the form of an adaptation dataset. We also studied a prosody modification technique called Flexible Analysis Synthesis Tool (FAST), which varies the emotional characteristics of an utterance, in order to improve performance, but the results were inconsistent and unsatisfactory. In this work, we propose combining a Recurrent Convolutional Neural Network (RCNN) based architecture with a multi-stage training methodology, which outperformed state-of-the-art LID systems such as i-vectors, time delay neural networks, long short-term memory networks, and deep neural network x-vectors.