Musatafa Abbas Abbood Albadr, S. Tiun, M. Ayob, Fahad Taha Al-Dhief, Taj-Aldeen Naser Abdali, A. F. Abbas
{"title":"基于情感语音数据的语言自动识别极限学习机","authors":"Musatafa Abbas Abbood Albadr, S. Tiun, M. Ayob, Fahad Taha Al-Dhief, Taj-Aldeen Naser Abdali, A. F. Abbas","doi":"10.1109/ICECCE52056.2021.9514107","DOIUrl":null,"url":null,"abstract":"The technique used for recognizing a language by utilizing pronounced speech is called spoken Language Identification (LID). This field has a high significance in the interaction between human and computer. Besides, it can be implemented in several applications such as call centers, speaker diarization in multilingual environments, and in translation systems using a speech-to-speech manner. However, most studies that used LID systems are used and focused on neutral speech only. Moreover, the application of emotional speech in LID systems is crucial in real applications. Therefore, this study aims to investigate the performance of Extreme Learning Machine (ELM) in LID system by utilizing emotional speech. The system is evaluated based on two different languages (Germany and English language). This study has used the Berlin Emotional Speech Dataset (BESD) for the Germany language while the Ryerson Audio-Visual Dataset of Emotional Speech and Song (RAVDESS) for the English language. Four different evaluation scenarios (All Dataset (AD), Normal-Speech Dependent (N-SD), Gender-Female Dependent (G-FD), and Gender-Male Dependent (G-MD) scenario) have been conducted in order to evaluate the system. The experiments results have shown that the highest performance was achieved an accuracy of 99.08%, 100.00%, 98.22%, and 99.37% for AD, N-SD, G-FD, and G-MD scenario, respectively.","PeriodicalId":302947,"journal":{"name":"2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Extreme Learning Machine for Automatic Language Identification Utilizing Emotion Speech Data\",\"authors\":\"Musatafa Abbas Abbood Albadr, S. Tiun, M. Ayob, Fahad Taha Al-Dhief, Taj-Aldeen Naser Abdali, A. F. Abbas\",\"doi\":\"10.1109/ICECCE52056.2021.9514107\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The technique used for recognizing a language by utilizing pronounced speech is called spoken Language Identification (LID). This field has a high significance in the interaction between human and computer. Besides, it can be implemented in several applications such as call centers, speaker diarization in multilingual environments, and in translation systems using a speech-to-speech manner. However, most studies that used LID systems are used and focused on neutral speech only. Moreover, the application of emotional speech in LID systems is crucial in real applications. Therefore, this study aims to investigate the performance of Extreme Learning Machine (ELM) in LID system by utilizing emotional speech. The system is evaluated based on two different languages (Germany and English language). This study has used the Berlin Emotional Speech Dataset (BESD) for the Germany language while the Ryerson Audio-Visual Dataset of Emotional Speech and Song (RAVDESS) for the English language. Four different evaluation scenarios (All Dataset (AD), Normal-Speech Dependent (N-SD), Gender-Female Dependent (G-FD), and Gender-Male Dependent (G-MD) scenario) have been conducted in order to evaluate the system. 
The experiments results have shown that the highest performance was achieved an accuracy of 99.08%, 100.00%, 98.22%, and 99.37% for AD, N-SD, G-FD, and G-MD scenario, respectively.\",\"PeriodicalId\":302947,\"journal\":{\"name\":\"2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE)\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICECCE52056.2021.9514107\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICECCE52056.2021.9514107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Extreme Learning Machine for Automatic Language Identification Utilizing Emotion Speech Data
Spoken Language Identification (LID) is the technique of recognizing a language from spoken speech. The field is highly significant for human-computer interaction, and it can be applied in call centres, speaker diarization in multilingual environments, and speech-to-speech translation systems. However, most studies of LID systems have focused on neutral speech only, even though handling emotional speech is crucial in real applications. This study therefore investigates the performance of the Extreme Learning Machine (ELM) in an LID system using emotional speech. The system is evaluated on two languages, German and English, using the Berlin Emotional Speech Dataset (BESD) for German and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) for English. Four evaluation scenarios were conducted: All Dataset (AD), Normal-Speech Dependent (N-SD), Gender-Female Dependent (G-FD), and Gender-Male Dependent (G-MD). The experimental results show that the highest accuracies achieved were 99.08%, 100.00%, 98.22%, and 99.37% for the AD, N-SD, G-FD, and G-MD scenarios, respectively.
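For context, the core of an ELM is a single-hidden-layer network whose input weights are set randomly and then frozen, so that only the output weights are learned, in a single step, via the Moore-Penrose pseudo-inverse. The Python/NumPy sketch below was written for this summary and is not the authors' implementation; the class name ELMClassifier, the feature dimensions, and the random stand-in data are all illustrative assumptions standing in for acoustic features extracted from BESD/RAVDESS audio.

import numpy as np

class ELMClassifier:
    """Minimal Extreme Learning Machine for multi-class classification (illustrative sketch)."""

    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        n_classes = int(y.max()) + 1
        # Randomly initialised, then frozen, input weights and biases.
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)   # hidden-layer activations
        T = np.eye(n_classes)[y]           # one-hot targets
        # Output weights solved in one step via the pseudo-inverse --
        # no iterative training, which is the defining property of an ELM.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)

# Toy usage: 200 utterances, 40-dimensional features, two language labels
# (0 = German, 1 = English). Real inputs would be acoustic feature vectors.
X = np.random.default_rng(1).standard_normal((200, 40))
y = np.random.default_rng(2).integers(0, 2, size=200)
clf = ELMClassifier(n_hidden=100).fit(X[:150], y[:150])
acc = (clf.predict(X[150:]) == y[150:]).mean()
print(f"toy hold-out accuracy: {acc:.2%}")

Because the output weights are obtained by a single least-squares solve rather than backpropagation, training an ELM of this kind is typically fast, which is one common motivation for using it in classification pipelines such as LID.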