{"title":"Urdu Speech Emotion Recognition using Speech Spectral Features and Deep Learning Techniques","authors":"Soonh Taj, G. Shaikh, Saif Hassan, Nimra","doi":"10.1109/iCoMET57998.2023.10099289","DOIUrl":null,"url":null,"abstract":"Speech Emotion Recognition (SER) is a process for recognizing emotions hidden in speech. The main approaches used for SER include speech signal processing which utilizes acoustic speech features. Much research is being conducted to find emotions from famous and widely spoken languages like English, German, and others. However, SER for low-resource languages is still in the growing phase. In this regard, few authors have worked on SER of low resources languages like Persian, Arabic, Urdu, Punjabi, Pushto, and Sindhi. The existing work has limitations like few publicly available datasets and a lack of robustness in their SER model. This study contributes to developing a robust SER model for the Urdu language, leveraging spectral speech features' power and the latest deep learning techniques based on 1D-CNN (Convolutional Neural Network) architecture to recognize Urdu speech emotions. This study uses the first Urdu language benchmark speech dataset, “URDU”, publicly available for SER research. The effectiveness and robustness of the proposed model are proved from experiments. 
The proposed model based on 1D-CNN architecture achieved the highest ever accuracy of 97% compared to existing work and improved baseline accuracy for the “URDU” dataset.","PeriodicalId":369792,"journal":{"name":"2023 4th International Conference on Computing, Mathematics and Engineering Technologies (iCoMET)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 4th International Conference on Computing, Mathematics and Engineering Technologies (iCoMET)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/iCoMET57998.2023.10099289","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 1
Abstract
Speech Emotion Recognition (SER) is the task of recognizing the emotions conveyed in speech. The main approaches to SER rely on speech signal processing that uses acoustic speech features. Much research addresses emotion recognition in widely spoken languages such as English and German. However, SER for low-resource languages is still at an early stage: only a few authors have worked on SER for low-resource languages such as Persian, Arabic, Urdu, Punjabi, Pushto, and Sindhi. Existing work is limited by the small number of publicly available datasets and by a lack of robustness in the SER models. This study contributes a robust SER model for the Urdu language that combines spectral speech features with a deep learning model based on a 1D-CNN (one-dimensional Convolutional Neural Network) architecture to recognize Urdu speech emotions. The study uses “URDU”, the first benchmark Urdu speech dataset publicly available for SER research. Experiments demonstrate the effectiveness and robustness of the proposed model: the 1D-CNN-based model achieved 97% accuracy, the highest reported among existing work, and improved the baseline accuracy for the “URDU” dataset.
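To make the general idea of a 1D-CNN over spectral features concrete, the following is a minimal sketch of a forward pass: a convolution along the time axis, ReLU, global max pooling, and a softmax classifier. It is not the authors' architecture, which the abstract does not specify. The layer sizes, the random weights, the 13-coefficient frames standing in for spectral features, and the four class labels are all assumptions chosen for illustration. The sketch uses only the Python standard library so that it runs anywhere; a real model would be built and trained with a deep learning framework.

```python
import math
import random

# Placeholder class labels, assumed for illustration only.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def conv1d_relu(frames, kernels, biases):
    """Valid 1D convolution along the time axis, followed by ReLU.

    frames:  T x C list (T time frames, C spectral coefficients per frame)
    kernels: F x K x C list (F filters, each spanning K frames)
    biases:  list of F floats
    Returns a (T - K + 1) x F feature map.
    """
    k = len(kernels[0])
    feature_map = []
    for t in range(len(frames) - k + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for dt in range(k):
                for w, x in zip(kernel[dt], frames[t + dt]):
                    total += w * x
            row.append(max(0.0, total))  # ReLU
        feature_map.append(row)
    return feature_map


def global_max_pool(feature_map):
    """Keep the maximum activation of each filter over time."""
    return [max(column) for column in zip(*feature_map)]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(vector, weights, biases):
    """Fully connected layer: weights is n_out x n_in."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def predict(frames, params):
    """Forward pass: conv -> ReLU -> global max pool -> dense -> softmax."""
    fmap = conv1d_relu(frames, params["kernels"], params["conv_bias"])
    pooled = global_max_pool(fmap)
    return softmax(dense(pooled, params["dense_w"], params["dense_b"]))


# Hypothetical sizes: 50 frames of 13 coefficients, 8 filters of width 5.
random.seed(0)
T, C, F, K = 50, 13, 8, 5
rand = lambda: random.uniform(-0.1, 0.1)
params = {
    "kernels": [[[rand() for _ in range(C)] for _ in range(K)] for _ in range(F)],
    "conv_bias": [0.0] * F,
    "dense_w": [[rand() for _ in range(F)] for _ in EMOTIONS],
    "dense_b": [0.0] * len(EMOTIONS),
}
# Random numbers stand in for real spectral features of one utterance.
frames = [[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(T)]

probs = predict(frames, params)
for label, p in zip(EMOTIONS, probs):
    print(f"{label}: {p:.3f}")
```

With untrained random weights the printed probabilities carry no meaning; the point is only the shape of the computation, in which each filter slides over consecutive frames and the pooled filter responses feed the classifier.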