Deep Learning for the Detection of Emotion in Human Speech: The Impact of Audio Sample Duration and English versus Italian Languages
Alexander Wurst, Michael Hopwood, Sifan Wu, Fei Li, Yuan Yao
2023 32nd Wireless and Optical Communications Conference (WOCC), published 2023-05-05. DOI: 10.1109/WOCC58016.2023.10139686
Identification of emotion types is important in the diagnosis and treatment of certain mental illnesses. This study uses audio data and deep learning methods, such as convolutional neural networks (CNN) and long short-term memory (LSTM) networks, to classify the emotion of human speech. In our experiments we use the IEMOCAP and DEMoS datasets, consisting of English and Italian speech audio respectively, and classify each utterance into one of up to four emotions: angry, happy, neutral, and sad. The classification results demonstrate the effectiveness of the deep learning methods, with our experiments yielding classification accuracies between 62 and 92 percent. We specifically investigate the impact of audio sample duration on classification accuracy. In addition, we examine and compare classification accuracy for English versus Italian speech.
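The CNN and LSTM methods named in the abstract can be sketched as a single hybrid model: a convolutional front-end extracts local time-frequency features from a spectrogram, and a recurrent layer summarizes how those features evolve over the clip. The sketch below is illustrative only — the layer sizes, the log-mel input representation, and the `EmotionClassifier` name are assumptions, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    """Hypothetical CNN+LSTM sketch for 4-way speech emotion classification
    (angry, happy, neutral, sad); not the authors' published model."""

    def __init__(self, n_mels=64, hidden=128, n_classes=4):
        super().__init__()
        # CNN front-end: local time-frequency feature extraction.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((2, 2)),  # halves both the mel and time axes
        )
        # LSTM models the temporal evolution of the CNN features.
        self.lstm = nn.LSTM(input_size=16 * (n_mels // 2),
                            hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (batch, 1, n_mels, time) log-mel spectrogram
        f = self.conv(x)                                 # (batch, 16, n_mels//2, time//2)
        b, c, m, t = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, t, c * m)   # (batch, time//2, features)
        _, (h, _) = self.lstm(f)                         # final hidden state
        return self.fc(h[-1])                            # (batch, n_classes) logits

model = EmotionClassifier()
# Two clips, 64 mel bins, 100 frames; frame count varies with sample duration.
logits = model(torch.randn(2, 1, 64, 100))
print(logits.shape)  # torch.Size([2, 4])
```

Because the number of spectrogram frames is the LSTM's sequence length, this kind of model naturally accepts clips of different durations, which is one way the abstract's duration comparison could be run without changing the architecture.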