{"title":"基于图的语音情感识别特征研究","authors":"A. Pentari, George P. Kafentzis, M. Tsiknakis","doi":"10.1109/BHI56158.2022.9926795","DOIUrl":null,"url":null,"abstract":"During the last decades, automatic speech emotion recognition (SER) has gained an increased interest by the research community. Specifically, SER aims to recognize the emotional state of a speaker directly from a speech recording. The most prominent approaches in the literature include feature extraction of speech signals in time and/or frequency domain that are successively applied as input into a classification scheme. In this paper, we propose to exploit graph theory and structures as alternative forms of speech representations. We suggest applying the so-called Visibility Graph (VG) theory to represent speech data using an adjacency matrix and extract well-known graph-based features from the latter. Finally, these features are fed into a Support Vector Machine (SVM) classifier in a leave-one-speaker-out, multi-class fashion. Our proposed feature set is compared with a well-known acoustic feature set named the Geneva Minimalistic Acoustic Parameter Set (GeMAPS). We test both approaches on two publicly available speech datasets: SAVEE and EMOVO. The experimental results show that the proposed graph-based features provide better results, namely a classification accuracy of 70% and 98%, respectively, yielding an increase by 29.2% and 60.6%, respectively, when compared to GeMAPS.","PeriodicalId":347210,"journal":{"name":"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Investigating Graph-based Features for Speech Emotion Recognition\",\"authors\":\"A. Pentari, George P. Kafentzis, M. Tsiknakis\",\"doi\":\"10.1109/BHI56158.2022.9926795\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"During the last decades, automatic speech emotion recognition (SER) has gained an increased interest by the research community. Specifically, SER aims to recognize the emotional state of a speaker directly from a speech recording. The most prominent approaches in the literature include feature extraction of speech signals in time and/or frequency domain that are successively applied as input into a classification scheme. In this paper, we propose to exploit graph theory and structures as alternative forms of speech representations. We suggest applying the so-called Visibility Graph (VG) theory to represent speech data using an adjacency matrix and extract well-known graph-based features from the latter. Finally, these features are fed into a Support Vector Machine (SVM) classifier in a leave-one-speaker-out, multi-class fashion. Our proposed feature set is compared with a well-known acoustic feature set named the Geneva Minimalistic Acoustic Parameter Set (GeMAPS). We test both approaches on two publicly available speech datasets: SAVEE and EMOVO. 
The experimental results show that the proposed graph-based features provide better results, namely a classification accuracy of 70% and 98%, respectively, yielding an increase by 29.2% and 60.6%, respectively, when compared to GeMAPS.\",\"PeriodicalId\":347210,\"journal\":{\"name\":\"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/BHI56158.2022.9926795\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BHI56158.2022.9926795","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Investigating Graph-based Features for Speech Emotion Recognition
Over the last decades, automatic speech emotion recognition (SER) has gained increasing interest from the research community. Specifically, SER aims to recognize the emotional state of a speaker directly from a speech recording. The most prominent approaches in the literature extract time- and/or frequency-domain features from the speech signal, which are subsequently fed as input to a classification scheme. In this paper, we propose exploiting graph theory and graph structures as an alternative form of speech representation. We apply the so-called Visibility Graph (VG) theory to represent speech data as an adjacency matrix and extract well-known graph-based features from it. Finally, these features are fed into a Support Vector Machine (SVM) classifier in a leave-one-speaker-out, multi-class fashion. Our proposed feature set is compared with the well-known Geneva Minimalistic Acoustic Parameter Set (GeMAPS). We test both approaches on two publicly available speech datasets: SAVEE and EMOVO. The experimental results show that the proposed graph-based features perform better, reaching classification accuracies of 70% and 98%, respectively, which correspond to increases of 29.2% and 60.6% over GeMAPS.
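
The abstract outlines a three-step pipeline: build a visibility graph from the speech samples, compute graph descriptors from its adjacency matrix, and classify with an SVM. The following is a minimal Python sketch of that pipeline, assuming the standard natural visibility criterion (two samples are connected if the straight line between them lies above every intermediate sample); the frame length, the particular graph descriptors, and the SVM settings are illustrative assumptions, not the authors' configuration.

    # Minimal sketch of a VG-based SER pipeline (assumptions noted above).
    import numpy as np
    import networkx as nx
    from sklearn.svm import SVC

    def visibility_adjacency(signal: np.ndarray) -> np.ndarray:
        """Adjacency matrix of the natural visibility graph of a 1-D signal.

        Samples a and b are connected if every intermediate sample c lies
        strictly below the line joining (a, y_a) and (b, y_b).
        """
        n = len(signal)
        adj = np.zeros((n, n), dtype=np.uint8)
        for a in range(n):
            for b in range(a + 1, n):
                ya, yb = signal[a], signal[b]
                # Evaluate the visibility line at every intermediate index c.
                visible = all(
                    signal[c] < yb + (ya - yb) * (b - c) / (b - a)
                    for c in range(a + 1, b)
                )
                if visible:
                    adj[a, b] = adj[b, a] = 1
        return adj

    def graph_features(adj: np.ndarray) -> np.ndarray:
        """A few standard graph descriptors (illustrative choices)."""
        g = nx.from_numpy_array(adj)
        degrees = np.array([d for _, d in g.degree()])
        return np.array([
            degrees.mean(),               # mean node degree
            degrees.max(),                # maximum node degree
            nx.density(g),                # edge density
            nx.average_clustering(g),     # mean clustering coefficient
        ])

    # Toy usage: one short frame per utterance with random placeholder data.
    # A real system would frame each recording, build one VG per frame, pool
    # the per-frame descriptors, and evaluate leave-one-speaker-out.
    rng = np.random.default_rng(0)
    X = np.vstack([
        graph_features(visibility_adjacency(rng.standard_normal(128)))
        for _ in range(20)
    ])
    y = rng.integers(0, 4, size=20)       # placeholder emotion labels
    clf = SVC(kernel="rbf").fit(X, y)     # multi-class SVM (one-vs-one)
    print(clf.predict(X[:3]))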