Investigating Graph-based Features for Speech Emotion Recognition

2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). Pub Date: 2022-09-27. DOI: 10.1109/BHI56158.2022.9926795

A. Pentari, George P. Kafentzis, M. Tsiknakis


Citations: 1

Abstract



Over the last decades, automatic speech emotion recognition (SER) has attracted increasing interest from the research community. SER aims to recognize the emotional state of a speaker directly from a speech recording. The most prominent approaches in the literature extract features from the speech signal in the time and/or frequency domain and then feed them into a classification scheme. In this paper, we propose to exploit graph theory and graph structures as alternative forms of speech representation. We suggest applying the so-called Visibility Graph (VG) theory to represent speech data as an adjacency matrix and to extract well-known graph-based features from that matrix. These features are then fed into a Support Vector Machine (SVM) classifier in a leave-one-speaker-out, multi-class fashion. Our proposed feature set is compared with a well-known acoustic feature set, the Geneva Minimalistic Acoustic Parameter Set (GeMAPS). We test both approaches on two publicly available speech datasets: SAVEE and EMOVO. The experimental results show that the proposed graph-based features perform better, reaching a classification accuracy of 70% on SAVEE and 98% on EMOVO, an increase of 29.2% and 60.6%, respectively, compared to GeMAPS.
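The pipeline described in the abstract can be illustrated with a minimal sketch. Several points are assumptions, since the abstract does not spell them out: the natural visibility criterion (two samples are linked when every sample between them lies strictly below the straight line joining them) is used as the VG variant, the three features shown (mean degree, maximum degree, density) are illustrative stand-ins for the "well-known graph-based features", and the function names and the toy frame are hypothetical. The SVM step is omitted; only the leave-one-speaker-out split is shown.

```python
from statistics import mean


def visibility_graph(samples):
    """Adjacency matrix of the natural visibility graph of a 1-D series (assumed VG variant)."""
    n = len(samples)
    adj = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            # Samples a and b are linked if every sample strictly between them
            # lies below the straight line joining (a, y_a) and (b, y_b).
            visible = all(
                samples[c] < samples[b] + (samples[a] - samples[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                adj[a][b] = adj[b][a] = 1
    return adj


def graph_features(adj):
    """Illustrative graph features; the paper's actual feature list is not given in the abstract."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    edges = sum(degrees) // 2
    return {
        "mean_degree": mean(degrees),
        "max_degree": max(degrees),
        "density": 2 * edges / (n * (n - 1)) if n > 1 else 0.0,
    }


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    for held_out in sorted(set(speaker_ids)):
        train = [i for i, s in enumerate(speaker_ids) if s != held_out]
        test = [i for i, s in enumerate(speaker_ids) if s == held_out]
        yield held_out, train, test


# Toy frame standing in for a short window of speech samples (hypothetical values).
frame = [3.0, 1.0, 1.5, 0.5, 4.0]
adj = visibility_graph(frame)
features = graph_features(adj)

# Hypothetical speaker labels for four utterances.
folds = list(leave_one_speaker_out(["s1", "s2", "s1", "s3"]))
```

In a full experiment, each utterance would yield one feature vector, and a classifier would be trained on the `train` indices of each fold and evaluated on the held-out speaker. The pairwise visibility check above is cubic in the worst case, so it is only practical for short frames.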
