Using the Bag-of-Audio-Words approach for emotion recognition

Impact factor: 0.3 · JCR Q4 · Computer Science, Theory & Methods

Acta Universitatis Sapientiae Informatica · Pub Date: 2022-08-01 · DOI: 10.2478/ausi-2022-0001

Mercedes Vetráb, G. Gosztolya

Pages: 1-21

Citations: 0

Abstract

The problem of varying-length recordings is a well-known issue in paralinguistics. We investigated how to resolve this problem using the bag-of-audio-words (BoAW) feature extraction approach. The technique consists of four steps: preprocessing, clustering, quantization and normalization. BoAW is competitive in speech emotion recognition, but it has several parameters that must be tuned carefully for good performance. The main aim of our study was to analyse the effectiveness of the bag-of-audio-words method and to find the best parameter values for emotion recognition. We optimized the parameters one at a time, with each step building on the best values found in the earlier ones. We performed feature extraction with openSMILE, transformed the features into same-sized vectors with openXBOW, and finally trained SVM models, which we evaluated with 10-fold cross-validation using unweighted average recall (UAR). In our experiments, we worked with a Hungarian emotion database. According to our results, the bag-of-audio-words feature representation improves emotion classification performance. Not every BoAW parameter has a clear optimal setting, but we can still make clear recommendations on how to set bag-of-audio-words parameters for emotion detection tasks.
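The four steps named in the abstract, and the UAR metric, can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline and does not use openSMILE or openXBOW: the function names, the plain k-means codebook, the relative-frequency normalization and the random toy frames are all assumptions made for illustration, since the abstract does not specify these choices.

```python
import math
import random

def fit_stats(frames):
    """Per-dimension mean and standard deviation of the training frames."""
    n, dims = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dims)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) or 1.0
           for d in range(dims)]
    return mean, std

def standardize(frames, mean, std):
    """Preprocessing (assumed here to be z-scoring of each dimension)."""
    return [[(x - m) / s for x, m, s in zip(f, mean, std)] for f in frames]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(frame, codebook):
    """Index of the codeword closest to the frame."""
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))

def build_codebook(frames, size, iters=10, seed=0):
    """Clustering: plain k-means, initialized from randomly chosen frames."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, size)]
    for _ in range(iters):
        groups = [[] for _ in range(size)]
        for f in frames:
            groups[nearest(f, codebook)].append(f)
        for k, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                codebook[k] = [sum(col) / len(group) for col in zip(*group)]
    return codebook

def boaw_vector(frames, codebook):
    """Quantization and normalization: codeword histogram as relative frequencies."""
    counts = [0] * len(codebook)
    for f in frames:
        counts[nearest(f, codebook)] += 1
    return [c / len(frames) for c in counts]

def uar(y_true, y_pred):
    """Unweighted average recall: the mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Toy data: two "recordings" with different numbers of 2-dimensional frames.
rng = random.Random(1)
short_rec = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
long_rec = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(120)]

train_frames = short_rec + long_rec
mean, std = fit_stats(train_frames)
codebook = build_codebook(standardize(train_frames, mean, std), size=4)

vec_short = boaw_vector(standardize(short_rec, mean, std), codebook)
vec_long = boaw_vector(standardize(long_rec, mean, std), codebook)
print(len(vec_short), len(vec_long))  # both equal the codebook size
```

Whatever the length of a recording, its BoAW vector has one entry per codeword, which is what lets a fixed-input classifier such as an SVM be trained on it. The codebook size is one of the parameters that would need tuning; the value 4 here is arbitrary.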

