Fusing active orientation models and mid-term audio features for automatic depression estimation
C. Smailis, N. Sarafianos, Theodoros Giannakopoulos, S. Perantonis
Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments, 2016-06-29. DOI: 10.1145/2910674.2935856
Citations: 3
Abstract
In this paper, we predict a human's depression level on the BDI-II scale using facial and voice features. Active orientation models (AOMs) and several voice features were extracted from the video and audio modalities, respectively. Long-term and mid-term features were computed, and fusion was performed in the feature space. Videos from the Depression Recognition Sub-Challenge of the 2014 Audio-Visual Emotion Challenge and Workshop (AVEC 2014) were used, and support vector regression models were trained to predict the depression level. We demonstrate that fusing AOMs with audio features leads to better performance than either modality alone. The regression results indicate the robustness of the proposed technique under different settings, as well as an RMSE improvement over the AVEC 2014 video baseline.
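To make the pipeline concrete, below is a minimal sketch of the feature-level fusion and regression stage described in the abstract. The feature extraction itself (AOM fitting on video frames, mid-term audio statistics) is out of scope here: `video_feats`, `audio_feats`, and all dimensionalities and hyperparameters are illustrative placeholders, not values from the paper.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_recordings = 50

# Hypothetical per-recording descriptors standing in for the real features.
video_feats = rng.normal(size=(n_recordings, 120))  # AOM-based facial features
audio_feats = rng.normal(size=(n_recordings, 68))   # mid-term audio statistics
bdi_scores = rng.uniform(0, 63, size=n_recordings)  # BDI-II targets (0-63 range)

# Feature-level (early) fusion: concatenate the two modalities, then scale.
fused = np.hstack([video_feats, audio_feats])
fused = StandardScaler().fit_transform(fused)

# Support vector regression on the fused representation.
model = SVR(kernel="rbf", C=1.0, epsilon=0.5).fit(fused, bdi_scores)
pred = model.predict(fused)

# RMSE, the metric used to compare against the AVEC 2014 video baseline.
rmse = np.sqrt(np.mean((pred - bdi_scores) ** 2))
print(f"training RMSE: {rmse:.2f}")
```

In practice the evaluation would follow the AVEC 2014 train/development/test partitions rather than scoring on the training data as this toy example does.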