Fusing active orientation models and mid-term audio features for automatic depression estimation

C. Smailis, N. Sarafianos, Theodoros Giannakopoulos, S. Perantonis
DOI: 10.1145/2910674.2935856
Published in: Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments, 2016-06-29
Citations: 3

Abstract

In this paper, we predict a human's depression level on the BDI-II scale using facial and voice features. Active orientation models (AOMs) and several voice features were extracted from the video and audio modalities, respectively. Long-term and mid-term features were computed, and fusion was performed in the feature space. Videos from the Depression Recognition Sub-Challenge of the 2014 Audio-Visual Emotion Challenge and Workshop (AVEC 2014) were used, and support vector regression models were trained to predict the depression level. We demonstrate that the fusion of AOMs with audio features leads to better performance than either modality alone. The obtained regression results indicate the robustness of the proposed technique under different settings, as well as an RMSE improvement over the AVEC 2014 video baseline.
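The pipeline the abstract describes — summarizing per-frame audio and facial feature sequences into fixed-length vectors, concatenating them (feature-level fusion), and training support vector regression on the result — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions, window sizes, and the synthetic data are assumptions made purely for the example.

```python
# Hypothetical sketch of feature-level fusion + SVR for BDI-II prediction.
# Dimensions, window parameters, and data are illustrative assumptions.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def midterm_stats(short_term, win=50, step=25):
    """Summarize a (frames x dims) short-term feature matrix with mean/std
    statistics over mid-term windows, then average across windows to obtain
    one long-term vector per recording."""
    stats = []
    for start in range(0, max(1, len(short_term) - win + 1), step):
        seg = short_term[start:start + win]
        stats.append(np.concatenate([seg.mean(axis=0), seg.std(axis=0)]))
    return np.mean(stats, axis=0)

rng = np.random.default_rng(0)
X, y = [], []
for _ in range(40):                        # 40 synthetic "videos"
    audio = rng.normal(size=(300, 8))      # e.g. MFCC-like short-term audio features
    visual = rng.normal(size=(120, 6))     # e.g. AOM shape parameters per frame
    # Feature-space fusion: concatenate the two modality summaries.
    X.append(np.concatenate([midterm_stats(audio), midterm_stats(visual)]))
    y.append(rng.uniform(0, 63))           # BDI-II scores range from 0 to 63

X, y = np.array(X), np.array(y)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X, y)
preds = model.predict(X)
rmse = float(np.sqrt(np.mean((preds - y) ** 2)))
print(X.shape, rmse)
```

In a real evaluation the RMSE would of course be computed on held-out AVEC 2014 test videos rather than the training data, with the SVR hyperparameters tuned on a development set.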