Depression Assessment by Fusing High and Low Level Features from Audio, Video, and Text

A. Pampouchidou, Olympia Simantiraki, Amir Fazlollahi, M. Pediaditis, D. Manousos, A. Roniotis, G. Giannakakis, F. Mériaudeau, P. Simos, K. Marias, Fan Yang, M. Tsiknakis
DOI: 10.1145/2988257.2988266
Published in: Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge (AVEC 2016), 2016-10-16
Citations: 83

Abstract

Depression is a major cause of disability worldwide. The present paper reports on the results of our participation in the depression sub-challenge of the sixth Audio/Visual Emotion Challenge (AVEC 2016), which was designed to compare feature modalities (audio, visual, interview transcript-based) in gender-based and gender-independent modes using a variety of classification algorithms. In our approach, both high and low level features were assessed in each modality. Audio features were extracted from the low-level descriptors provided by the challenge organizers. Several visual features were extracted and assessed, including dynamic characteristics of facial elements (using Landmark Motion History Histograms and Landmark Motion Magnitude), global head motion, and eye blinks. These features were combined with statistically derived features from pre-extracted features (emotions, action units, gaze, and pose). Both speech rate and word-level semantic content were also evaluated. Classification results are reported using four different classification schemes: i) gender-based models for each individual modality, ii) the feature fusion model, iii) the decision fusion model, and iv) the posterior probability classification model. Among the proposed approaches, the one utilizing statistical descriptors of low-level audio features outperformed the reference classification accuracy. This approach achieved F1-scores of 0.59 for identifying depressed and 0.87 for identifying not-depressed individuals on the development set, and 0.52/0.81, respectively, on the test set.
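The best-performing configuration collapses the frame-level low-level audio descriptors into a single session-level vector of statistical descriptors. As a rough illustration only (the abstract does not list which functionals were used, so the five chosen below are assumptions), such features could be computed like this:

```python
import numpy as np

def audio_functionals(lld_frames):
    """Collapse a (num_frames, num_descriptors) matrix of low-level
    audio descriptors into one fixed-length session-level feature
    vector by applying statistical functionals per descriptor track."""
    lld_frames = np.asarray(lld_frames, dtype=float)
    # Hypothetical choice of functionals; the paper's exact set may differ.
    stats = [
        lld_frames.mean(axis=0),
        lld_frames.std(axis=0),
        lld_frames.min(axis=0),
        lld_frames.max(axis=0),
        np.median(lld_frames, axis=0),
    ]
    return np.concatenate(stats)

# Example: 100 frames of 5 hypothetical descriptors -> 5*5 = 25-dim vector
frames = np.random.default_rng(0).normal(size=(100, 5))
features = audio_functionals(frames)
print(features.shape)
```

Each recording, regardless of its duration, then yields a fixed-length vector that can be fed to any standard classifier.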