Multi-Scale and Multi-Region Facial Discriminative Representation for Automatic Depression Level Prediction

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Pub Date: 2021-06-06. DOI: 10.1109/ICASSP39728.2021.9413504

Mingyue Niu, J. Tao, B. Liu


Citations: 10

Abstract

Multi-Scale and Multi-Region Facial Discriminative Representation for Automatic Depression Level Prediction

Physiological studies have shown that differences in facial activity between depressed patients and normal individuals appear in different local facial regions, and that these activities differ in duration. However, most previous work extracts features from the entire facial region at a fixed time scale to predict an individual's depression level, and is therefore inadequate for capturing dynamic facial changes. For these reasons, we propose a multi-scale and multi-region facial dynamic representation method to improve prediction performance. First, we use multiple time scales to divide the original long-term video into segments containing different facial regions. Second, segment-level features are extracted by a 3D convolutional neural network to characterize facial activities of different durations in different facial regions. Third, eigen evolution pooling and a gradient boosting decision tree are used to aggregate these segment-level features and select discriminative elements, generating the video-level feature. Finally, the depression level is predicted using support vector regression. Experiments are conducted on AVEC2013 and AVEC2014. The results demonstrate that our method achieves better performance than previous works.
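The first step of the pipeline, dividing a long video into segments at several time scales for several facial regions, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `Segment` type, the function name `multi_scale_segments`, the region labels, the scale values in frames, and the choice of non-overlapping windows that drop any incomplete trailing segment.

```python
from typing import Iterator, NamedTuple, Sequence


class Segment(NamedTuple):
    region: str  # hypothetical facial-region label
    scale: int   # segment length in frames
    start: int   # first frame index (inclusive)
    end: int     # last frame index (exclusive)


def multi_scale_segments(
    num_frames: int,
    scales: Sequence[int],
    regions: Sequence[str],
) -> Iterator[Segment]:
    """Yield non-overlapping segments for every (region, scale) pair.

    Trailing frames that do not fill a whole segment are dropped;
    this is an assumption of the sketch, not a detail from the abstract.
    """
    for region in regions:
        for scale in scales:
            if scale <= 0:
                raise ValueError("scale must be positive")
            # Step through the video in windows of `scale` frames.
            for start in range(0, num_frames - scale + 1, scale):
                yield Segment(region, scale, start, start + scale)


segments = list(
    multi_scale_segments(
        num_frames=100,
        scales=(16, 32),            # assumed time scales, in frames
        regions=("eyes", "mouth"),  # assumed region names
    )
)
print(len(segments))
```

In a full system, each segment would be cropped to its facial region and passed to the 3D convolutional network to obtain a segment-level feature; the later pooling, feature-selection, and regression stages are not shown here.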

