C. Smailis, N. Sarafianos, Theodoros Giannakopoulos, S. Perantonis
Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments. Published 2016-06-29. DOI: 10.1145/2910674.2935856
Fusing active orientation models and mid-term audio features for automatic depression estimation
In this paper, we predict a person's depression level on the BDI-II scale using facial and voice features. Active orientation models (AOMs) and several voice features were extracted from the video and audio modalities. Long-term and mid-term features were computed, and fusion was performed in the feature space. Videos from the Depression Recognition Sub-Challenge of the 2014 Audio-Visual Emotion Challenge and Workshop (AVEC 2014) were used, and support vector regression models were trained to predict the depression level. We demonstrate that fusing AOMs with audio features yields better performance than either individual modality. The regression results indicate the robustness of the proposed technique under different settings, as well as an RMSE improvement over the AVEC 2014 video baseline.
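The pipeline the abstract describes — per-modality feature extraction, feature-space fusion by concatenation, and support vector regression on the fused vectors — can be sketched as follows. This is a minimal illustration using synthetic stand-in features and scikit-learn's SVR; the paper's actual AOM facial features and mid-term audio statistics are not reproduced here, and the feature dimensions and SVR hyperparameters are placeholder assumptions.

```python
# Sketch of feature-space fusion + support vector regression for
# depression-score prediction. All features below are synthetic stand-ins.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 50  # number of recordings (assumption: small toy sample)

video_feats = rng.normal(size=(n, 20))  # stand-in for AOM-derived facial features
audio_feats = rng.normal(size=(n, 10))  # stand-in for mid-term audio statistics
y = rng.uniform(0, 63, size=n)          # BDI-II scores lie in the range 0-63

# Fusion in the feature space: concatenate the two modalities per sample
fused = np.concatenate([video_feats, audio_feats], axis=1)

# Scale features, then fit an RBF-kernel support vector regressor
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(fused, y)

preds = model.predict(fused)
rmse = float(np.sqrt(np.mean((preds - y) ** 2)))  # RMSE is the paper's metric
```

In practice the fused vectors would come from held-out AVEC 2014 recordings, with RMSE computed on a test split rather than on the training data as in this toy sketch.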