P. Cardinal, N. Dehak, Alessandro Lameiras Koerich, J. Alam, Patrice Boucher
{"title":"ETS系统AV+EC 2015挑战赛","authors":"P. Cardinal, N. Dehak, Alessandro Lameiras Koerich, J. Alam, Patrice Boucher","doi":"10.1145/2808196.2811639","DOIUrl":null,"url":null,"abstract":"This paper presents the system that we have developed for the AV+EC 2015 challenge which is mainly based on deep neural networks (DNNs). We have investigated different options using the audio feature set as a base system. The improvements that were achieved on this specific modality have been applied to other modalities. One of our main findings is that the frame stacking technique improves the quality of the predictions made by our model, and the improvements were also observed in all other modalities. Besides that, we also present a new feature set derived from the cardiac rhythm that were extracted from electrocardiogram readings. Such a new feature set helped us to improve the concordance correlation coefficient from 0.088 to 0.124 (on the development set) for the valence, an improvement of 25%. Finally, the fusion of all modalities has been studied using fusion at feature level using a DNN and at prediction level by training linear and random forest regressors. Both fusion schemes provided promising results.","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"159 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"ETS System for AV+EC 2015 Challenge\",\"authors\":\"P. Cardinal, N. Dehak, Alessandro Lameiras Koerich, J. Alam, Patrice Boucher\",\"doi\":\"10.1145/2808196.2811639\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents the system that we have developed for the AV+EC 2015 challenge which is mainly based on deep neural networks (DNNs). We have investigated different options using the audio feature set as a base system. The improvements that were achieved on this specific modality have been applied to other modalities. One of our main findings is that the frame stacking technique improves the quality of the predictions made by our model, and the improvements were also observed in all other modalities. Besides that, we also present a new feature set derived from the cardiac rhythm that were extracted from electrocardiogram readings. Such a new feature set helped us to improve the concordance correlation coefficient from 0.088 to 0.124 (on the development set) for the valence, an improvement of 25%. Finally, the fusion of all modalities has been studied using fusion at feature level using a DNN and at prediction level by training linear and random forest regressors. Both fusion schemes provided promising results.\",\"PeriodicalId\":123597,\"journal\":{\"name\":\"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge\",\"volume\":\"159 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-10-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2808196.2811639\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2808196.2811639","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This paper presents the system that we have developed for the AV+EC 2015 challenge which is mainly based on deep neural networks (DNNs). We have investigated different options using the audio feature set as a base system. The improvements that were achieved on this specific modality have been applied to other modalities. One of our main findings is that the frame stacking technique improves the quality of the predictions made by our model, and the improvements were also observed in all other modalities. Besides that, we also present a new feature set derived from the cardiac rhythm that were extracted from electrocardiogram readings. Such a new feature set helped us to improve the concordance correlation coefficient from 0.088 to 0.124 (on the development set) for the valence, an improvement of 25%. Finally, the fusion of all modalities has been studied using fusion at feature level using a DNN and at prediction level by training linear and random forest regressors. Both fusion schemes provided promising results.