Advanced Multi-Instance Learning Method with Multi-features Engineering and Conservative Optimization for Engagement Intensity Prediction

Jianming Wu, Bo Yang, Yanan Wang, Gen Hattori
{"title":"Advanced Multi-Instance Learning Method with Multi-features Engineering and Conservative Optimization for Engagement Intensity Prediction","authors":"Jianming Wu, Bo Yang, Yanan Wang, Gen Hattori","doi":"10.1145/3382507.3417959","DOIUrl":null,"url":null,"abstract":"This paper proposes an advanced multi-instance learning method with multi-features engineering and conservative optimization for engagement intensity prediction. It was applied to the EmotiW Challenge 2020 and the results demonstrated the proposed method's good performance. The task is to predict the engagement level when a subject-student is watching an educational video under a range of conditions and in various environments. As engagement intensity has a strong correlation with facial movements, upper-body posture movements and overall environmental movements in a given time interval, we extract and incorporate these motion features into a deep regression model consisting of layers with a combination of long short-term memory(LSTM), gated recurrent unit (GRU) and a fully connected layer. In order to precisely and robustly predict the engagement level in a long video with various situations such as darkness and complex backgrounds, a multi-features engineering function is used to extract synchronized multi-model features in a given period of time by considering both short-term and long-term dependencies. Based on these well-processed engineered multi-features, in the 1st training stage, we train and generate the best models covering all the model configurations to maximize validation accuracy. Furthermore, in the 2nd training stage, to avoid the overfitting problem attributable to the extremely small engagement dataset, we conduct conservative optimization by applying a single Bi-LSTM layer with only 16 units to minimize the overfitting, and split the engagement dataset (train + validation) with 5-fold cross validation (stratified k-fold) to train a conservative model. The proposed method, by using decision-level ensemble for the two training stages' models, finally win the second place in the challenge (MSE: 0.061110 on the testing set).","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3382507.3417959","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22

Abstract

This paper proposes an advanced multi-instance learning method with multi-features engineering and conservative optimization for engagement intensity prediction. It was applied to the EmotiW Challenge 2020 and the results demonstrated the proposed method's good performance. The task is to predict the engagement level when a subject-student is watching an educational video under a range of conditions and in various environments. As engagement intensity has a strong correlation with facial movements, upper-body posture movements and overall environmental movements in a given time interval, we extract and incorporate these motion features into a deep regression model consisting of layers with a combination of long short-term memory(LSTM), gated recurrent unit (GRU) and a fully connected layer. In order to precisely and robustly predict the engagement level in a long video with various situations such as darkness and complex backgrounds, a multi-features engineering function is used to extract synchronized multi-model features in a given period of time by considering both short-term and long-term dependencies. Based on these well-processed engineered multi-features, in the 1st training stage, we train and generate the best models covering all the model configurations to maximize validation accuracy. Furthermore, in the 2nd training stage, to avoid the overfitting problem attributable to the extremely small engagement dataset, we conduct conservative optimization by applying a single Bi-LSTM layer with only 16 units to minimize the overfitting, and split the engagement dataset (train + validation) with 5-fold cross validation (stratified k-fold) to train a conservative model. The proposed method, by using decision-level ensemble for the two training stages' models, finally win the second place in the challenge (MSE: 0.061110 on the testing set).
基于多特征工程和保守优化的交战强度预测的高级多实例学习方法
本文提出了一种基于多特征工程和保守优化的先进多实例学习方法用于接合强度预测。将该方法应用于EmotiW挑战赛2020,结果证明了该方法的良好性能。这项任务是预测学生在各种条件和环境下观看教育视频时的参与程度。由于参与强度与给定时间间隔内的面部运动、上半身姿势运动和整体环境运动有很强的相关性,我们提取并将这些运动特征合并到一个深度回归模型中,该模型由长短期记忆(LSTM)、门控循环单元(GRU)和完全连接层组成。为了准确、稳健地预测长视频在黑暗、复杂背景等多种情况下的参与程度,采用多特征工程函数,同时考虑短、长期依赖关系,提取给定时间内同步的多模型特征。基于这些经过精心处理的工程多特征,在第一个训练阶段,我们训练并生成涵盖所有模型配置的最佳模型,以最大化验证精度。此外,在第二训练阶段,为了避免因交战数据集极小而导致的过拟合问题,我们采用仅16个单元的单个Bi-LSTM层进行保守优化,以最小化过拟合,并将交战数据集(训练+验证)分割为5倍交叉验证(分层k-fold),以训练保守模型。该方法通过对两个训练阶段的模型进行决策级集成,最终在挑战中获得第二名(测试集的MSE: 0.061110)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信