Detection of arousal and valence from facial expressions and physiological responses evoked by different types of stressors

Frontiers in Neuroergonomics. Pub Date: 2024-03-15. DOI: 10.3389/fnrgo.2024.1338243

Juliette Bruin, I. Stuldreher, Paola Perone, Koen Hogenelst, Marnix Naber, Wim Kamphuis, A. Brouwer

Abstract

Automatically detecting mental states such as stress from video images of the face could support evaluating stress responses in applicants for high-risk jobs or contribute to timely stress detection in challenging operational settings (e.g., aircrew, command center operators). Challenges in automatically estimating mental state include the generalization of models across contexts and across participants. Here, we aim to create robust models by training them on data from different contexts and by including physiological features. Fifty-one participants were exposed to different types of stressors (cognitive, social-evaluative and startle) and to baseline variants of these stressors. Video, electrocardiogram (ECG), electrodermal activity (EDA) and self-reports (arousal and valence) were recorded. Logistic regression models were trained to distinguish between high and low arousal and between high and low valence across participants, where "high" and "low" were defined relative to the center of the rating scale. Accuracy scores of different models were evaluated: models trained and tested within a specific context (either a baseline or a stressor variant of a task), an intermediate context (baseline and stressor variant of a task), or a general context (all conditions together). Furthermore, each of these model variants was built on video data only, on physiological data only, or on both video and physiological data. We found that all (video, physiological and video-physio) models could successfully distinguish between high- and low-rated arousal and valence, though performance tended to be better for (1) arousal than valence, (2) the specific context than the intermediate and general contexts, and (3) video-physio data than video or physiological data alone. Automatic feature selection resulted in the inclusion of 3–20 features, where the models based on video-physio data usually included features from video, ECG and EDA. Still, the performance of video-only models approached that of video-physio models. Arousal and valence ratings by three experienced human observers, based on part of the video data, did not match the self-reports. In sum, we showed that it is possible to automatically monitor arousal and valence even in relatively general contexts and better than humans can (in the given circumstances), and that non-contact video images of faces capture an important part of the information, which has practical advantages.
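
The classification setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes a 9-point rating scale, drops ratings that fall exactly on the scale center, uses a hand-written gradient-descent logistic regression, and evaluates by holding out one participant at a time; none of these details are stated in the abstract. The automatic feature selection step is omitted, and the names `binarize`, `fit_logreg` and `leave_one_participant_out`, as well as the toy data, are hypothetical.

```python
import math

# Assumption: a 9-point rating scale; the abstract does not state the range.
SCALE_MIN, SCALE_MAX = 1, 9
CENTER = (SCALE_MIN + SCALE_MAX) / 2


def binarize(rating):
    """Map a self-report rating to 1 ('high') or 0 ('low') relative to the scale center."""
    if rating == CENTER:
        return None  # assumption: ratings exactly at the center are dropped
    return 1 if rating > CENTER else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Return 1 if the model's decision value is positive, else 0."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def leave_one_participant_out(samples):
    """samples: list of (participant_id, feature_vector, rating). Returns accuracy."""
    labelled = [(pid, x, binarize(r)) for pid, x, r in samples]
    labelled = [s for s in labelled if s[2] is not None]
    correct = total = 0
    for held_out in sorted({pid for pid, _, _ in labelled}):
        train = [(x, y) for pid, x, y in labelled if pid != held_out]
        test = [(x, y) for pid, x, y in labelled if pid == held_out]
        model = fit_logreg([x for x, _ in train], [y for _, y in train])
        correct += sum(predict(model, x) == y for x, y in test)
        total += len(test)
    return correct / total


# Toy data (invented for illustration): two standardized features per sample.
samples = [
    ("p1", [1.2, 0.3], 8), ("p1", [-0.9, -0.1], 2),
    ("p2", [0.8, -0.2], 7), ("p2", [-1.1, 0.4], 3),
    ("p3", [1.0, 0.1], 6), ("p3", [-0.7, -0.3], 4),
    ("p4", [0.9, 0.2], 9), ("p4", [-1.3, 0.0], 1),
]
accuracy = leave_one_participant_out(samples)
print(f"across-participant accuracy: {accuracy:.2f}")
```

Under these assumptions, the specific, intermediate and general contexts would differ only in which conditions contribute rows to `samples`, and the video, physiological and video-physio variants only in which columns make up each feature vector.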
