A Physiologically-Adapted Gold Standard for Arousal during Stress
Alice Baird, Lukas Stappen, Lukas Christ, Lea Schumann, Eva-Maria Messner, Björn Schuller
Proceedings of the 2nd on Multimodal Sentiment Analysis Challenge, published 2021-07-27
DOI: 10.1145/3475957.3484446 (https://doi.org/10.1145/3475957.3484446)
Citation count: 1
Abstract
Emotion is an inherently subjective psycho-physiological human state, and producing an agreed-upon representation (gold standard) for continuously perceived emotion requires time-consuming and costly training of multiple human annotators. At the same time, there is strong evidence in the literature that physiological signals are an objective marker for states of emotion, particularly arousal. In this contribution, we utilise a multimodal dataset captured during a Trier Social Stress Test to explore the benefit of fusing physiological signals, namely Heartbeats per Minute (BPM), Electrodermal Activity (EDA), and respiration rate, for recognition of continuously perceived arousal, utilising a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) architecture and various audio, video, and text-based features. We use the MuSe-Toolbox to create a gold standard that considers annotator delay and agreement weighting. An improvement in Concordance Correlation Coefficient (CCC) is seen across feature sets when fusing EDA with arousal, compared to the arousal-only gold standard results. Additionally, results for BERT-based textual features improve when arousal is fused with all physiological signals, reaching up to .3344 CCC (compared to .2118 CCC for arousal only). Multimodal fusion also improves CCC: audio plus video features obtain up to .6157 CCC for arousal fused with EDA and BPM.
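All results in the abstract are reported as CCC. For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition of Lin's concordance correlation coefficient, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). It is not the authors' evaluation code: the function name `ccc` and the sample values are illustrative assumptions.

```python
import math


def ccc(pred, gold):
    """Concordance Correlation Coefficient between two equal-length sequences.

    Uses population (biased) variance and covariance, as in the standard
    definition. Returns NaN when both sequences are constant and equal,
    because the denominator is then zero.
    """
    n = len(pred)
    if n == 0 or n != len(gold):
        raise ValueError("pred and gold must be non-empty and of equal length")

    mean_p = sum(pred) / n
    mean_g = sum(gold) / n

    var_p = sum((p - mean_p) * (p - mean_p) for p in pred) / n
    var_g = sum((g - mean_g) * (g - mean_g) for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        return math.nan
    return 2 * cov / denom


# Illustrative values only (not data from the paper).
gold = [0.0, 1.0, 2.0]
shifted = [1.0, 2.0, 3.0]   # same shape as gold, constant offset of +1
print(ccc(gold, gold))      # perfect agreement
print(ccc(shifted, gold))   # lower than 1: the offset is penalised
```

Unlike Pearson correlation, which would be 1 for the shifted sequence, CCC also penalises differences in mean and scale, which is why it is a common choice for continuous emotion traces such as arousal.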