Companion Publication of the 2020 International Conference on Multimodal Interaction最新文献_第4页

Towards Autonomous Physiological Signal Extraction From Thermal Videos Using Deep Learning 基于深度学习的热视频自主生理信号提取

Companion Publication of the 2020 International Conference on Multimodal Interaction Pub Date : 2023-10-09 DOI: 10.1145/3577190.3614123

Kapotaksha Das, Mohamed Abouelenien, Mihai G. Burzo, John Elson, Kwaku Prakah-Asante, Clay Maranville

{"title":"Towards Autonomous Physiological Signal Extraction From Thermal Videos Using Deep Learning","authors":"Kapotaksha Das, Mohamed Abouelenien, Mihai G. Burzo, John Elson, Kwaku Prakah-Asante, Clay Maranville","doi":"10.1145/3577190.3614123","DOIUrl":"https://doi.org/10.1145/3577190.3614123","url":null,"abstract":"Using the thermal modality in order to extract physiological signals as a noncontact means of remote monitoring is gaining traction in applications, such as healthcare monitoring. However, existing methods rely heavily on traditional tracking and mostly unsupervised signal processing methods, which could be affected significantly by noise and subjects’ movements. Using a novel deep learning architecture based on convolutional long short-term memory networks on a diverse dataset of 36 subjects, we present a personalized approach to extract multimodal signals, including the heart rate, respiration rate, and body temperature from thermal videos. We perform multimodal signal extraction for subjects in states of both active speaking and silence, requiring no parameter tuning in an end-to-end deep learning approach with automatic feature extraction. We experiment with different data sampling methods for training our deep learning models, as well as different network designs. Our results indicate the effectiveness and improved efficiency of the proposed models reaching more than 90% accuracy based on the availability of proper training data for each subject.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Multimodal Fusion Interactions: A Study of Human and Automatic Quantification 多模态融合相互作用:人与自动量化研究

Companion Publication of the 2020 International Conference on Multimodal Interaction Pub Date : 2023-10-09 DOI: 10.1145/3577190.3614151

Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency

{"title":"Multimodal Fusion Interactions: A Study of Human and Automatic Quantification","authors":"Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency","doi":"10.1145/3577190.3614151","DOIUrl":"https://doi.org/10.1145/3577190.3614151","url":null,"abstract":"In order to perform multimodal fusion of heterogeneous signals, we need to understand their interactions: how each modality individually provides information useful for a task and how this information changes in the presence of other modalities. In this paper, we perform a comparative study of how humans annotate two categorizations of multimodal interactions: (1) partial labels, where different annotators annotate the label given the first, second, and both modalities, and (2) counterfactual labels, where the same annotator annotates the label given the first modality before asking them to explicitly reason about how their answer changes when given the second. We further propose an alternative taxonomy based on (3) information decomposition, where annotators annotate the degrees of redundancy: the extent to which modalities individually and together give the same predictions, uniqueness: the extent to which one modality enables a prediction that the other does not, and synergy: the extent to which both modalities enable one to make a prediction that one would not otherwise make using individual modalities. Through experiments and annotations, we highlight several opportunities and limitations of each approach and propose a method to automatically convert annotations of partial and counterfactual labels to information decomposition, yielding an accurate and efficient method for quantifying multimodal interactions.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Multimodal Analysis and Assessment of Therapist Empathy in Motivational Interviews 动机性访谈中治疗师共情的多模态分析与评估

Companion Publication of the 2020 International Conference on Multimodal Interaction Pub Date : 2023-10-09 DOI: 10.1145/3577190.3614105

Trang Tran, Yufeng Yin, Leili Tavabi, Joannalyn Delacruz, Brian Borsari, Joshua D Woolley, Stefan Scherer, Mohammad Soleymani

{"title":"Multimodal Analysis and Assessment of Therapist Empathy in Motivational Interviews","authors":"Trang Tran, Yufeng Yin, Leili Tavabi, Joannalyn Delacruz, Brian Borsari, Joshua D Woolley, Stefan Scherer, Mohammad Soleymani","doi":"10.1145/3577190.3614105","DOIUrl":"https://doi.org/10.1145/3577190.3614105","url":null,"abstract":"The quality and effectiveness of psychotherapy sessions are highly influenced by the therapists’ ability to meaningfully connect with clients. Automated assessment of therapist empathy provides cost-effective and systematic means of assessing the quality of therapy sessions. In this work, we propose to assess therapist empathy using multimodal behavioral data, i.e. spoken language (text) and audio in real-world motivational interviewing (MI) sessions for alcohol abuse intervention. We first study each modality (text vs. audio) individually and then evaluate a multimodal approach using different fusion strategies for automated recognition of empathy levels (high vs. low). Leveraging recent pre-trained models both for text (DistilRoBERTa) and speech (HuBERT) as strong unimodal baselines, we obtain consistent 2-3 point improvements in F1 scores with early and late fusion, and the highest absolute improvement of 6–12 points over unimodal baselines. Our models obtain F1 scores of 68% when only looking at an early segment of the sessions and up to 72% in a therapist-dependent setting. In addition, our results show that a relatively small portion of sessions, specifically the second quartile, is most important in empathy prediction, outperforming predictions on later segments and on the full sessions. Our analyses in late fusion results show that fusion models rely more on the audio modality in limited-data settings, such as in individual quartiles and when using only therapist turns. Further, we observe the highest misclassification rates for parts of the sessions with MI inconsistent utterances (20% misclassified by all models), likely due to the complex nature of these types of intents in relation to perceived empathy.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

From Natural to Non-Natural Interaction: Embracing Interaction Design Beyond the Accepted Convention of Natural 从自然到非自然的交互:拥抱超越自然公认惯例的交互设计

Companion Publication of the 2020 International Conference on Multimodal Interaction Pub Date : 2023-10-09 DOI: 10.1145/3577190.3616122

Radu-Daniel Vatavu

引用次数: 0

Interpreting Sign Language Recognition using Transformers and MediaPipe Landmarks 使用变形金刚和MediaPipe地标解释手语识别

Companion Publication of the 2020 International Conference on Multimodal Interaction Pub Date : 2023-10-09 DOI: 10.1145/3577190.3614143

Cristina Luna-Jiménez, Manuel Gil-Martín, Ricardo Kleinlein, Rubén San-Segundo, Fernando Fernández-Martínez

引用次数: 0

Evaluating the Potential of Caption Activation to Mitigate Confusion Inferred from Facial Gestures in Virtual Meetings 评估标题激活的潜力，以减轻虚拟会议中从面部手势推断的混淆

Companion Publication of the 2020 International Conference on Multimodal Interaction Pub Date : 2023-10-09 DOI: 10.1145/3577190.3614142

Melanie Heck, Jinhee Jeong, Christian Becker

{"title":"Evaluating the Potential of Caption Activation to Mitigate Confusion Inferred from Facial Gestures in Virtual Meetings","authors":"Melanie Heck, Jinhee Jeong, Christian Becker","doi":"10.1145/3577190.3614142","DOIUrl":"https://doi.org/10.1145/3577190.3614142","url":null,"abstract":"Following the COVID-19 pandemic, virtual meetings have not only become an integral part of collaboration, but are now also a popular tool for disseminating information to a large audience through webinars, online lectures, and the like. Ideally, the meeting participants should understand discussed topics as smoothly as in physical encounters. However, many experience confusion, but are hesitant to express their doubts. In this paper, we present the results from a user study with 45 Google Meet users that investigates how auto-generated captions can be used to improve comprehension. The results show that captions can help overcome confusion caused by language barriers, but not if it is the result of distorted words. To mitigate negative side effects such as occlusion of important visual information when captions are not strictly needed, we propose to activate them dynamically only when a user effectively experiences confusion. To determine instances that require captioning, we test whether the subliminal cues from facial gestures can be used to detect confusion. We confirm that confusion activates six facial action units (AU4, AU6, AU7, AU10, AU17, and AU23).","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

ASMRcade: Interactive Audio Triggers for an Autonomous Sensory Meridian Response ASMRcade:自主感觉经络反应的交互式音频触发器

Companion Publication of the 2020 International Conference on Multimodal Interaction Pub Date : 2023-10-09 DOI: 10.1145/3577190.3614155

Silvan Mertes, Marcel Strobl, Ruben Schlagowski, Elisabeth André

引用次数: 0

Influence of hand representation on a grasping task in augmented reality 手表征对增强现实中抓取任务的影响

Companion Publication of the 2020 International Conference on Multimodal Interaction Pub Date : 2023-10-09 DOI: 10.1145/3577190.3614128

Louis Lafuma, Guillaume Bouyer, Olivier Goguel, Jean-Yves Pascal Didier

引用次数: 0

Annotations from speech and heart rate: impact on multimodal emotion recognition 语音和心率注释:对多模态情感识别的影响

Companion Publication of the 2020 International Conference on Multimodal Interaction Pub Date : 2023-10-09 DOI: 10.1145/3577190.3614165

Kaushal Sharma, Guillaume Chanel

{"title":"Annotations from speech and heart rate: impact on multimodal emotion recognition","authors":"Kaushal Sharma, Guillaume Chanel","doi":"10.1145/3577190.3614165","DOIUrl":"https://doi.org/10.1145/3577190.3614165","url":null,"abstract":"The focus of multimodal emotion recognition has often been on the analysis of several fusion strategies. However, little attention has been paid to the effect of emotional cues, such as physiological and audio cues, on external annotations used to generate the Ground Truths (GTs). In our study, we analyze this effect by collecting six continuous arousal annotations for three groups of emotional cues: speech only, heartbeat sound only and their combination. Our results indicate significant differences between the three groups of annotations, thus giving three distinct cue-specific GTs. The relevance of these GTs is estimated by training multimodal machine learning models to regress speech, heart rate and their multimodal fusion on arousal. Our analysis shows that a cue(s)-specific GT is better predicted by the corresponding modality(s). In addition, the fusion of several emotional cues for the definition of GTs allows to reach a similar performance for both unimodal models and multimodal fusion. In conclusion, our results indicates that heart rate is an efficient cue for the generation of a physiological GT; and that combining several emotional cues for GTs generation is as important as performing input multimodal fusion for emotion prediction.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

AIUnet: Asymptotic inference with U2-Net for referring image segmentation AIUnet:基于u2net的参考图像分割渐近推理

Companion Publication of the 2020 International Conference on Multimodal Interaction Pub Date : 2023-10-09 DOI: 10.1145/3577190.3614176

Jiangquan Li, Shimin Shan, Yu Liu, Kaiping Xu, Xiwen Hu, Mingcheng Xue

引用次数: 0