Title: Going with our Guts: Potentials of Wearable Electrogastrography (EGG) for Affect Detection
Authors: Angela Vujic, S. Tong, Rosalind W. Picard, P. Maes
DOI: https://doi.org/10.1145/3382507.3418882
Abstract: A hard challenge for wearable systems is to measure differences in emotional valence, i.e. positive and negative affect, via physiology. However, the stomach or gastric signal is an unexplored modality that could offer new affective information. We created a wearable device and software to record gastric signals, known as electrogastrography (EGG). An in-laboratory study was conducted to compare EGG with electrodermal activity (EDA) in 33 individuals viewing affective stimuli. We found that negative stimuli attenuate EGG's indicators of parasympathetic activation, or "rest and digest" activity. We compare EGG to the remaining physiological signals and describe implications for affect detection. Further, we introduce how wearable EGG may support future applications in areas as diverse as reducing nausea in virtual reality and helping treat emotion-related eating disorders.
Title: Automated Time Synchronization of Cough Events from Multimodal Sensors in Mobile Devices
Authors: Tousif Ahmed, M. Y. Ahmed, Md. Mahbubur Rahman, Ebrahim Nemati, Bashima Islam, K. Vatanparvar, Viswam Nathan, Daniel McCaffrey, Jilong Kuang, J. Gao
DOI: https://doi.org/10.1145/3382507.3418855
Abstract: Tracking the type and frequency of cough events is critical for monitoring respiratory diseases. Coughs are among the most common symptoms of respiratory and infectious diseases such as COVID-19, and a cough monitoring system could be vital for remote monitoring during such a pandemic. While existing solutions for cough monitoring use unimodal (e.g., audio) approaches to detect coughs, fusing multimodal sensors (e.g., audio and accelerometer) from multiple devices (e.g., phone and watch) is likely to uncover additional insights and can help track the exacerbation of respiratory conditions. However, such multimodal and multidevice fusion requires accurate time synchronization, which is challenging because coughs are very short events (0.3-0.7 seconds). In this paper, we first demonstrate the challenges of synchronizing cough events across devices, based on cough data collected in two studies. We then evaluate a cross-correlation-based time synchronization algorithm on the alignment of cough events. Our algorithm synchronizes 98.9% of cough events across two devices with an average synchronization error of 0.046 s.
{"title":"Towards Real-Time Multimodal Emotion Recognition among Couples","authors":"George Boateng","doi":"10.1145/3382507.3421154","DOIUrl":"https://doi.org/10.1145/3382507.3421154","url":null,"abstract":"Researchers are interested in understanding the emotions of couples as it relates to relationship quality and dyadic management of chronic diseases. Currently, the process of assessing emotions is manual, time-intensive, and costly. Despite the existence of works on emotion recognition among couples, there exists no ubiquitous system that recognizes the emotions of couples in everyday life while addressing the complexity of dyadic interactions such as turn-taking in couples? conversations. In this work, we seek to develop a smartwatch-based system that leverages multimodal sensor data to recognize each partner's emotions in daily life. We are collecting data from couples in the lab and in the field and we plan to use the data to develop multimodal machine learning models for emotion recognition. Then, we plan to implement the best models in a smartwatch app and evaluate its performance in real-time and everyday life through another field study. Such a system could enable research both in the lab (e.g. couple therapy) or in daily life (assessment of chronic disease management or relationship quality) and enable interventions to improve the emotional well-being, relationship quality, and chronic disease management of couples.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121043215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: LASO: Exploiting Locomotive and Acoustic Signatures over the Edge to Annotate IMU Data for Human Activity Recognition
Authors: S. Chatterjee, Avijoy Chakma, A. Gangopadhyay, Nirmalya Roy, Bivas Mitra, Sandip Chakraborty
DOI: https://doi.org/10.1145/3382507.3418826
Abstract: Annotated IMU sensor data from smart devices and wearables are essential for developing supervised models for fine-grained human activity recognition, yet generating sufficient annotated data for diverse human activities in different environments is challenging. Existing approaches primarily use human-in-the-loop techniques, including active learning; however, these are tedious, costly, and time-consuming. Leveraging the acoustic data available from microphones embedded in the data collection devices, we propose LASO, a multimodal approach for automated data annotation from acoustic and locomotive information. LASO runs on the edge device itself, so only the annotated IMU data is collected and the acoustic data is discarded on the device, thereby preserving the user's audio privacy. In the absence of any pre-existing labeling information, such auto-annotation is challenging because the IMU data needs to be sessionized for activities of different time scales in a completely unsupervised manner. We use a change-point detection technique while synchronizing the locomotive information from the IMU data with the acoustic data, and then use pre-trained audio-based activity recognition models to label the IMU data while handling acoustic noise. LASO efficiently annotates IMU data, without any explicit human intervention, with mean accuracies of 0.93 (±0.04) and 0.78 (±0.05) on two real-life datasets from workshop and kitchen environments, respectively.
Title: Is She Truly Enjoying the Conversation?: Analysis of Physiological Signals toward Adaptive Dialogue Systems
Authors: Shun Katada, S. Okada, Yuki Hirano, Kazunori Komatani
DOI: https://doi.org/10.1145/3382507.3418844
Abstract: In human-agent interactions, systems need to identify the current emotional state of the user to adapt their dialogue strategies. This task is challenging because emotional states are not always expressed in natural settings and change dynamically. Recent evidence has indicated the usefulness of physiological modalities for emotion recognition. However, the contribution of time-series physiological signals during a dialogue in human-agent interaction has not been extensively investigated. This paper presents a machine learning model based on physiological signals to estimate a user's sentiment at every exchange during a dialogue. Using a wearable sensing device, time-series physiological data, including electrodermal activity (EDA) and heart rate, were collected together with acoustic and visual information during a dialogue. Sentiment labels were annotated by the participants themselves and by external human coders for each exchange, consisting of a pair of system and participant utterances. The experimental results showed that a multimodal deep neural network (DNN) model combining the EDA and visual features achieved an accuracy of 63.2%. In general, this task is challenging, as indicated by the accuracy of 63.0% attained by the external coders. The analysis of the sentiment estimation results for each individual indicated that the human coders often misjudged the negative sentiment labels, and in these cases the DNN model outperformed the human coders. These results indicate that physiological signals can help detect implicit aspects of negative sentiment that are acoustically and visually indistinguishable.
Title: International Workshop on Deep Video Understanding
Authors: Keith Curtis, G. Awad, Shahzad Rajput, I. Soboroff
DOI: https://doi.org/10.1145/3382507.3419746
Abstract: This is the introduction paper to the International Workshop on Deep Video Understanding, organized at the 22nd ACM International Conference on Multimodal Interaction. In recent years, a growing trend towards understanding videos (in particular movies) at a deeper level has motivated researchers in multimedia and computer vision to present new approaches and datasets for this problem. This challenging research area aims to develop a deep understanding of the relations that exist between different individuals and entities in movies, using all available modalities such as video, audio, text, and metadata. The aim of this workshop is to foster innovative research in this direction and to provide benchmarking evaluations to advance technologies in the deep video understanding community.
{"title":"Fifty Shades of Green: Towards a Robust Measure of Inter-annotator Agreement for Continuous Signals","authors":"Brandon M. Booth, Shrikanth S. Narayanan","doi":"10.1145/3382507.3418860","DOIUrl":"https://doi.org/10.1145/3382507.3418860","url":null,"abstract":"Continuous human annotations of complex human experiences are essential for enabling psychological and machine-learned inquiry into the human mind, but establishing a reliable set of annotations for analysis and ground truth generation is difficult. Measures of consensus or agreement are often used to establish the reliability of a collection of annotations and thereby purport their suitability for further research and analysis. This work examines many of the commonly used agreement metrics for continuous-scale and continuous-time human annotations and demonstrates their shortcomings, especially in measuring agreement in general annotation shape and structure. Annotation quality is carefully examined in a controlled study where the true target signal is known and evidence is presented suggesting that annotators' perceptual distortions can be modeled using monotonic functions. A novel measure of agreement is proposed which is agnostic to these perceptual differences between annotators and provides unique information when assessing agreement. We illustrate how this measure complements existing agreement metrics and can serve as a tool for curating a reliable collection of human annotations based on differential consensus.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127446483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection of Listener Uncertainty in Robot-Led Second Language Conversation Practice","authors":"Ronald Cumbal, José Lopes, Olov Engwall","doi":"10.1145/3382507.3418873","DOIUrl":"https://doi.org/10.1145/3382507.3418873","url":null,"abstract":"Uncertainty is a frequently occurring affective state that learners experience during the acquisition of a second language. This state can constitute both a learning opportunity and a source of learner frustration. An appropriate detection could therefore benefit the learning process by reducing cognitive instability. In this study, we use a dyadic practice conversation between an adult second-language learner and a social robot to elicit events of uncertainty through the manipulation of the robot's spoken utterances (increased lexical complexity or prosody modifications). The characteristics of these events are then used to analyze multi-party practice conversations between a robot and two learners. Classification models are trained with multimodal features from annotated events of listener (un)certainty. We report the performance of our models on different settings, (sub)turn segments and multimodal inputs.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129524971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: The WoNoWa Dataset: Investigating the Transactive Memory System in Small Group Interactions
Authors: Béatrice Biancardi, Lou Maisonnave-Couterou, Pierrick Renault, Brian Ravenet, M. Mancini, G. Varni
DOI: https://doi.org/10.1145/3382507.3418843
Abstract: We present WoNoWa, a novel multi-modal dataset of small group interactions in collaborative tasks. The dataset is explicitly designed to elicit and to study over time a Transactive Memory System (TMS), a group's emergent state characterizing the group's meta-knowledge about "who knows what". A rich set of automatic features and manual annotations, extracted from the collected audio-visual data, is available on request for research purposes. Features include individual descriptors (e.g., position, Quantity of Motion, speech activity) and group descriptors (e.g., F-formations). Additionally, participants' self-assessments are available. Preliminary results from exploratory analyses show that the WoNoWa design allowed groups to develop a TMS that increased across the tasks. These results encourage the use of the WoNoWa dataset for a better understanding of the relationship between behavioural patterns and TMS, which in turn could help improve group performance.
Title: Multimodal Affect and Aesthetic Experience
Authors: Theodoros Kostoulas, Michal Muszynski, Theodora Chaspari, Panos Amelidis
DOI: https://doi.org/10.1145/3382507.3420055
Abstract: The term 'aesthetic experience' corresponds to the inner state of a person exposed to the form and content of artistic objects. Exploring certain aesthetic values of artistic objects, as well as interpreting the aesthetic experience of people when exposed to art, can contribute towards understanding (a) art and (b) people's affective reactions to artwork. Focusing on different types of artistic content, such as movies, music, urban art, and other artwork, the goal of this workshop is to enhance the interdisciplinary collaboration between affective computing and aesthetics researchers.