{"title":"4th ICMI Workshop on Bridging Social Sciences and AI for Understanding Child Behaviour","authors":"Heysem Kaya, Anouk Neerincx, Maryam Najafian, Saeid Safavi","doi":"10.1145/3577190.3616858","DOIUrl":"https://doi.org/10.1145/3577190.3616858","url":null,"abstract":"Analysing and understanding child behaviour is a topic of great scientific interest across a wide range of disciplines, including social sciences and artificial intelligence (AI). Knowledge in these diverse fields is not yet integrated to its full potential. The aim of this workshop is to bring researchers from these fields together. The first three workshops had a significant impact. In this workshop, we discussed topics such as the use of AI techniques to better examine and model interactions and children’s emotional development, analyzing head movement patterns with respect to child age. The 2023 edition of the workshop is a successful new step towards the objective of bridging social sciences and AI, attracting contributions from various academic fields on child behaviour analysis. We see that atypical child development holds an important space in child behaviour research. While in visual domain, gaze and joint attention are popularly studied; speech and physiological signals of atypically developing children are shown to provide valuable cues motivating future work. This document summarizes the WoCBU’23 workshop, including the review process, keynote talks and the accepted papers.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Theory of Data Processing: Applying Artificial Intelligence to Cognition and Humanity","authors":"Jingwei Liu","doi":"10.1145/3577190.3616123","DOIUrl":"https://doi.org/10.1145/3577190.3616123","url":null,"abstract":"The traditional data processing uses machine as a passive feature detector or classifier for a given fixed dataset. However, we contend that this is not how humans understand and process data from the real world. Based on active inference, we propose a neural network model that actively processes the incoming data using predictive processing and actively samples the inputs from the environment that conforms to its internal representations. The model we adopt is the Helmholtz machine, a perfect parallel for the hierarchical model of the brain and the forward-backward connections of the cortex, thus available a biologically plausible implementation of the brain functions such as predictive processing, hierarchical message passing, and predictive coding under a machine-learning context. Besides, active sampling could also be incorporated into the model via the generative end as an interaction of the agent with the external world. The active sampling of the environment directly resorts to environmental salience and cultural niche construction. By studying a coupled multi-agent model of constructing a “desire path” as part of a cultural niche, we find a plausible way of explaining and simulating various problems under group flow, social interactions, shared cultural practices, and thinking through other minds.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The FineMotion entry to the GENEA Challenge 2023: DeepPhase for conversational gestures generation","authors":"Vladislav Korzun, Anna Beloborodova, Arkady Ilin","doi":"10.1145/3577190.3616119","DOIUrl":"https://doi.org/10.1145/3577190.3616119","url":null,"abstract":"This paper describes FineMotion’s entry to the GENEA Challenge 2023. We explore the potential of DeepPhase embeddings by adapting neural motion controllers to conversational gesture generation. This is achieved by introducing a recurrent encoder for control features. We additionally use VQ-VAE codebook encoding of gestures to support dyadic setup. The resulting system generates stable realistic motion controllable by audio, text and interlocutor’s motion.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135043300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Natural to Non-Natural Interaction: Embracing Interaction Design Beyond the Accepted Convention of Natural","authors":"Radu-Daniel Vatavu","doi":"10.1145/3577190.3616122","DOIUrl":"https://doi.org/10.1145/3577190.3616122","url":null,"abstract":"Natural interactions feel intuitive, familiar, and a good match to the task, user’s abilities, and context. Consequently, a wealth of scientific research has been conducted on natural interaction with computer systems. Contrary to conventional mainstream, we advocate for “non-natural interaction design” as a transformative, creative process that results in highly usable and effective interactions by deliberately deviating from users’ expectations and experience of engaging with the physical world. The non-natural approach to interaction design provokes a departure from the established notion of the “natural,” all the while prioritizing usability—albeit amidst the backdrop of the unconventional, unexpected, and intriguing.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the Potential of Caption Activation to Mitigate Confusion Inferred from Facial Gestures in Virtual Meetings","authors":"Melanie Heck, Jinhee Jeong, Christian Becker","doi":"10.1145/3577190.3614142","DOIUrl":"https://doi.org/10.1145/3577190.3614142","url":null,"abstract":"Following the COVID-19 pandemic, virtual meetings have not only become an integral part of collaboration, but are now also a popular tool for disseminating information to a large audience through webinars, online lectures, and the like. Ideally, the meeting participants should understand discussed topics as smoothly as in physical encounters. However, many experience confusion, but are hesitant to express their doubts. In this paper, we present the results from a user study with 45 Google Meet users that investigates how auto-generated captions can be used to improve comprehension. The results show that captions can help overcome confusion caused by language barriers, but not if it is the result of distorted words. To mitigate negative side effects such as occlusion of important visual information when captions are not strictly needed, we propose to activate them dynamically only when a user effectively experiences confusion. To determine instances that require captioning, we test whether the subliminal cues from facial gestures can be used to detect confusion. We confirm that confusion activates six facial action units (AU4, AU6, AU7, AU10, AU17, and AU23).","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpreting Sign Language Recognition using Transformers and MediaPipe Landmarks","authors":"Cristina Luna-Jiménez, Manuel Gil-Martín, Ricardo Kleinlein, Rubén San-Segundo, Fernando Fernández-Martínez","doi":"10.1145/3577190.3614143","DOIUrl":"https://doi.org/10.1145/3577190.3614143","url":null,"abstract":"Sign Language Recognition (SLR) is a challenging task that aims to bridge the communication gap between the deaf and hearing communities. In recent years, deep learning-based approaches have shown promising results in SLR. However, the lack of interpretability remains a significant challenge. In this paper, we seek to understand which hand and pose MediaPipe Landmarks are deemed the most important for prediction as estimated by a Transformer model. We propose to embed a learnable array of parameters into the model that performs an element-wise multiplication of the inputs. This learned array highlights the most informative input features that contributed to solve the recognition task. Resulting in a human-interpretable vector that lets us interpret the model predictions. We evaluate our approach on public datasets called WLASL100 (SRL) and IPNHand (gesture recognition). We believe that the insights gained in this way could be exploited for the development of more efficient SLR pipelines.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of Violin Bow Pressure Using Photo-Reflective Sensors","authors":"Yurina Mizuho, Riku Kitamura, Yuta Sugiura","doi":"10.1145/3577190.3614172","DOIUrl":"https://doi.org/10.1145/3577190.3614172","url":null,"abstract":"The violin is one of the most popular instruments, but it is hard to learn. The bowing of the right hand is a crucial factor in determining the tone quality, but it is too complex to master, teach, and reproduce. Therefore, many studies have attempted to measure and analyze the bowing of the violin to help record performances and support practice. This work aimed to measure bow pressure, one of the parameters of bowing motion.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can empathy affect the attribution of mental states to robots?","authors":"Cristina Gena, Francesca Manini, Antonio Lieto, Alberto Lillo, Fabiana Vernero","doi":"10.1145/3577190.3614167","DOIUrl":"https://doi.org/10.1145/3577190.3614167","url":null,"abstract":"This paper presents an experimental study showing that the humanoid robot NAO, in a condition already validated with regards to its capacity to trigger situational empathy in humans, is able to stimulate the attribution of mental states towards itself. Indeed, results show that participants not only experienced empathy towards NAO, when the robot was afraid of losing its memory due to a malfunction, but they also attributed higher scores to the robot emotional intelligence in the Attribution of Mental State Questionnaire, in comparison with the users in the control condition. This result suggests a possible correlation between empathy toward the robot and humans’ attribution of mental states to it.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SHAP-based Prediction of Mother's History of Depression to Understand the Influence on Child Behavior","authors":"Maneesh Bilalpur, Saurabh Hinduja, Laura Cariola, Lisa Sheeber, Nicholas Allen, Louis-Philippe Morency, Jeffrey F. Cohn","doi":"10.1145/3577190.3614136","DOIUrl":"https://doi.org/10.1145/3577190.3614136","url":null,"abstract":"Depression strongly impacts parents’ behavior. Does parents’ depression strongly affect the behavior of their children as well? To investigate this question, we compared dyadic interactions between 73 depressed and 75 non-depressed mothers and their adolescent child. Families were of low income and 84% were white. Child behavior was measured from audio-video recordings using manual annotation of verbal and nonverbal behavior by expert coders and by multimodal computational measures of facial expression, face and head dynamics, prosody, speech behavior, and linguistics. For both sets of measures, we used Support Vector Machines. For computational measures, we investigated the relative contribution of single versus multiple modalities using a novel approach to SHapley Additive exPlanations (SHAP). Computational measures outperformed manual ratings by human experts. Among individual computational measures, prosody was the most informative. SHAP reduction resulted in a four-fold decrease in the number of features and highest performance (77% accuracy; positive and negative agreements at 75% and 76%, respectively). These findings suggest that maternal depression strongly impacts the behavior of adolescent children; differences are most revealed in prosody; multimodal features together with SHAP reduction are most powerful.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Autonomous Physiological Signal Extraction From Thermal Videos Using Deep Learning","authors":"Kapotaksha Das, Mohamed Abouelenien, Mihai G. Burzo, John Elson, Kwaku Prakah-Asante, Clay Maranville","doi":"10.1145/3577190.3614123","DOIUrl":"https://doi.org/10.1145/3577190.3614123","url":null,"abstract":"Using the thermal modality in order to extract physiological signals as a noncontact means of remote monitoring is gaining traction in applications, such as healthcare monitoring. However, existing methods rely heavily on traditional tracking and mostly unsupervised signal processing methods, which could be affected significantly by noise and subjects’ movements. Using a novel deep learning architecture based on convolutional long short-term memory networks on a diverse dataset of 36 subjects, we present a personalized approach to extract multimodal signals, including the heart rate, respiration rate, and body temperature from thermal videos. We perform multimodal signal extraction for subjects in states of both active speaking and silence, requiring no parameter tuning in an end-to-end deep learning approach with automatic feature extraction. We experiment with different data sampling methods for training our deep learning models, as well as different network designs. Our results indicate the effectiveness and improved efficiency of the proposed models reaching more than 90% accuracy based on the availability of proper training data for each subject.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}