Title: Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
Authors: Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow
DOI: https://doi.org/10.1145/3577190.3616117
Published: 2023-10-09
Abstract: This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion pretraining (CSMP) module, which learns a joint embedding for speech and gesture with the aim of learning a semantic coupling between these modalities. The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model in order to achieve semantically-aware co-speech gesture generation. Our entry achieved the highest human-likeness and highest speech appropriateness ratings among the submitted entries. This indicates that our system is a promising approach for achieving human-like co-speech gestures that carry semantic meaning in agents.

Title: 4th International Workshop on Multimodal Affect and Aesthetic Experience
Authors: Michal Muszynski, Theodoros Kostoulas, Leimin Tian, Edgar Roman-Rangel, Theodora Chaspari, Panos Amelidis
DOI: https://doi.org/10.1145/3577190.3616886
Published: 2023-10-09
Abstract: “Aesthetic experience” corresponds to the inner state of a person exposed to the form and content of artistic objects. Quantifying and interpreting the aesthetic experience of people in various contexts contributes towards a) creating context, and b) better understanding people’s affective reactions to aesthetic stimuli. Focusing on different types of artistic content, such as movies, music, literature, urban art, ancient artwork, and modern interactive technology, the 4th International Workshop on Multimodal Affect and Aesthetic Experience (MAAE) aims to enhance interdisciplinary collaboration among researchers from affective computing, aesthetics, human-robot/computer interaction, digital archaeology and art, culture, ethics, and addictive games.

Title: Robot Duck Debugging: Can Attentive Listening Improve Problem Solving?
Authors: Maria Teresa Parreira, Sarah Gillet, Iolanda Leite
DOI: https://doi.org/10.1145/3577190.3614160
Published: 2023-10-09
Abstract: While thinking aloud has been reported to positively affect problem-solving, the effects of the presence of an embodied entity (e.g., a social robot) to whom words can be directed remain mostly unexplored. In this work, we investigated the role of a robot in a “rubber duck debugging” setting, by analyzing how a robot’s listening behaviors could support a thinking-aloud problem-solving session. Participants completed two different tasks while speaking their thoughts aloud to either a robot or an inanimate object (a giant rubber duck). We implemented and tested two types of listener behavior in the robot: a rule-based heuristic and a deep-learning-based model. In a between-subject user study with 101 participants, we evaluated how the presence of a robot affected users’ engagement in thinking aloud, behavior during the task, and self-reported user experience. In addition, we explored the impact of the two robot listening behaviors on those measures. In contrast to prior work, our results indicate that neither the rule-based heuristic nor the deep learning robot conditions improved performance or perception of the task, compared to an inanimate object. We discuss potential explanations and shed light on the feasibility of designing social robots as assistive tools in thinking-aloud problem-solving tasks.

Title: Implicit Search Intent Recognition using EEG and Eye Tracking: Novel Dataset and Cross-User Prediction
Authors: Mansi Sharma, Shuang Chen, Philipp Müller, Maurice Rekrut, Antonio Krüger
DOI: https://doi.org/10.1145/3577190.3614166
Published: 2023-10-09
Abstract: For machines to effectively assist humans in challenging visual search tasks, they must differentiate whether a human is simply glancing into a scene (navigational intent) or searching for a target object (informational intent). Previous research proposed combining electroencephalography (EEG) and eye-tracking measurements to recognize such search intents implicitly, i.e., without explicit user input. However, the applicability of these approaches to real-world scenarios suffers from two key limitations. First, previous work used fixed search times in the informational intent condition - a stark contrast to visual search, which naturally terminates when the target is found. Second, methods incorporating EEG measurements addressed prediction scenarios that require ground truth training data from the target user, which is impractical in many use cases. We address these limitations by making the first publicly available EEG and eye-tracking dataset for navigational vs. informational intent recognition, where the user determines search times. We present the first method for cross-user prediction of search intents from EEG and eye-tracking recordings, reaching accuracy in leave-one-user-out evaluations comparable to within-user prediction accuracy while offering much greater flexibility.

Title: TongueTap: Multimodal Tongue Gesture Recognition with Head-Worn Devices
Authors: Tan Gemicioglu, R. Michael Winters, Yu-Te Wang, Thomas M. Gable, Ivan J. Tashev
DOI: https://doi.org/10.1145/3577190.3614120
Published: 2023-10-09
Abstract: Mouth-based interfaces are a promising new approach enabling silent, hands-free and eyes-free interaction with wearable devices. However, interfaces sensing mouth movements are traditionally custom-designed and placed near or within the mouth. TongueTap synchronizes multimodal EEG, PPG, IMU, eye tracking and head tracking data from two commercial headsets to facilitate tongue gesture recognition using only off-the-shelf devices on the upper face. We classified eight closed-mouth tongue gestures with 94% accuracy, offering an invisible and inaudible method for discreet control of head-worn devices. Moreover, we found that the IMU alone differentiates eight gestures with 80% accuracy and a subset of four gestures with 92% accuracy. We built a dataset of 48,000 gesture trials across 16 participants, allowing TongueTap to perform user-independent classification. Our findings suggest tongue gestures can be a viable interaction technique for VR/AR headsets and earables without requiring novel hardware.

{"title":"Ether-Mark: An Off-Screen Marking Menu For Mobile Devices","authors":"Hanae Rateau, Yosra Rekik, Edward Lank","doi":"10.1145/3577190.3614150","DOIUrl":"https://doi.org/10.1145/3577190.3614150","url":null,"abstract":"Given the computing power of mobile devices, porting feature-rich applications to these devices is increasingly feasible. However, feature-rich applications include large command sets, and providing access to these commands through screen-based widgets results in issues of occlusion and layering. To address this issue, we introduce Ether-Mark, a hierarchical, gesture-based, marking menu inspired, around-device menu for mobile devices enabling both on- and near-device interaction. We investigate the design of such menus and their learnability through three experiments. We first design and contrast three variants of Ether-Mark, yielding a zigzag menu design. We then refine input accuracy via a deformation model of the menu. And, we evaluate the learnability of the menus and the accuracy of the deformation model, revealing an accuracy rate up to 98.28%. We finally, compare in-air Ether-Mark with marking menus.Our results argue for Ether-Mark as a promising effective mechanism to leverage proximal around-device space.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"185 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"4th Workshop on Social Affective Multimodal Interaction for Health (SAMIH)","authors":"Hiroki Tanaka, Satoshi Nakamura, Jean-Claude Martin, Catherine Pelachaud","doi":"10.1145/3577190.3616878","DOIUrl":"https://doi.org/10.1145/3577190.3616878","url":null,"abstract":"This workshop discusses how interactive, multimodal technology, such as virtual agents, can measure and train social-affective interactions. Sensing technology now enables analyzing users’ behaviors and physiological signals. Various signal processing and machine learning methods can be used for prediction tasks. Such social signal processing and tools can be applied to measure and reduce social stress in everyday situations, including public speaking at schools and workplaces.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"2020 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recording multimodal pair-programming dialogue for reference resolution by conversational agents","authors":"Cecilia Domingo","doi":"10.1145/3577190.3614231","DOIUrl":"https://doi.org/10.1145/3577190.3614231","url":null,"abstract":"Pair programming is a collaborative technique which has proven highly beneficial in terms of the code produced and the learning gains for programmers. With recent advances in Programming Language Processing (PLP), numerous tools have been created that assist programmers in non-collaborative settings (i.e., where the technology provides users with a solution, instead of discussing the problem to develop a solution together). How can we develop AI that can assist in pair programming, a collaborative setting? To tackle this task, we begin by gathering multimodal dialogue data which can be used to train systems in a basic subtask of dialogue understanding: multimodal reference resolution, i.e., understanding which parts of a program are being mentioned by users through speech or by using the mouse and keyboard.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Social Cognition and its Neurologic Deficits with Artificial Neural Networks","authors":"Laurent P. Mertens","doi":"10.1145/3577190.3614232","DOIUrl":"https://doi.org/10.1145/3577190.3614232","url":null,"abstract":"Artificial Neural Networks (ANNs) are computer models loosely inspired by the functioning of the human brain. They are the state-of-the-art method for tackling a variety of Artificial Intelligence (AI) problems, and an increasingly popular tool in neuroscientific studies. However, both domains pursue different goals: in AI, performance is key and brain resemblance is incidental, while in neuroscience the aim is chiefly to better understand the brain. This PhD is situated at the intersection of both disciplines. Its goal is to develop ANNs that model social cognition in neurotypical individuals, and that can be altered in a controlled way to exhibit behavior consistent with individuals with one of two clinical conditions, Autism Spectrum Disorder and Frontotemporal Dementia.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: The Role of Audiovisual Feedback Delays and Bimodal Congruency for Visuomotor Performance in Human-Machine Interaction
Authors: Annika Dix, Clarissa Sabrina Arlinghaus, A. Marie Harkin, Sebastian Pannasch
DOI: https://doi.org/10.1145/3577190.3614111
Published: 2023-10-09
Abstract: Despite incredible technological progress in the last decades, latency is still an issue for today's technologies and their applications. To better understand how latency and resulting feedback delays affect the interaction between humans and cyber-physical systems (CPS), the present study examines separate and joint effects of visual and auditory feedback delays on performance and the motor control strategy in a complex visuomotor task. Thirty-six participants played the Wire Loop Game, a fine motor skill task, while going through four different delay conditions: no delay, visual only, auditory only, and audiovisual (length: 200 ms). Participants’ speed and accuracy in completing the task, as well as their movement kinematics, were assessed. Visual feedback delays slowed down movement execution and impaired precision compared to a condition without feedback delays. In contrast, delayed auditory feedback improved precision. Descriptively, the latter finding mainly appeared when congruent visual and auditory feedback delays were provided. We discuss the role of temporal congruency of audiovisual information as well as potential compensatory mechanisms that can inform the design of multisensory feedback in human-CPS interaction faced with latency.
