{"title":"Multimodal Groups' Analysis for Automated Cohesion Estimation","authors":"Lucien Maman","doi":"10.1145/3382507.3421153","DOIUrl":"https://doi.org/10.1145/3382507.3421153","url":null,"abstract":"Groups are getting more and more scholars' attention. With the rise of Social Signal Processing (SSP), many studies based on Social Sciences and Psychology findings focused on detecting and classifying groups? dynamics. Cohesion plays an important role in these groups? dynamics and is one of the most studied emergent states, involving both group motions and goals. This PhD project aims to provide a computational model addressing the multidimensionality of cohesion and capturing its subtle dynamics. It will offer new opportunities to develop applications to enhance interactions among humans as well as among humans and machines.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"169 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132972567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced Multi-Instance Learning Method with Multi-features Engineering and Conservative Optimization for Engagement Intensity Prediction","authors":"Jianming Wu, Bo Yang, Yanan Wang, Gen Hattori","doi":"10.1145/3382507.3417959","DOIUrl":"https://doi.org/10.1145/3382507.3417959","url":null,"abstract":"This paper proposes an advanced multi-instance learning method with multi-features engineering and conservative optimization for engagement intensity prediction. It was applied to the EmotiW Challenge 2020 and the results demonstrated the proposed method's good performance. The task is to predict the engagement level when a subject-student is watching an educational video under a range of conditions and in various environments. As engagement intensity has a strong correlation with facial movements, upper-body posture movements and overall environmental movements in a given time interval, we extract and incorporate these motion features into a deep regression model consisting of layers with a combination of long short-term memory(LSTM), gated recurrent unit (GRU) and a fully connected layer. In order to precisely and robustly predict the engagement level in a long video with various situations such as darkness and complex backgrounds, a multi-features engineering function is used to extract synchronized multi-model features in a given period of time by considering both short-term and long-term dependencies. Based on these well-processed engineered multi-features, in the 1st training stage, we train and generate the best models covering all the model configurations to maximize validation accuracy. Furthermore, in the 2nd training stage, to avoid the overfitting problem attributable to the extremely small engagement dataset, we conduct conservative optimization by applying a single Bi-LSTM layer with only 16 units to minimize the overfitting, and split the engagement dataset (train + validation) with 5-fold cross validation (stratified k-fold) to train a conservative model. The proposed method, by using decision-level ensemble for the two training stages' models, finally win the second place in the challenge (MSE: 0.061110 on the testing set).","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131039134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Depression in Less Than 10 Seconds: Impact of Speaking Time on Depression Detection Sensitivity","authors":"Nujud Aloshban, A. Esposito, A. Vinciarelli","doi":"10.1145/3382507.3418875","DOIUrl":"https://doi.org/10.1145/3382507.3418875","url":null,"abstract":"This article investigates whether it is possible to detect depression using less than 10 seconds of speech. The experiments have involved 59 participants (including 29 that have been diagnosed with depression by a professional psychiatrist) and are based on a multimodal approach that jointly models linguistic (what people say) and acoustic (how people say it) aspects of speech using four different strategies for the fusion of multiple data streams. On average, every interview has lasted for 242.2 seconds, but the results show that 10 seconds or less are sufficient to achieve the same level of recall (roughly 70%) observed after using the entire inteview of every participant. In other words, it is possible to maintain the same level of sensitivity (the name of recall in clinical settings) while reducing by 95%, on average, the amount of time requireed to collect the necessary data.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115921864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gesture Enhanced Comprehension of Ambiguous Human-to-Robot Instructions","authors":"Dulanga Weerakoon, Vigneshwaran Subbaraju, Nipuni Karumpulli, Tuan Tran, Qianli Xu, U-Xuan Tan, Joo-Hwee Lim, Archan Misra","doi":"10.1145/3382507.3418863","DOIUrl":"https://doi.org/10.1145/3382507.3418863","url":null,"abstract":"This work demonstrates the feasibility and benefits of using pointing gestures, a naturally-generated additional input modality, to improve the multi-modal comprehension accuracy of human instructions to robotic agents for collaborative tasks.We present M2Gestic, a system that combines neural-based text parsing with a novel knowledge-graph traversal mechanism, over a multi-modal input of vision, natural language text and pointing. Via multiple studies related to a benchmark table top manipulation task, we show that (a) M2Gestic can achieve close-to-human performance in reasoning over unambiguous verbal instructions, and (b) incorporating pointing input (even with its inherent location uncertainty) in M2Gestic results in a significant (30%) accuracy improvement when verbal instructions are ambiguous.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128265337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalised Human Device Interaction through Context aware Augmented Reality","authors":"Madhawa Perera","doi":"10.1145/3382507.3421157","DOIUrl":"https://doi.org/10.1145/3382507.3421157","url":null,"abstract":"Human-device interactions in smart environments are shifting prominently towards naturalistic user interactions such as gaze and gesture. However, ambiguities arise when users have to switch interactions as contexts change. This could confuse users who are accustomed to a set of conventional controls leading to system inefficiencies. My research explores how to reduce interaction ambiguity by semantically modelling user specific interactions with context, enabling personalised interactions through AR. Sensory data captured from an AR device is utilised to interpret user interactions and context which is then modeled in an extendable knowledge graph along with user's interaction preference using semantic web standards. These representations are utilized to bring semantics to AR applications about user's intent to interact with a particular device affordance. Therefore, this research aims to bring semantical modeling of personalised gesture interactions in AR/VR applications for smart/immersive environments.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130033894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SmellControl: The Study of Sense of Agency in Smell","authors":"Patricia Ivette Cornelio Martinez, E. Maggioni, Giada Brianza, S. Subramanian, Marianna Obrist","doi":"10.1145/3382507.3418810","DOIUrl":"https://doi.org/10.1145/3382507.3418810","url":null,"abstract":"The Sense of Agency (SoA) is crucial in interaction with technology, it refers to the feeling of 'I did that' as opposed to 'the system did that' supporting a feeling of being in control. Research in human-computer interaction has recently studied agency in visual, auditory and haptic interfaces, however the role of smell on agency remains unknown. Our sense of smell is quite powerful to elicit emotions, memories and awareness of the environment, which has been exploited to enhance user experiences (e.g., in VR and driving scenarios). In light of increased interest in designing multimodal interfaces including smell and its close link with emotions, we investigated, for the first time, the effect of smell-induced emotions on the SoA. We conducted a study using the Intentional Binding (IB) paradigm used to measure SoA while participants were exposed to three scents with different valence (pleasant, unpleasant, neutral). Our results show that participants? SoA increased with a pleasant scent compared to neutral and unpleasant scents. We discuss how our results can inform the design of multimodal and future olfactory interfaces.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132848893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Force9: Force-assisted Miniature Keyboard on Smart Wearables","authors":"Lik-Hang Lee, Ngo Yan Yeung, Tristan Braud, Tong Li, Xiang Su, Pan Hui","doi":"10.1145/3382507.3418827","DOIUrl":"https://doi.org/10.1145/3382507.3418827","url":null,"abstract":"Smartwatches and other wearables are characterized by small-scale touchscreens that complicate the interaction with content. In this paper, we present Force9, the first optimized miniature keyboard leveraging force-sensitive touchscreens on wrist-worn computers. Force9 enables character selection in an ambiguous layout by analyzing the trade-off between interaction space and the easiness of force-assisted interaction. We argue that dividing the screen's pressure range into three contiguous force levels is sufficient to differentiate characters for fast and accurate text input. Our pilot study captures and calibrates the ability of users to perform force-assisted touches on miniature-sized keys on touchscreen devices. We then optimize the keyboard layout considering the goodness of character pairs (with regards to the selected English corpus) under the force-based configuration and the users? familiarity with the QWERTY layout. We finally evaluate the performance of the trimetric optimized Force9 layout, and achieve an average of 10.18 WPM by the end of the final session. Compared to the other state-of-the-art approaches, Force9 allows for single-gesture character selection without addendum sensors.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126212775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multisensory Approaches to Human-Food Interaction","authors":"Carlos Velasco, A. Nijholt, C. Spence, Takuji Narumi, Kosuke Motoki, Gijs Huisman, Marianna Obrist","doi":"10.1145/3382507.3419749","DOIUrl":"https://doi.org/10.1145/3382507.3419749","url":null,"abstract":"Here, we present the outcome of the 4th workshop on Multisensory Approaches to Human-Food Interaction (MHFI), developed in collaboration with ICMI 2020 in Utrecht, The Netherlands. Capitalizing on the increasing interest on multisensory aspects of human-food interaction and the unique contribution that our community offers, we developed a space to discuss ideas ranging from mechanisms of multisensory food perception, through multisensory technologies, to new applications of systems in the context of MHFI. All in all, the workshop involved 11 contributions, which will hopefully further help shape the basis of a field of inquiry that grows as we see progress in our understanding of the senses and the development of new technologies in the context of food.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126560484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognizing Emotion in the Wild using Multimodal Data","authors":"Shivam Srivastava, Saandeep Aathreya Sidhapur Lakshminarayan, Saurabh Hinduja, Sk Rahatul Jannat, Hamza Elhamdadi, Shaun J. Canavan","doi":"10.1145/3382507.3417970","DOIUrl":"https://doi.org/10.1145/3382507.3417970","url":null,"abstract":"In this work, we present our approach for all four tracks of the eighth Emotion Recognition in the Wild Challenge (EmotiW 2020). The four tasks are group emotion recognition, driver gaze prediction, predicting engagement in the wild, and emotion recognition using physiological signals. We explore multiple approaches including classical machine learning tools such as random forests, state of the art deep neural networks, and multiple fusion and ensemble-based approaches. We also show that similar approaches can be used across tracks as many of the features generalize well to the different problems (e.g. facial features). We detail evaluation results that are either comparable to or outperform the baseline results for both the validation and testing for most of the tracks.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121143044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preserving Privacy in Image-based Emotion Recognition through User Anonymization","authors":"Vansh Narula, Kexin Feng, Theodora Chaspari","doi":"10.1145/3382507.3418833","DOIUrl":"https://doi.org/10.1145/3382507.3418833","url":null,"abstract":"The large amount of data captured by ambulatory sensing devices can afford us insights into longitudinal behavioral patterns, which can be linked to emotional, psychological, and cognitive outcomes. Yet, the sensitivity of behavioral data, which regularly involve speech signals and facial images, can cause strong privacy concerns, such as the leaking of the user identity. We examine the interplay between emotion-specific and user identity-specific information in image-based emotion recognition systems. We further study a user anonymization approach that preserves emotion-specific information, but eliminates user-dependent information from the convolutional kernel of convolutional neural networks (CNN), therefore reducing user re-identification risks. We formulate an adversarial learning problem implemented with a multitask CNN, that minimizes emotion classification and maximizes user identification loss. The proposed system is evaluated on three datasets achieving moderate to high emotion recognition and poor user identity recognition performance. The resulting image transformation obtained by the convolutional layer is visually inspected, attesting to the efficacy of the proposed system in preserving emotion-specific information. Implications from this study can inform the design of privacy-aware emotion recognition systems that preserve facets of human behavior, while concealing the identity of the user, and can be used in ambulatory monitoring applications related to health, well-being, and education.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121666569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}