{"title":"Leniency to those who confess?: Predicting the Legal Judgement via Multi-Modal Analysis","authors":"Liang Yang, Jingjie Zeng, Tao Peng, Xi Luo, Jinghui Zhang, Hongfei Lin","doi":"10.1145/3382507.3418893","DOIUrl":"https://doi.org/10.1145/3382507.3418893","url":null,"abstract":"The Legal Judgement Prediction (LJP) is now under the spotlight. And it usually consists of multiple sub-tasks, such as penalty prediction (fine and imprisonment) and the prediction of articles of law. For penalty prediction, they are often closely related to the trial process, especially the attitude analysis of criminal suspects, which will influence the judgment of the presiding judge to some extent. In this paper, we firstly construct a multi-modal dataset with 517 cases of intentional assault, which contains trial information as well as the attitude of the suspect. Then, we explore the relationship between suspect`s attitude and term of imprisonment. Finally, we use the proposed multi-modal model to predict the suspect's attitude, and compare it with several strong baselines. Our experimental results show that the attitude of the criminal suspect is closely related to the penalty prediction, which provides a new perspective for LJP.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121665215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaze Tracker Accuracy and Precision Measurements in Virtual Reality Headsets","authors":"J. Kangas, Olli Koskinen, R. Raisamo","doi":"10.1145/3382507.3418816","DOIUrl":"https://doi.org/10.1145/3382507.3418816","url":null,"abstract":"To effectively utilize a gaze tracker in user interaction it is important to know the quality of the gaze data that it is measuring. We have developed a method to evaluate the accuracy and precision of gaze trackers in virtual reality headsets. The method consists of two software components. The first component is a simulation software that calibrates the gaze tracker and then performs data collection by providing a gaze target that moves around the headset's field-of-view. The second component makes an off-line analysis of the logged gaze data and provides a number of measurement results of the accuracy and precision. The analysis results consist of the accuracy and precision of the gaze tracker in different directions inside the virtual 3D space. Our method combines the measurements into overall accuracy and precision. Visualizations of the measurements are created to see possible trends over the display area. Results from selected areas in the display are analyzed to find out differences between the areas (for example, the middle/outer edge of the display or the upper/lower part of display).","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124315493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Punchline Detection using Context-Aware Hierarchical Multimodal Fusion","authors":"Akshat Choube, M. Soleymani","doi":"10.1145/3382507.3418891","DOIUrl":"https://doi.org/10.1145/3382507.3418891","url":null,"abstract":"Humor has a history as old as humanity. Humor often induces laughter and elicits amusement and engagement. Humorous behavior involves behavior manifested in different modalities including language, voice tone, and gestures. Thus, automatic understanding of humorous behavior requires multimodal behavior analysis. Humor detection is a well-established problem in Natural Language Processing but its multimodal analysis is less explored. In this paper, we present a context-aware hierarchical fusion network for multimodal punchline detection. The proposed neural architecture first fuses the modalities two by two and then fuses all three modalities. The network also models the context of the punchline using Gated Recurrent Unit(s). The model's performance is evaluated on UR-FUNNY database yielding state-of-the-art performance.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116166926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing Nonverbal Behaviors along with Praising","authors":"Toshiki Onishi, Arisa Yamauchi, Ryo Ishii, Y. Aono, Akihiro Miyata","doi":"10.1145/3382507.3418868","DOIUrl":"https://doi.org/10.1145/3382507.3418868","url":null,"abstract":"In this work, as a first attempt to analyze the relationship between praising skills and human behavior in dialogue, we focus on head and face behavior. We create a new dialogue corpus including face and head behavior information of persons who give praise (praiser) and receive praise (receiver) and the degree of success of praising (praising score). We also create a machine learning model that uses features related to head and face behavior to estimate praising score, clarify which features of the praiser and receiver are important in estimating praising score. The analysis results showed that features of the praiser and receiver are important in estimating praising score and that features related to utterance, head, gaze, and chin were important. The analysis of the features of high importance revealed that the praiser and receiver should face each other without turning their heads to the left or right, and the longer the praiser's utterance, the more successful the praising.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126012765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-Shot Learning for Gesture Recognition","authors":"Naveen Madapana","doi":"10.1145/3382507.3421161","DOIUrl":"https://doi.org/10.1145/3382507.3421161","url":null,"abstract":"Zero-Shot Learning (ZSL) is a new paradigm in machine learning that aims to recognize the classes that are not present in the training data. Hence, this paradigm is capable of comprehending the categories that were never seen before. While deep learning has pushed the limits of unseen object recognition, ZSL for temporal problems such as unfamiliar gesture recognition (referred to as ZSGL) remain unexplored. ZSGL has the potential to result in efficient human-machine interfaces that can recognize and understand the spontaneous and conversational gestures of humans. In this regard, the objective of this work is to conceptualize, model and develop a framework to tackle ZSGL problems. The first step in the pipeline is to develop a database of gesture attributes that are representative of a range of categories. Next, a deep architecture consisting of convolutional and recurrent layers is proposed to jointly optimize the semantic and classification losses. Lastly, rigorous experiments are performed to compare the proposed model with respect to existing ZSL models on CGD 2013 and MSRC-12 datasets. In our preliminary work, we identified a list of 64 discriminative attributes related to gestures' morphological characteristics. Our approach yields an unseen class accuracy of (41%) which outperforms the state-of-the-art approaches by a considerable margin. Future work involves the following: 1. Modifying the existing architecture in order to improve the ZSL accuracy, 2. Augmenting the database of attributes to incorporate semantic properties, 3. Addressing the issue of data imbalance which is inherent to ZSL problems, and 4. Expanding this research to other domains such as surgeme and action recognition.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121710632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facilitating Flexible Force Feedback Design with Feelix","authors":"Anke van Oosterhout, M. Bruns, Eve E. Hoggan","doi":"10.1145/3382507.3418819","DOIUrl":"https://doi.org/10.1145/3382507.3418819","url":null,"abstract":"In the last decade, haptic actuators have improved in quality and efficiency, enabling easier implementation in user interfaces. One of the next steps towards a mature haptics field is a larger and more diverse toolset that enables designers and novices to explore with the design and implementation of haptic feedback in their projects. In this paper, we look at several design projects that utilize haptic force feedback to aid interaction between the user and product. We analysed the process interaction designers went through when developing their haptic user interfaces. Based on our insights, we identified requirements for a haptic force feedback authoring tool. We discuss how these requirements are addressed by 'Feelix', a tool that supports sketching and refinement of haptic force feedback effects.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127657497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speaker-Invariant Adversarial Domain Adaptation for Emotion Recognition","authors":"Yufeng Yin, Baiyu Huang, Yizhen Wu, M. Soleymani","doi":"10.1145/3382507.3418813","DOIUrl":"https://doi.org/10.1145/3382507.3418813","url":null,"abstract":"Automatic emotion recognition methods are sensitive to the variations across different datasets and their performance drops when evaluated across corpora. We can apply domain adaptation techniques e.g., Domain-Adversarial Neural Network (DANN) to mitigate this problem. Though the DANN can detect and remove the bias between corpora, the bias between speakers still remains which results in reduced performance. In this paper, we propose Speaker-Invariant Domain-Adversarial Neural Network (SIDANN) to reduce both the domain bias and the speaker bias. Specifically, based on the DANN, we add a speaker discriminator to unlearn information representing speakers' individual characteristics with a gradient reversal layer (GRL). Our experiments with multimodal data (speech, vision, and text) and the cross-domain evaluation indicate that the proposed SIDANN outperforms (+5.6% and +2.8% on average for detecting arousal and valence) the DANN model, suggesting that the SIDANN has a better domain adaptation ability than the DANN. Besides, the modality contribution analysis shows that the acoustic features are the most informative for arousal detection while the lexical features perform the best for valence detection.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132396449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LieCatcher: Game Framework for Collecting Human Judgments of Deceptive Speech","authors":"Sarah Ita Levitan, James Shin, Ivy Chen, Julia Hirschberg","doi":"10.1145/3382507.3421166","DOIUrl":"https://doi.org/10.1145/3382507.3421166","url":null,"abstract":"Humans are notoriously poor at detecting deception --- most are worse than chance. To address this issue we have developed LieCatcher, a single-player web-based Game With A Purpose (GWAP) that allows players to assess their lie detection skills while providing human judgments of deceptive speech. Players listen to audio recordings drawn from a corpus of deceptive and non-deceptive interview dialogues, and guess if the speaker is lying or telling the truth. They are awarded points for correct guesses and at the end of the game they receive a score summarizing their performance at lie detection. We present the game design and implementation, and describe a crowdsourcing experiment conducted to study perceived deception.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129977530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group-level Speech Emotion Recognition Utilising Deep Spectrum Features","authors":"Sandra Ottl, S. Amiriparian, Maurice Gerczuk, Vincent Karas, Björn Schuller","doi":"10.1145/3382507.3417964","DOIUrl":"https://doi.org/10.1145/3382507.3417964","url":null,"abstract":"The objectives of this challenge paper are two fold: first, we apply a range of neural network based transfer learning approaches to cope with the data scarcity in the field of speech emotion recognition, and second, we fuse the obtained representations and predictions in a nearly and late fusion strategy to check the complementarity of the applied networks. In particular, we use our Deep Spectrum system to extract deep feature representations from the audio content of the 2020 EmotiW group level emotion prediction challenge data. We evaluate a total of ten ImageNet pre-trained Convolutional Neural Networks, including AlexNet, VGG16, VGG19 and three DenseNet variants as audio feature extractors. We compare their performance to the ComParE feature set used in the challenge baseline, employing simple logistic regression models trained with Stochastic Gradient Descent as classifiers. With the help of late fusion, our approach improves the performance on the test set from 47.88 % to 62.70 % accuracy.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131692590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Applicants' Reactions to Asynchronous Video Interviews Through Self-reports and Nonverbal Cues","authors":"Skanda Muralidhar, E. Kleinlogel, E. Mayor, Adrian Bangerter, M. S. Mast, D. Gática-Pérez","doi":"10.1145/3382507.3418869","DOIUrl":"https://doi.org/10.1145/3382507.3418869","url":null,"abstract":"Asynchronous video interviews (AVIs) are increasingly used by organizations in their hiring process. In this mode of interviewing, the applicants are asked to record their responses to predefined interview questions using a webcam via an online platform. AVIs have increased usage due to employers' perceived benefits in terms of costs and scale. However, little research has been conducted regarding applicants' reactions to these new interview methods. In this work, we investigate applicants' reactions to an AVI platform using self-reported measures previously validated in psychology literature. We also investigate the connections of these measures with nonverbal behavior displayed during the interviews. We find that participants who found the platform creepy and had concerns about privacy reported lower interview performance compared to participants who did not have such concerns. We also observe weak correlations between nonverbal cues displayed and these self-reported measures. Finally, inference experiments achieve overall low-performance w.r.t. to explaining applicants' reactions. Overall, our results reveal that participants who are not at ease with AVIs (i.e., high creepy ambiguity score) might be unfairly penalized. This has implications for improved hiring practices using AVIs.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131770988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}