Proceedings of the 20th ACM International Conference on Multimodal Interaction: Latest Publications

Dozing Off or Thinking Hard?: Classifying Multi-dimensional Attentional States in the Classroom from Video
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243000
F. Putze, Dennis Küster, Sonja Annerer-Walcher, M. Benedek
{"title":"Dozing Off or Thinking Hard?: Classifying Multi-dimensional Attentional States in the Classroom from Video","authors":"F. Putze, Dennis Küster, Sonja Annerer-Walcher, M. Benedek","doi":"10.1145/3242969.3243000","DOIUrl":"https://doi.org/10.1145/3242969.3243000","url":null,"abstract":"In this paper, we extract features of head pose, eye gaze, and facial expressions from video to estimate individual learners' attentional states in a classroom setting. We concentrate on the analysis of different definitions for a student's attention and show that available generic video processing components and a single video camera are sufficient to estimate the attentional state.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"44 13","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114081398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
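The pipeline itself is not reproduced on this page; the following is a minimal sketch of the general recipe the abstract describes, assuming per-window statistics of head-pose, gaze, and facial-expression descriptors have already been extracted by a generic video toolkit. All data shapes and the three-class label scheme are hypothetical.

```python
# Minimal sketch, not the authors' code: attention-state classification from
# per-window summaries of generic video descriptors (hypothetical data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# 200 annotated time windows, each summarised by the mean and std of 10
# per-frame descriptors (head-pose angles, gaze direction, AU intensities).
X = rng.normal(size=(200, 20))
# Attention labels collapsed to three hypothetical classes:
# 0 = inattentive, 1 = passively attentive, 2 = actively engaged.
y = rng.integers(0, 3, size=200)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```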
A Multimodal Approach to Understanding Human Vocal Expressions and Beyond
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243391
Shrikanth S. Narayanan
{"title":"A Multimodal Approach to Understanding Human Vocal Expressions and Beyond","authors":"Shrikanth S. Narayanan","doi":"10.1145/3242969.3243391","DOIUrl":"https://doi.org/10.1145/3242969.3243391","url":null,"abstract":"Human verbal and nonverbal expressions carry crucial information not only about intent but also emotions, individual identity, and the state of health and wellbeing. From a basic science perspective, understanding how such rich information is encoded in these signals can illuminate underlying production mechanisms including the variability therein, within and across individuals. From a technology perspective, finding ways for automatically processing and decoding this complex information continues to be of interest across a variety of applications. The convergence of sensing, communication and computing technologies is allowing access to data, in diverse forms and modalities, in ways that were unimaginable even a few years ago. These include data that afford the multimodal analysis and interpretation of the generation of human expressions. The first part of the talk will highlight advances that allow us to perform investigations on the dynamics of vocal production using real-time imaging and audio modeling to offer insights about how we produce speech and song with the vocal instrument. The second part of the talk will focus on the production of vocal expressions in conjunction with other signals from the face and body especially in encoding affect. The talk will draw data from various domains notably in health to illustrate some of the applications.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128549194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multimodal Analysis of Client Behavioral Change Coding in Motivational Interviewing
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242990
Chanuwas Aswamenakul, Lixing Liu, K. Carey, J. Woolley, Stefan Scherer, Brian Borsari
{"title":"Multimodal Analysis of Client Behavioral Change Coding in Motivational Interviewing","authors":"Chanuwas Aswamenakul, Lixing Liu, K. Carey, J. Woolley, Stefan Scherer, Brian Borsari","doi":"10.1145/3242969.3242990","DOIUrl":"https://doi.org/10.1145/3242969.3242990","url":null,"abstract":"Motivational Interviewing (MI) is a widely disseminated and effective therapeutic approach for behavioral disorder treatment. Over the past decade, MI research has identified client language as a central mediator between therapist skills and subsequent behavior change. Specifically, in-session client language referred to as change talk (CT; personal arguments for change) or sustain talk (ST; personal argument against changing the status quo) has been directly related to post-session behavior change. Despite the prevalent use of MI and extensive studies of MI underlying mechanisms, most existing studies focus on the linguistic aspect of MI, especially of client change talk and sustain talk and how they as a mediator influence the outcome of MI. In this study, we perform statistical analyses on acoustic behavior descriptors to test their discriminatory powers. Then we utilize multimodality by combining acoustic features with linguistic features to improve the accuracy of client change talk prediction. Lastly, we investigate into our trained model to understand what features inform the model about client utterance class and gain insights into the nature of MISC codes.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129782918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
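As an illustration of the fusion step the abstract mentions, a minimal early-fusion baseline could concatenate per-utterance acoustic and linguistic feature vectors and train a simple classifier over MISC-style codes. This is a sketch under assumptions, not the authors' model; the feature dimensions and the three-way label coding are made up for illustration.

```python
# Illustrative early-fusion baseline (not the paper's model): concatenate
# acoustic and linguistic features per client utterance and predict a
# hypothetical change-talk / sustain-talk / neutral code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_utts = 500
acoustic = rng.normal(size=(n_utts, 40))     # e.g. pitch/energy statistics (assumed)
linguistic = rng.normal(size=(n_utts, 100))  # e.g. text embeddings (assumed)
labels = rng.integers(0, 3, size=n_utts)     # 0 = CT, 1 = ST, 2 = neutral (assumed)

X = np.hstack([acoustic, linguistic])        # early fusion by concatenation
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```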
Evaluation of Real-time Deep Learning Turn-taking Models for Multiple Dialogue Scenarios
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242994
Divesh Lala, K. Inoue, Tatsuya Kawahara
{"title":"Evaluation of Real-time Deep Learning Turn-taking Models for Multiple Dialogue Scenarios","authors":"Divesh Lala, K. Inoue, Tatsuya Kawahara","doi":"10.1145/3242969.3242994","DOIUrl":"https://doi.org/10.1145/3242969.3242994","url":null,"abstract":"The task of identifying when to take a conversational turn is an important function of spoken dialogue systems. The turn-taking system should also ideally be able to handle many types of dialogue, from structured conversation to spontaneous and unstructured discourse. Our goal is to determine how much a generalized model trained on many types of dialogue scenarios would improve on a model trained only for a specific scenario. To achieve this goal we created a large corpus of Wizard-of-Oz conversation data which consisted of several different types of dialogue sessions, and then compared a generalized model with scenario-specific models. For our evaluation we go further than simply reporting conventional metrics, which we show are not informative enough to evaluate turn-taking in a real-time system. Instead, we process results using a performance curve of latency and false cut-in rate, and further improve our model's real-time performance using a finite-state turn-taking machine. Our results show that the generalized model greatly outperformed the individual model for attentive listening scenarios but was worse in job interview scenarios. This implies that a model based on a large corpus is better suited to conversation which is more user-initiated and unstructured. We also propose that our method of evaluation leads to more informative performance metrics in a real-time system.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125888035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
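The abstract credits part of the real-time improvement to a finite-state turn-taking machine. A toy version of that idea, not the paper's actual machine, lets the system take the turn only after the model has been confident for several consecutive silent frames, which is the latency versus false cut-in trade-off the evaluation curve measures.

```python
# Toy finite-state turn-taking filter (assumption-level sketch, not the paper's
# machine): the system takes the turn only after the model's take-turn
# probability has stayed above a threshold for `hold_frames` silent frames.
from enum import Enum

class State(Enum):
    USER_SPEAKING = 0
    PENDING = 1      # user silent; accumulating evidence to take the turn
    SYSTEM_TURN = 2  # absorbing state: system has taken the turn

def run_fsm(frames, threshold=0.7, hold_frames=5):
    """frames: iterable of (user_is_speaking: bool, p_take_turn: float) per frame.
    Returns the frame index at which the system takes the turn, or None."""
    state, held = State.USER_SPEAKING, 0
    for i, (speaking, p) in enumerate(frames):
        if speaking:                        # any user speech resets the machine
            state, held = State.USER_SPEAKING, 0
            continue
        state = State.PENDING
        held = held + 1 if p >= threshold else 0
        if held >= hold_frames:             # confident long enough: take the turn
            state = State.SYSTEM_TURN
            return i
    return None

# Hypothetical 100 ms frames: speech, a short pause (no cut-in), more speech,
# then a long confident silence where the system finally takes the turn.
frames = ([(True, 0.1)] * 10 + [(False, 0.8)] * 3 +
          [(True, 0.2)] * 5 + [(False, 0.9)] * 8)
print("system takes turn at frame:", run_fsm(frames))   # -> 22
```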
Strike A Pose: Capturing Non-Verbal Behaviour with Textile Sensors
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3264968
Sophie Skach
{"title":"Strike A Pose: Capturing Non-Verbal Behaviour with Textile Sensors","authors":"Sophie Skach","doi":"10.1145/3242969.3264968","DOIUrl":"https://doi.org/10.1145/3242969.3264968","url":null,"abstract":"This work searches to explore the potential of textile sensing systems as a new modality of capturing social behaviour. Hereby, the focus lies on evaluating the performance of embedded pressure sensors as reliable detectors for social cues, such as postural states. We have designed chair covers and trousers that were evaluated in two studies. The results show that these relatively simple sensors can distinguish postures as well as different behavioural cues.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125559686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
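As an assumption-level illustration of the sensing setup described above, posture classification from a flattened grid of fabric pressure readings might look like the following; the grid size, sample count, and posture labels are hypothetical.

```python
# Assumption-level sketch (not the author's study): posture classification
# from a flattened grid of fabric pressure-sensor readings.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(300, 64))   # 300 samples of a hypothetical 8x8 pressure grid
y = rng.integers(0, 3, size=300)            # 0 = upright, 1 = leaning forward, 2 = leaning back

clf = SVC(kernel="rbf")
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```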
MIRIAM: A Multimodal Interface for Explaining the Reasoning Behind Actions of Remote Autonomous Systems
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3266297
H. Hastie, Javier Chiyah-Garcia, D. A. Robb, A. Laskov, P. Patrón
{"title":"MIRIAM: A Multimodal Interface for Explaining the Reasoning Behind Actions of Remote Autonomous Systems","authors":"H. Hastie, Javier Chiyah-Garcia, D. A. Robb, A. Laskov, P. Patrón","doi":"10.1145/3242969.3266297","DOIUrl":"https://doi.org/10.1145/3242969.3266297","url":null,"abstract":"Autonomous systems in remote locations have a high degree of autonomy and there is a need to explain what they are doing and why , in order to increase transparency and maintain trust. This is particularly important in hazardous, high-risk scenarios. Here, we describe a multimodal interface, MIRIAM, that enables remote vehicle behaviour to be queried by the user, along with mission and vehicle status. These explanations, as part of the multimodal interface, help improve the operator's mental model of what the vehicle can and can't do, increase transparency and assist with operator training.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121608530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Population-specific Detection of Couples' Interpersonal Conflict using Multi-task Learning
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243007
Aditya Gujral, Theodora Chaspari, Adela C. Timmons, Yehsong Kim, S. Barrett, G. Margolin
{"title":"Population-specific Detection of Couples' Interpersonal Conflict using Multi-task Learning","authors":"Aditya Gujral, Theodora Chaspari, Adela C. Timmons, Yehsong Kim, S. Barrett, G. Margolin","doi":"10.1145/3242969.3243007","DOIUrl":"https://doi.org/10.1145/3242969.3243007","url":null,"abstract":"The inherent diversity of human behavior limits the capabilities of general large-scale machine learning systems, that usually require ample amounts of data to provide robust descriptors of the outcomes of interest. Motivated by this challenge, personalized and population-specific models comprise a promising line of work for representing human behavior, since they can make decisions for clusters of people with common characteristics, reducing the amount of data needed for training. We propose a multi-task learning (MTL) framework for developing population-specific models of interpersonal conflict between couples using ambulatory sensor and mobile data from real-life interactions. The criteria for population clustering include global indices related to couples' relationship quality and attachment style, person-specific factors of partners' positivity, negativity, and stress levels, as well as fluctuating factors of daily emotional arousal obtained from acoustic and physiological indices. Population-specific information is incorporated through a MTL feed-forward neural network (FF-NN), whose first layers capture the common information across all data samples, while its last layers are specific to the unique characteristics of each population. Our results indicate that the proposed MTL FF-NN trained solely on the sensor-based acoustic, linguistic, and physiological modalities provides unweighted and weighted F1-scores of 0.51 and 0.75, respectively, outperforming the corresponding baselines of a single general FF-NN trained on the entire dataset and separate FF-NNs trained on each population cluster individually. These demonstrate the feasibility of such ambulatory systems for detecting real-life behaviors and possibly intervening upon them, and highlights the importance of taking into account the inherent diversity of different populations from the general pool of data.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114753138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
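The shared-plus-specific architecture described in the abstract can be sketched directly: the first layers are shared across all couples, and a separate output layer is kept per population cluster. The PyTorch sketch below is illustrative only, with assumed feature dimensions, cluster count, and a binary conflict label; it is not the authors' implementation.

```python
# Illustrative PyTorch sketch (not the authors' implementation) of a multi-task
# feed-forward network: shared first layers, population-specific final layers.
import torch
import torch.nn as nn

class MTLConflictNet(nn.Module):
    def __init__(self, n_features, n_populations, hidden=64):
        super().__init__()
        # Shared trunk captures information common to all couples.
        self.shared = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One binary conflict / no-conflict head per population cluster.
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_populations))

    def forward(self, x, population_idx):
        return self.heads[population_idx](self.shared(x))

# Hypothetical setup: 3 population clusters, 50 sensor-derived features.
model = MTLConflictNet(n_features=50, n_populations=3)
x = torch.randn(8, 50)                              # a batch from cluster 1
logits = model(x, population_idx=1)
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (8, 1)).float())
loss.backward()                                     # gradients reach shared trunk and head 1
print("loss:", float(loss))
```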
Multimodal Representation of Advertisements Using Segment-level Autoencoders
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243026
Krishna Somandepalli, Victor R. Martinez, Naveen Kumar, Shrikanth S. Narayanan
{"title":"Multimodal Representation of Advertisements Using Segment-level Autoencoders","authors":"Krishna Somandepalli, Victor R. Martinez, Naveen Kumar, Shrikanth S. Narayanan","doi":"10.1145/3242969.3243026","DOIUrl":"https://doi.org/10.1145/3242969.3243026","url":null,"abstract":"Automatic analysis of advertisements (ads) poses an interesting problem for learning multimodal representations. A promising direction of research is the development of deep neural network autoencoders to obtain inter-modal and intra-modal representations. In this work, we propose a system to obtain segment-level unimodal and joint representations. These features are concatenated, and then averaged across the duration of an ad to obtain a single multimodal representation. The autoencoders are trained using segments generated by time-aligning frames between the audio and video modalities with forward and backward context. In order to assess the multimodal representations, we consider the tasks of classifying an ad as funny or exciting in a publicly available dataset of 2,720 ads. For this purpose we train the segment-level autoencoders on a larger, unlabeled dataset of 9,740 ads, agnostic of the test set. Our experiments show that: 1) the multimodal representations outperform joint and unimodal representations, 2) the different representations we learn are complementary to each other, and 3) the segment-level multimodal representations perform better than classical autoencoders and cross-modal representations -- within the context of the two classification tasks. We obtain an improvement of about 5% in classification accuracy compared to a competitive baseline.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124640371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
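A minimal sketch of the segment-level joint-autoencoder idea follows: audio and video features for a time-aligned segment are concatenated, encoded to a joint bottleneck, and the per-segment embeddings are averaged over the ad to yield one multimodal representation. Feature dimensions and layer sizes are assumptions, not the authors' configuration.

```python
# Sketch (assumed dimensions, not the authors' code) of a segment-level joint
# autoencoder with temporal averaging to an ad-level representation.
import torch
import torch.nn as nn

class JointSegmentAutoencoder(nn.Module):
    def __init__(self, audio_dim=40, video_dim=128, bottleneck=32):
        super().__init__()
        in_dim = audio_dim + video_dim
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, audio, video):
        x = torch.cat([audio, video], dim=-1)   # fuse the two modalities per segment
        z = self.encoder(x)                     # joint segment-level embedding
        return self.decoder(z), z

# One hypothetical ad with 20 time-aligned audio/video segments.
ae = JointSegmentAutoencoder()
audio, video = torch.randn(20, 40), torch.randn(20, 128)
recon, z = ae(audio, video)
recon_loss = nn.MSELoss()(recon, torch.cat([audio, video], dim=-1))
ad_repr = z.mean(dim=0)                         # average embeddings over the ad's duration
print("reconstruction loss:", float(recon_loss), "| ad-level shape:", tuple(ad_repr.shape))
```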
Ten Opportunities and Challenges for Advancing Student-Centered Multimodal Learning Analytics
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243010
S. Oviatt
{"title":"Ten Opportunities and Challenges for Advancing Student-Centered Multimodal Learning Analytics","authors":"S. Oviatt","doi":"10.1145/3242969.3243010","DOIUrl":"https://doi.org/10.1145/3242969.3243010","url":null,"abstract":"This paper presents a summary and critical reflection on ten major opportunities and challenges for advancing the field of multimodal learning analytics (MLA). It identifies emerging technology trends likely to disrupt learning analytics, challenges involved in forging viable participatory design partnerships, and impending issues associated with the control of data and privacy. Trends in health care analytics provide one attractive model for how new infrastructure can enable the collection of largerscale and more diverse datasets, and how end-user analytics can be designed to empower individuals and expand market adoption.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126568402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction (Workshop Summary)
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3272743
Ronald Böck, Francesca Bonin, N. Campbell, R. Poppe
{"title":"International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction (Workshop Summary)","authors":"Ronald Böck, Francesca Bonin, N. Campbell, R. Poppe","doi":"10.1145/3242969.3272743","DOIUrl":"https://doi.org/10.1145/3242969.3272743","url":null,"abstract":"In this paper a brief overview of the third workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction. The paper is focussing on the main aspects intended to be discussed in the workshop reflecting the main scope of the papers presented during the meeting. The MA3HMI 2018 workshop is held in conjunction with the 18th ACM International Conference on Mulitmodal Interaction (ICMI 2018) taking place in Boulder, USA, in October 2018. This year, we have solicited papers concerning the different phases of the development of multimodal systems. Tools and systems that address real-time conversations with artificial agents and technical systems are also within the scope.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134120456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1