Proceedings of the 20th ACM International Conference on Multimodal Interaction: Latest Publications

Dozing Off or Thinking Hard?: Classifying Multi-dimensional Attentional States in the Classroom from Video
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243000
F. Putze, Dennis Küster, Sonja Annerer-Walcher, M. Benedek
{"title":"Dozing Off or Thinking Hard?: Classifying Multi-dimensional Attentional States in the Classroom from Video","authors":"F. Putze, Dennis Küster, Sonja Annerer-Walcher, M. Benedek","doi":"10.1145/3242969.3243000","DOIUrl":"https://doi.org/10.1145/3242969.3243000","url":null,"abstract":"In this paper, we extract features of head pose, eye gaze, and facial expressions from video to estimate individual learners' attentional states in a classroom setting. We concentrate on the analysis of different definitions for a student's attention and show that available generic video processing components and a single video camera are sufficient to estimate the attentional state.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"44 13","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114081398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
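The pipeline itself is not reproduced on this page; the following is a minimal sketch of the general recipe the abstract describes, assuming per-window statistics of head-pose, gaze, and facial-expression descriptors have already been extracted by a generic video toolkit. All data shapes and the three-class label scheme are hypothetical.

```python
# Minimal sketch, not the authors' code: attention-state classification from
# per-window summaries of generic video descriptors (hypothetical data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# 200 annotated time windows, each summarised by the mean and std of 10
# per-frame descriptors (head-pose angles, gaze direction, AU intensities).
X = rng.normal(size=(200, 20))
# Attention labels collapsed to three hypothetical classes:
# 0 = inattentive, 1 = passively attentive, 2 = actively engaged.
y = rng.integers(0, 3, size=200)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```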
A Multimodal Approach to Understanding Human Vocal Expressions and Beyond
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243391
Shrikanth S. Narayanan
{"title":"A Multimodal Approach to Understanding Human Vocal Expressions and Beyond","authors":"Shrikanth S. Narayanan","doi":"10.1145/3242969.3243391","DOIUrl":"https://doi.org/10.1145/3242969.3243391","url":null,"abstract":"Human verbal and nonverbal expressions carry crucial information not only about intent but also emotions, individual identity, and the state of health and wellbeing. From a basic science perspective, understanding how such rich information is encoded in these signals can illuminate underlying production mechanisms including the variability therein, within and across individuals. From a technology perspective, finding ways for automatically processing and decoding this complex information continues to be of interest across a variety of applications. The convergence of sensing, communication and computing technologies is allowing access to data, in diverse forms and modalities, in ways that were unimaginable even a few years ago. These include data that afford the multimodal analysis and interpretation of the generation of human expressions. The first part of the talk will highlight advances that allow us to perform investigations on the dynamics of vocal production using real-time imaging and audio modeling to offer insights about how we produce speech and song with the vocal instrument. The second part of the talk will focus on the production of vocal expressions in conjunction with other signals from the face and body especially in encoding affect. The talk will draw data from various domains notably in health to illustrate some of the applications.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128549194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multimodal Analysis of Client Behavioral Change Coding in Motivational Interviewing
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242990
Chanuwas Aswamenakul, Lixing Liu, K. Carey, J. Woolley, Stefan Scherer, Brian Borsari
{"title":"Multimodal Analysis of Client Behavioral Change Coding in Motivational Interviewing","authors":"Chanuwas Aswamenakul, Lixing Liu, K. Carey, J. Woolley, Stefan Scherer, Brian Borsari","doi":"10.1145/3242969.3242990","DOIUrl":"https://doi.org/10.1145/3242969.3242990","url":null,"abstract":"Motivational Interviewing (MI) is a widely disseminated and effective therapeutic approach for behavioral disorder treatment. Over the past decade, MI research has identified client language as a central mediator between therapist skills and subsequent behavior change. Specifically, in-session client language referred to as change talk (CT; personal arguments for change) or sustain talk (ST; personal argument against changing the status quo) has been directly related to post-session behavior change. Despite the prevalent use of MI and extensive studies of MI underlying mechanisms, most existing studies focus on the linguistic aspect of MI, especially of client change talk and sustain talk and how they as a mediator influence the outcome of MI. In this study, we perform statistical analyses on acoustic behavior descriptors to test their discriminatory powers. Then we utilize multimodality by combining acoustic features with linguistic features to improve the accuracy of client change talk prediction. Lastly, we investigate into our trained model to understand what features inform the model about client utterance class and gain insights into the nature of MISC codes.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129782918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
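As an illustration of the fusion step the abstract mentions, a minimal early-fusion baseline could concatenate per-utterance acoustic and linguistic feature vectors and train a simple classifier over MISC-style codes. This is a sketch under assumptions, not the authors' model; the feature dimensions and the three-way label coding are made up for illustration.

```python
# Illustrative early-fusion baseline (not the paper's model): concatenate
# acoustic and linguistic features per client utterance and predict a
# hypothetical change-talk / sustain-talk / neutral code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_utts = 500
acoustic = rng.normal(size=(n_utts, 40))     # e.g. pitch/energy statistics (assumed)
linguistic = rng.normal(size=(n_utts, 100))  # e.g. text embeddings (assumed)
labels = rng.integers(0, 3, size=n_utts)     # 0 = CT, 1 = ST, 2 = neutral (assumed)

X = np.hstack([acoustic, linguistic])        # early fusion by concatenation
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```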
Evaluation of Real-time Deep Learning Turn-taking Models for Multiple Dialogue Scenarios
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242994
Divesh Lala, K. Inoue, Tatsuya Kawahara
{"title":"Evaluation of Real-time Deep Learning Turn-taking Models for Multiple Dialogue Scenarios","authors":"Divesh Lala, K. Inoue, Tatsuya Kawahara","doi":"10.1145/3242969.3242994","DOIUrl":"https://doi.org/10.1145/3242969.3242994","url":null,"abstract":"The task of identifying when to take a conversational turn is an important function of spoken dialogue systems. The turn-taking system should also ideally be able to handle many types of dialogue, from structured conversation to spontaneous and unstructured discourse. Our goal is to determine how much a generalized model trained on many types of dialogue scenarios would improve on a model trained only for a specific scenario. To achieve this goal we created a large corpus of Wizard-of-Oz conversation data which consisted of several different types of dialogue sessions, and then compared a generalized model with scenario-specific models. For our evaluation we go further than simply reporting conventional metrics, which we show are not informative enough to evaluate turn-taking in a real-time system. Instead, we process results using a performance curve of latency and false cut-in rate, and further improve our model's real-time performance using a finite-state turn-taking machine. Our results show that the generalized model greatly outperformed the individual model for attentive listening scenarios but was worse in job interview scenarios. This implies that a model based on a large corpus is better suited to conversation which is more user-initiated and unstructured. We also propose that our method of evaluation leads to more informative performance metrics in a real-time system.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125888035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
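The abstract credits part of the real-time improvement to a finite-state turn-taking machine. A toy version of that idea, not the paper's actual machine, lets the system take the turn only after the model has been confident for several consecutive silent frames, which is the latency versus false cut-in trade-off the evaluation curve measures.

```python
# Toy finite-state turn-taking filter (assumption-level sketch, not the paper's
# machine): the system takes the turn only after the model's take-turn
# probability has stayed above a threshold for `hold_frames` silent frames.
from enum import Enum

class State(Enum):
    USER_SPEAKING = 0
    PENDING = 1      # user silent; accumulating evidence to take the turn
    SYSTEM_TURN = 2  # absorbing state: system has taken the turn

def run_fsm(frames, threshold=0.7, hold_frames=5):
    """frames: iterable of (user_is_speaking: bool, p_take_turn: float) per frame.
    Returns the frame index at which the system takes the turn, or None."""
    state, held = State.USER_SPEAKING, 0
    for i, (speaking, p) in enumerate(frames):
        if speaking:                        # any user speech resets the machine
            state, held = State.USER_SPEAKING, 0
            continue
        state = State.PENDING
        held = held + 1 if p >= threshold else 0
        if held >= hold_frames:             # confident long enough: take the turn
            state = State.SYSTEM_TURN
            return i
    return None

# Hypothetical 100 ms frames: speech, a short pause (no cut-in), more speech,
# then a long confident silence where the system finally takes the turn.
frames = ([(True, 0.1)] * 10 + [(False, 0.8)] * 3 +
          [(True, 0.2)] * 5 + [(False, 0.9)] * 8)
print("system takes turn at frame:", run_fsm(frames))   # -> 22
```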
Strike A Pose: Capturing Non-Verbal Behaviour with Textile Sensors
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3264968
Sophie Skach
{"title":"Strike A Pose: Capturing Non-Verbal Behaviour with Textile Sensors","authors":"Sophie Skach","doi":"10.1145/3242969.3264968","DOIUrl":"https://doi.org/10.1145/3242969.3264968","url":null,"abstract":"This work searches to explore the potential of textile sensing systems as a new modality of capturing social behaviour. Hereby, the focus lies on evaluating the performance of embedded pressure sensors as reliable detectors for social cues, such as postural states. We have designed chair covers and trousers that were evaluated in two studies. The results show that these relatively simple sensors can distinguish postures as well as different behavioural cues.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125559686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
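As an assumption-level illustration of the sensing setup described above, posture classification from a flattened grid of fabric pressure readings might look like the following; the grid size, sample count, and posture labels are hypothetical.

```python
# Assumption-level sketch (not the author's study): posture classification
# from a flattened grid of fabric pressure-sensor readings.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(300, 64))   # 300 samples of a hypothetical 8x8 pressure grid
y = rng.integers(0, 3, size=300)            # 0 = upright, 1 = leaning forward, 2 = leaning back

clf = SVC(kernel="rbf")
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```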
MIRIAM: A Multimodal Interface for Explaining the Reasoning Behind Actions of Remote Autonomous Systems
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3266297
H. Hastie, Javier Chiyah-Garcia, D. A. Robb, A. Laskov, P. Patrón
{"title":"MIRIAM: A Multimodal Interface for Explaining the Reasoning Behind Actions of Remote Autonomous Systems","authors":"H. Hastie, Javier Chiyah-Garcia, D. A. Robb, A. Laskov, P. Patrón","doi":"10.1145/3242969.3266297","DOIUrl":"https://doi.org/10.1145/3242969.3266297","url":null,"abstract":"Autonomous systems in remote locations have a high degree of autonomy and there is a need to explain what they are doing and why , in order to increase transparency and maintain trust. This is particularly important in hazardous, high-risk scenarios. Here, we describe a multimodal interface, MIRIAM, that enables remote vehicle behaviour to be queried by the user, along with mission and vehicle status. These explanations, as part of the multimodal interface, help improve the operator's mental model of what the vehicle can and can't do, increase transparency and assist with operator training.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121608530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Population-specific Detection of Couples' Interpersonal Conflict using Multi-task Learning
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243007
Aditya Gujral, Theodora Chaspari, Adela C. Timmons, Yehsong Kim, S. Barrett, G. Margolin
{"title":"Population-specific Detection of Couples' Interpersonal Conflict using Multi-task Learning","authors":"Aditya Gujral, Theodora Chaspari, Adela C. Timmons, Yehsong Kim, S. Barrett, G. Margolin","doi":"10.1145/3242969.3243007","DOIUrl":"https://doi.org/10.1145/3242969.3243007","url":null,"abstract":"The inherent diversity of human behavior limits the capabilities of general large-scale machine learning systems, that usually require ample amounts of data to provide robust descriptors of the outcomes of interest. Motivated by this challenge, personalized and population-specific models comprise a promising line of work for representing human behavior, since they can make decisions for clusters of people with common characteristics, reducing the amount of data needed for training. We propose a multi-task learning (MTL) framework for developing population-specific models of interpersonal conflict between couples using ambulatory sensor and mobile data from real-life interactions. The criteria for population clustering include global indices related to couples' relationship quality and attachment style, person-specific factors of partners' positivity, negativity, and stress levels, as well as fluctuating factors of daily emotional arousal obtained from acoustic and physiological indices. Population-specific information is incorporated through a MTL feed-forward neural network (FF-NN), whose first layers capture the common information across all data samples, while its last layers are specific to the unique characteristics of each population. Our results indicate that the proposed MTL FF-NN trained solely on the sensor-based acoustic, linguistic, and physiological modalities provides unweighted and weighted F1-scores of 0.51 and 0.75, respectively, outperforming the corresponding baselines of a single general FF-NN trained on the entire dataset and separate FF-NNs trained on each population cluster individually. These demonstrate the feasibility of such ambulatory systems for detecting real-life behaviors and possibly intervening upon them, and highlights the importance of taking into account the inherent diversity of different populations from the general pool of data.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114753138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
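The shared-plus-specific architecture described in the abstract can be sketched directly: the first layers are shared across all couples, and a separate output layer is kept per population cluster. The PyTorch sketch below is illustrative only, with assumed feature dimensions, cluster count, and a binary conflict label; it is not the authors' implementation.

```python
# Illustrative PyTorch sketch (not the authors' implementation) of a multi-task
# feed-forward network: shared first layers, population-specific final layers.
import torch
import torch.nn as nn

class MTLConflictNet(nn.Module):
    def __init__(self, n_features, n_populations, hidden=64):
        super().__init__()
        # Shared trunk captures information common to all couples.
        self.shared = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One binary conflict / no-conflict head per population cluster.
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_populations))

    def forward(self, x, population_idx):
        return self.heads[population_idx](self.shared(x))

# Hypothetical setup: 3 population clusters, 50 sensor-derived features.
model = MTLConflictNet(n_features=50, n_populations=3)
x = torch.randn(8, 50)                              # a batch from cluster 1
logits = model(x, population_idx=1)
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (8, 1)).float())
loss.backward()                                     # gradients reach shared trunk and head 1
print("loss:", float(loss))
```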
Multimodal Representation of Advertisements Using Segment-level Autoencoders
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243026
Krishna Somandepalli, Victor R. Martinez, Naveen Kumar, Shrikanth S. Narayanan
{"title":"Multimodal Representation of Advertisements Using Segment-level Autoencoders","authors":"Krishna Somandepalli, Victor R. Martinez, Naveen Kumar, Shrikanth S. Narayanan","doi":"10.1145/3242969.3243026","DOIUrl":"https://doi.org/10.1145/3242969.3243026","url":null,"abstract":"Automatic analysis of advertisements (ads) poses an interesting problem for learning multimodal representations. A promising direction of research is the development of deep neural network autoencoders to obtain inter-modal and intra-modal representations. In this work, we propose a system to obtain segment-level unimodal and joint representations. These features are concatenated, and then averaged across the duration of an ad to obtain a single multimodal representation. The autoencoders are trained using segments generated by time-aligning frames between the audio and video modalities with forward and backward context. In order to assess the multimodal representations, we consider the tasks of classifying an ad as funny or exciting in a publicly available dataset of 2,720 ads. For this purpose we train the segment-level autoencoders on a larger, unlabeled dataset of 9,740 ads, agnostic of the test set. Our experiments show that: 1) the multimodal representations outperform joint and unimodal representations, 2) the different representations we learn are complementary to each other, and 3) the segment-level multimodal representations perform better than classical autoencoders and cross-modal representations -- within the context of the two classification tasks. We obtain an improvement of about 5% in classification accuracy compared to a competitive baseline.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124640371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
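A minimal sketch of the segment-level joint-autoencoder idea follows: audio and video features for a time-aligned segment are concatenated, encoded to a joint bottleneck, and the per-segment embeddings are averaged over the ad to yield one multimodal representation. Feature dimensions and layer sizes are assumptions, not the authors' configuration.

```python
# Sketch (assumed dimensions, not the authors' code) of a segment-level joint
# autoencoder with temporal averaging to an ad-level representation.
import torch
import torch.nn as nn

class JointSegmentAutoencoder(nn.Module):
    def __init__(self, audio_dim=40, video_dim=128, bottleneck=32):
        super().__init__()
        in_dim = audio_dim + video_dim
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, audio, video):
        x = torch.cat([audio, video], dim=-1)   # fuse the two modalities per segment
        z = self.encoder(x)                     # joint segment-level embedding
        return self.decoder(z), z

# One hypothetical ad with 20 time-aligned audio/video segments.
ae = JointSegmentAutoencoder()
audio, video = torch.randn(20, 40), torch.randn(20, 128)
recon, z = ae(audio, video)
recon_loss = nn.MSELoss()(recon, torch.cat([audio, video], dim=-1))
ad_repr = z.mean(dim=0)                         # average embeddings over the ad's duration
print("reconstruction loss:", float(recon_loss), "| ad-level shape:", tuple(ad_repr.shape))
```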
Ten Opportunities and Challenges for Advancing Student-Centered Multimodal Learning Analytics
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243010
S. Oviatt
{"title":"Ten Opportunities and Challenges for Advancing Student-Centered Multimodal Learning Analytics","authors":"S. Oviatt","doi":"10.1145/3242969.3243010","DOIUrl":"https://doi.org/10.1145/3242969.3243010","url":null,"abstract":"This paper presents a summary and critical reflection on ten major opportunities and challenges for advancing the field of multimodal learning analytics (MLA). It identifies emerging technology trends likely to disrupt learning analytics, challenges involved in forging viable participatory design partnerships, and impending issues associated with the control of data and privacy. Trends in health care analytics provide one attractive model for how new infrastructure can enable the collection of largerscale and more diverse datasets, and how end-user analytics can be designed to empower individuals and expand market adoption.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126568402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction (Workshop Summary)
Proceedings of the 20th ACM International Conference on Multimodal Interaction | Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3272743
Ronald Böck, Francesca Bonin, N. Campbell, R. Poppe
{"title":"International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction (Workshop Summary)","authors":"Ronald Böck, Francesca Bonin, N. Campbell, R. Poppe","doi":"10.1145/3242969.3272743","DOIUrl":"https://doi.org/10.1145/3242969.3272743","url":null,"abstract":"In this paper a brief overview of the third workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction. The paper is focussing on the main aspects intended to be discussed in the workshop reflecting the main scope of the papers presented during the meeting. The MA3HMI 2018 workshop is held in conjunction with the 18th ACM International Conference on Mulitmodal Interaction (ICMI 2018) taking place in Boulder, USA, in October 2018. This year, we have solicited papers concerning the different phases of the development of multimodal systems. Tools and systems that address real-time conversations with artificial agents and technical systems are also within the scope.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134120456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1