Title: Pen + Mid-Air Gestures: Eliciting Contextual Gestures
Authors: Ilhan Aslan, Tabea Schmidt, Jens Woehrle, Lukas Vogel, E. André
DOI: 10.1145/3242969.3242979
Published in: Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018)
Abstract: Combining mid-air gestures with pen input for bi-manual input on tablets has been reported as an alternative and attractive input technique in drawing applications. Previous work has also argued that mid-air gestural input can cause discomfort and arm fatigue over time, which can be addressed in a desktop setting by allowing users to gesture in alternative restful arm positions (e.g., with the elbow resting on the desk). However, it is unclear if and how gesture preferences and gesture designs differ across these alternative arm positions. To investigate these research questions, we report on a user- and choice-based gesture elicitation study in which 10 participants designed gestures for different arm positions. We provide an in-depth qualitative analysis and a detailed categorization of gestures, discussing commonalities and differences in the gesture sets based on a "think aloud" protocol, video recordings, and self-reports on user preferences.

Title: Predicting ADHD Risk from Touch Interaction Data
Authors: Philipp Mock, Maike Tibus, A. Ehlis, H. Baayen, Peter Gerjets
DOI: 10.1145/3242969.3242986
Published in: Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018)
Abstract: This paper presents a novel approach for automatic prediction of risk of ADHD in schoolchildren based on touch interaction data. We performed a study with 129 fourth-grade students solving math problems on a multiple-choice interface to obtain a large dataset of touch trajectories. Using Support Vector Machines, we analyzed the predictive power of such data for ADHD scales. For regression of overall ADHD scores, we achieve a mean squared error of 0.0962 on a four-point scale (R² = 0.5667). Classification accuracy for increased ADHD risk (upper vs. lower third of collected scores) is 91.1%.

Title: Listening Skills Assessment through Computer Agents
Authors: Hiroki Tanaka, Hideki Negoro, H. Iwasaka, Satoshi Nakamura
DOI: 10.1145/3242969.3242970
Published in: Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018)
Abstract: Social skills training, performed by human trainers, is a well-established method for acquiring appropriate skills in social interaction. Previous work automated the process of social skills training by developing a dialogue system that teaches social skills through interaction with a computer agent. However, while that work considered speaking skills, human social skills trainers also take other skills such as listening into account. In this paper, we propose assessing users' listening skills during conversation with computer agents as a step toward automated social skills training. We recorded data of 27 Japanese graduate students interacting with a female agent. The agent spoke to the participants about a recent memorable story and about how to make a telephone call, and the participants listened. Two expert external raters assessed the participants' listening skills. We manually extracted features relating to the participants' eye fixations and behavioral cues, and confirmed that a simple linear regression with selected features can predict a user's listening skills with a correlation coefficient above 0.45.

Title: RainCheck
Authors: Ying-Chao Tung, Mayank Goel, Isaac Zinda, Jacob O. Wobbrock
DOI: 10.1145/3242969.3243028
Published in: Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018)
Abstract: Modern smartphones are built with capacitive-sensing touchscreens, which can detect anything that is conductive or has a dielectric differential with air. The human finger is an example of such a dielectric, and works wonderfully with such touchscreens. However, touch interactions are disrupted by raindrops, water smear, and wet fingers because capacitive touchscreens cannot distinguish finger touches from other conductive materials. When users' screens get wet, the screen's usability is significantly reduced. RainCheck addresses this hazard by filtering out potential touch points caused by water to differentiate fingertips from raindrops and water smear, adapting in real-time to restore successful interaction to the user. Specifically, RainCheck uses the low-level raw sensor data from touchscreen drivers and employs precise selection techniques to resolve water-fingertip ambiguity. Our study shows that RainCheck improves gesture accuracy by 75.7%, touch accuracy by 47.9%, and target selection time by 80.0%, making it a successful remedy to interference caused by rain and other water.

Title: "Honey, I Learned to Talk": Multimodal Fusion for Behavior Analysis
Authors: Shao-Yen Tseng, Haoqi Li, Brian R. Baucom, P. Georgiou
DOI: 10.1145/3242969.3242996
Published in: Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018)
Abstract: In this work we analyze the importance of lexical and acoustic modalities in behavioral expression and perception. We demonstrate that this importance relates to the amount of therapy, and hence communication training, that a person received. It also exhibits some relationship to gender. We proceed to provide an analysis on couple therapy data by splitting the data into clusters based on gender or stage in therapy. Our analysis demonstrates the significant difference between optimal modality weights per cluster and relationship to therapy stage. Given this finding we propose the use of communication-skill aware fusion models to account for these differences in modality importance. The fusion models operate on partitions of the data according to the gender of the speaker or the therapy stage of the couple. We show that while most multimodal fusion methods can improve mean absolute error of behavioral estimates, the best results are given by a model that considers the degree of communication training among the interlocutors.

Title: EVA: A Multimodal Argumentative Dialogue System
Authors: Niklas Rach, Klaus Weber, Louisa Pragst, E. André, W. Minker, Stefan Ultes
DOI: 10.1145/3242969.3266292
Published in: Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018)
Abstract: This work introduces EVA, a multimodal argumentative dialogue system that is capable of discussing controversial topics with the user. The interaction is structured as an argument game in which the user and the system select their respective moves in order to convince their opponent. EVA's response is presented as a natural language utterance by a virtual agent that supports the respective content with characteristic gestures and facial expressions.

{"title":"Multiple Spatio-temporal Feature Learning for Video-based Emotion Recognition in the Wild","authors":"Cheng Lu, Wenming Zheng, Chaolong Li, Chuangao Tang, Suyuan Liu, Simeng Yan, Yuan Zong","doi":"10.1145/3242969.3264992","DOIUrl":"https://doi.org/10.1145/3242969.3264992","url":null,"abstract":"The difficulty of emotion recognition in the wild (EmotiW) is how to train a robust model to deal with diverse scenarios and anomalies. The Audio-video Sub-challenge in EmotiW contains audio-video short clips with several emotional labels and the task is to distinguish which label the video belongs to. For the better emotion recognition in videos, we propose a multiple spatio-temporal feature fusion (MSFF) framework, which can more accurately depict emotional information in spatial and temporal dimensions by two mutually complementary sources, including the facial image and audio. The framework is consisted of two parts: the facial image model and the audio model. With respect to the facial image model, three different architectures of spatial-temporal neural networks are employed to extract discriminative features about different emotions in facial expression images. Firstly, the high-level spatial features are obtained by the pre-trained convolutional neural networks (CNN), including VGG-Face and ResNet-50 which are all fed with the images generated by each video. Then, the features of all frames are sequentially input to the Bi-directional Long Short-Term Memory (BLSTM) so as to capture dynamic variations of facial appearance textures in a video. In addition to the structure of CNN-RNN, another spatio-temporal network, namely deep 3-Dimensional Convolutional Neural Networks (3D CNN) by extending the 2D convolution kernel to 3D, is also applied to attain evolving emotional information encoded in multiple adjacent frames. For the audio model, the spectrogram images of speech generated by preprocessing audio, are also modeled in a VGG-BLSTM framework to characterize the affective fluctuation more efficiently. Finally, a fusion strategy with the score matrices of different spatio-temporal networks gained from the above framework is proposed to boost the performance of emotion recognition complementally. Extensive experiments show that the overall accuracy of our proposed MSFF is 60.64%, which achieves a large improvement compared with the baseline and outperform the result of champion team in 2017.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115616910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Generating fMRI-Enriched Acoustic Vectors using a Cross-Modality Adversarial Network for Emotion Recognition
Authors: Gao-Yi Chao, Chun-Min Chang, Jeng-Lin Li, Ya-Tse Wu, Chi-Chun Lee
DOI: 10.1145/3242969.3242992
Published in: Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018)
Abstract: Automatic emotion recognition has long been developed by concentrating on modeling human expressive behavior. At the same time, neuroscientific evidence has shown that neural responses (i.e., blood oxygen level-dependent (BOLD) signals measured with functional magnetic resonance imaging (fMRI)) also vary with the type of emotion perceived. While past research has indicated that fusing acoustic features and fMRI improves overall speech emotion recognition performance, obtaining fMRI data is not feasible in real-world applications. In this work, we propose a cross-modality adversarial network that jointly models the bi-directional generative relationship between acoustic features of speech samples and fMRI signals of human perceptual responses by leveraging a parallel dataset. We encode the acoustic descriptors of a speech sample using the learned cross-modality adversarial network to generate fMRI-enriched acoustic vectors, which are then used in the emotion classifier. The generated fMRI-enriched acoustic vectors are evaluated not only on the parallel dataset but also on an additional dataset without fMRI scans. Our proposed framework significantly outperforms using acoustic features alone in a four-class emotion recognition task on both datasets, and the use of a cyclic loss in learning the bi-directional mapping is also shown to be crucial for achieving improved recognition rates.

Title: Modeling Cognitive Processes from Multimodal Signals
Authors: F. Putze, Jutta Hild, A. Sano, Enkelejda Kasneci, E. Solovey, Tanja Schultz
DOI: 10.1145/3242969.3265861
Published in: Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018)
Abstract: Multimodal signals allow us to gain insights into a person's internal cognitive processes: for example, speech and gesture analysis yields cues about hesitation, knowledgeability, or alertness; eye tracking yields information about a person's focus of attention, task, or cognitive state; and EEG yields information about a person's cognitive load or information appraisal. Capturing cognitive processes is an important research tool for understanding human behavior as well as a crucial part of the user model of an adaptive interactive system such as a robot or a tutoring system. As cognitive processes are often multifaceted, a comprehensive model requires the combination of multiple complementary signals. In this workshop at the ACM International Conference on Multimodal Interaction (ICMI) in Boulder, Colorado, USA, we discussed the state of the art in monitoring and modeling cognitive processes from multimodal signals.

{"title":"Simultaneous Multimodal Access to Wheelchair and Computer for People with Tetraplegia","authors":"M. N. Sahadat, Nordine Sebkhi, Maysam Ghovanloo","doi":"10.1145/3242969.3242980","DOIUrl":"https://doi.org/10.1145/3242969.3242980","url":null,"abstract":"Existing assistive technologies often capture and utilize a single remaining ability to assist people with tetraplegia which is unable to do complex interaction efficiently. In this work, we developed a multimodal assistive system (MAS) to utilize multiple remaining abilities (speech, tongue, and head motion) sequentially or simultaneously to facilitate complex computer interactions such as scrolling, drag and drop, and typing long sentences. Inputs of MAS can be used to drive a wheelchair using only tongue motion, mouse functionalities (e.g., clicks, navigation) by combining the tongue and head motions. To enhance seamless interface, MAS processes both head and tongue motions in the headset with an average accuracy of 88.5%. In a pilot study, a modified center-out tapping task was performed by four able-bodied participants to navigate cursor, using head tracking, click using tongue command, and text entry through speech recognition, respectively. The average throughput in the final round was 1.28 bits/s and a cursor navigation path efficiency of 68.62%.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121589009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}