Proceedings of the 20th ACM International Conference on Multimodal Interaction: Latest Publications

Pen + Mid-Air Gestures: Eliciting Contextual Gestures
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242979
Ilhan Aslan, Tabea Schmidt, Jens Woehrle, Lukas Vogel, E. André
{"title":"Pen + Mid-Air Gestures: Eliciting Contextual Gestures","authors":"Ilhan Aslan, Tabea Schmidt, Jens Woehrle, Lukas Vogel, E. André","doi":"10.1145/3242969.3242979","DOIUrl":"https://doi.org/10.1145/3242969.3242979","url":null,"abstract":"Combining mid-air gestures with pen input for bi-manual input on tablets has been reported as an alternative and attractive input technique in drawing applications. Previous work has also argued that mid-air gestural input can cause discomfort and arm fatigue over time, which can be addressed in a desktop setting by allowing users to gesture in alternative restful arm positions (e.g., elbow rests on desk). However, it is unclear if and how gesture preferences and gesture designs would be different for alternative arm positions. In order to inquire these research question we report on a user and choice based gesture elicitation study in which 10 participants designed gestures for different arm positions. We provide an in-depth qualitative analysis and detailed categorization of gestures, discussing commonalities and differences in the gesture sets based on a \"think aloud\" protocol, video recordings, and self-reports on user preferences.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134504532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Predicting ADHD Risk from Touch Interaction Data
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242986
Philipp Mock, Maike Tibus, A. Ehlis, H. Baayen, Peter Gerjets
{"title":"Predicting ADHD Risk from Touch Interaction Data","authors":"Philipp Mock, Maike Tibus, A. Ehlis, H. Baayen, Peter Gerjets","doi":"10.1145/3242969.3242986","DOIUrl":"https://doi.org/10.1145/3242969.3242986","url":null,"abstract":"This paper presents a novel approach for automatic prediction of risk of ADHD in schoolchildren based on touch interaction data. We performed a study with 129 fourth-grade students solving math problems on a multiple-choice interface to obtain a large dataset of touch trajectories. Using Support Vector Machines, we analyzed the predictive power of such data for ADHD scales. For regression of overall ADHD scores, we achieve a mean squared error of 0.0962 on a four-point scale (R² = 0.5667). Classification accuracy for increased ADHD risk (upper vs. lower third of collected scores) is 91.1%.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115523790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Listening Skills Assessment through Computer Agents
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242970
Hiroki Tanaka, Hideki Negoro, H. Iwasaka, Satoshi Nakamura
{"title":"Listening Skills Assessment through Computer Agents","authors":"Hiroki Tanaka, Hideki Negoro, H. Iwasaka, Satoshi Nakamura","doi":"10.1145/3242969.3242970","DOIUrl":"https://doi.org/10.1145/3242969.3242970","url":null,"abstract":"Social skills training, performed by human trainers, is a well-established method for obtaining appropriate skills in social interaction. Previous work automated the process of social skills training by developing a dialogue system that teaches social skills through interaction with a computer agent. Even though previous work that simulated social skills training considered speaking skills, human social skills trainers take into account other skills such as listening. In this paper, we propose assessment of user listening skills during conversation with computer agents toward automated social skills training. We recorded data of 27 Japanese graduate students interacting with a female agent. The agent spoke to the participants about a recent memorable story and how to make a telephone call, and the participants listened. Two expert external raters assessed the participants' listening skills. We manually extracted features relating to eye fixation and behavioral cues of the participants, and confirmed that a simple linear regression with selected features can correctly predict a user's listening skills with above 0.45 correlation coefficient.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114510613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
RainCheck
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243028
Ying-Chao Tung, Mayank Goel, Isaac Zinda, Jacob O. Wobbrock
{"title":"RainCheck","authors":"Ying-Chao Tung, Mayank Goel, Isaac Zinda, Jacob O. Wobbrock","doi":"10.1145/3242969.3243028","DOIUrl":"https://doi.org/10.1145/3242969.3243028","url":null,"abstract":"Modern smartphones are built with capacitive-sensing touchscreens, which can detect anything that is conductive or has a dielectric differential with air. The human finger is an example of such a dielectric, and works wonderfully with such touchscreens. However, touch interactions are disrupted by raindrops, water smear, and wet fingers because capacitive touchscreens cannot distinguish finger touches from other conductive materials. When users' screens get wet, the screen's usability is significantly reduced. RainCheck addresses this hazard by filtering out potential touch points caused by water to differentiate fingertips from raindrops and water smear, adapting in real-time to restore successful interaction to the user. Specifically, RainCheck uses the low-level raw sensor data from touchscreen drivers and employs precise selection techniques to resolve water-fingertip ambiguity. Our study shows that RainCheck improves gesture accuracy by 75.7%, touch accuracy by 47.9%, and target selection time by 80.0%, making it a successful remedy to interference caused by rain and other water.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"403 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115996589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
"Honey, I Learned to Talk": Multimodal Fusion for Behavior Analysis “亲爱的,我学会了说话”:行为分析的多模态融合
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242996
Shao-Yen Tseng, Haoqi Li, Brian R. Baucom, P. Georgiou
{"title":"\"Honey, I Learned to Talk\": Multimodal Fusion for Behavior Analysis","authors":"Shao-Yen Tseng, Haoqi Li, Brian R. Baucom, P. Georgiou","doi":"10.1145/3242969.3242996","DOIUrl":"https://doi.org/10.1145/3242969.3242996","url":null,"abstract":"In this work we analyze the importance of lexical and acoustic modalities in behavioral expression and perception. We demonstrate that this importance relates to the amount of therapy, and hence communication training, that a person received. It also exhibits some relationship to gender. We proceed to provide an analysis on couple therapy data by splitting the data into clusters based on gender or stage in therapy. Our analysis demonstrates the significant difference between optimal modality weights per cluster and relationship to therapy stage. Given this finding we propose the use of communication-skill aware fusion models to account for these differences in modality importance. The fusion models operate on partitions of the data according to the gender of the speaker or the therapy stage of the couple. We show that while most multimodal fusion methods can improve mean absolute error of behavioral estimates, the best results are given by a model that considers the degree of communication training among the interlocutors.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129248348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
EVA: A Multimodal Argumentative Dialogue System
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3266292
Niklas Rach, Klaus Weber, Louisa Pragst, E. André, W. Minker, Stefan Ultes
{"title":"EVA: A Multimodal Argumentative Dialogue System","authors":"Niklas Rach, Klaus Weber, Louisa Pragst, E. André, W. Minker, Stefan Ultes","doi":"10.1145/3242969.3266292","DOIUrl":"https://doi.org/10.1145/3242969.3266292","url":null,"abstract":"This work introduces EVA, a multimodal argumentative Dialogue System that is capable of discussing controversial topics with the user. The interaction is structured as an argument game in which the user and the system select respective moves in order to convince their opponent. EVA's response is presented as a natural language utterance by a virtual agent that supports the respective content using characteristic gestures and mimic.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125591748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Multiple Spatio-temporal Feature Learning for Video-based Emotion Recognition in the Wild
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3264992
Cheng Lu, Wenming Zheng, Chaolong Li, Chuangao Tang, Suyuan Liu, Simeng Yan, Yuan Zong
{"title":"Multiple Spatio-temporal Feature Learning for Video-based Emotion Recognition in the Wild","authors":"Cheng Lu, Wenming Zheng, Chaolong Li, Chuangao Tang, Suyuan Liu, Simeng Yan, Yuan Zong","doi":"10.1145/3242969.3264992","DOIUrl":"https://doi.org/10.1145/3242969.3264992","url":null,"abstract":"The difficulty of emotion recognition in the wild (EmotiW) is how to train a robust model to deal with diverse scenarios and anomalies. The Audio-video Sub-challenge in EmotiW contains audio-video short clips with several emotional labels and the task is to distinguish which label the video belongs to. For the better emotion recognition in videos, we propose a multiple spatio-temporal feature fusion (MSFF) framework, which can more accurately depict emotional information in spatial and temporal dimensions by two mutually complementary sources, including the facial image and audio. The framework is consisted of two parts: the facial image model and the audio model. With respect to the facial image model, three different architectures of spatial-temporal neural networks are employed to extract discriminative features about different emotions in facial expression images. Firstly, the high-level spatial features are obtained by the pre-trained convolutional neural networks (CNN), including VGG-Face and ResNet-50 which are all fed with the images generated by each video. Then, the features of all frames are sequentially input to the Bi-directional Long Short-Term Memory (BLSTM) so as to capture dynamic variations of facial appearance textures in a video. In addition to the structure of CNN-RNN, another spatio-temporal network, namely deep 3-Dimensional Convolutional Neural Networks (3D CNN) by extending the 2D convolution kernel to 3D, is also applied to attain evolving emotional information encoded in multiple adjacent frames. For the audio model, the spectrogram images of speech generated by preprocessing audio, are also modeled in a VGG-BLSTM framework to characterize the affective fluctuation more efficiently. Finally, a fusion strategy with the score matrices of different spatio-temporal networks gained from the above framework is proposed to boost the performance of emotion recognition complementally. Extensive experiments show that the overall accuracy of our proposed MSFF is 60.64%, which achieves a large improvement compared with the baseline and outperform the result of champion team in 2017.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115616910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53
Generating fMRI-Enriched Acoustic Vectors using a Cross-Modality Adversarial Network for Emotion Recognition
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242992
Gao-Yi Chao, Chun-Min Chang, Jeng-Lin Li, Ya-Tse Wu, Chi-Chun Lee
{"title":"Generating fMRI-Enriched Acoustic Vectors using a Cross-Modality Adversarial Network for Emotion Recognition","authors":"Gao-Yi Chao, Chun-Min Chang, Jeng-Lin Li, Ya-Tse Wu, Chi-Chun Lee","doi":"10.1145/3242969.3242992","DOIUrl":"https://doi.org/10.1145/3242969.3242992","url":null,"abstract":"Automatic emotion recognition has long been developed by concentrating on modeling human expressive behavior. At the same time, neuro-scientific evidences have shown that the varied neuro-responses (i.e., blood oxygen level-dependent (BOLD) signals measured from the functional magnetic resonance imaging (fMRI)) is also a function on the types of emotion perceived. While past research has indicated that fusing acoustic features and fMRI improves the overall speech emotion recognition performance, obtaining fMRI data is not feasible in real world applications. In this work, we propose a cross modality adversarial network that jointly models the bi-directional generative relationship between acoustic features of speech samples and fMRI signals of human percetual responses by leveraging a parallel dataset. We encode the acoustic descriptors of a speech sample using the learned cross modality adversarial network to generate the fMRI-enriched acoustic vectors to be used in the emotion classifier. The generated fMRI-enriched acoustic vector is evaluated not only in the parallel dataset but also in an additional dataset without fMRI scanning. Our proposed framework significantly outperform using acoustic features only in a four-class emotion recognition task for both datasets, and the use of cyclic loss in learning the bi-directional mapping is also demonstrated to be crucial in achieving improved recognition rates.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114372257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Modeling Cognitive Processes from Multimodal Signals
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3265861
F. Putze, Jutta Hild, A. Sano, Enkelejda Kasneci, E. Solovey, Tanja Schultz
{"title":"Modeling Cognitive Processes from Multimodal Signals","authors":"F. Putze, Jutta Hild, A. Sano, Enkelejda Kasneci, E. Solovey, Tanja Schultz","doi":"10.1145/3242969.3265861","DOIUrl":"https://doi.org/10.1145/3242969.3265861","url":null,"abstract":"Multimodal signals allow us to gain insights into internal cognitive processes of a person, for example: speech and gesture analysis yields cues about hesitations, knowledgeability, or alertness, eye tracking yields information about a person's focus of attention, task, or cognitive state, EEG yields information about a person's cognitive load or information appraisal. Capturing cognitive processes is an important research tool to understand human behavior as well as a crucial part of a user model to an adaptive interactive system such as a robot or a tutoring system. As cognitive processes are often multifaceted, a comprehensive model requires the combination of multiple complementary signals. In this workshop at the ACM International Conference on Multimodal Interfaces (ICMI) conference in Boulder, Colorado, USA, we discussed the state-of-the-art in monitoring and modeling cognitive processes from multi-modal signals.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117198022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Simultaneous Multimodal Access to Wheelchair and Computer for People with Tetraplegia
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242980
M. N. Sahadat, Nordine Sebkhi, Maysam Ghovanloo
{"title":"Simultaneous Multimodal Access to Wheelchair and Computer for People with Tetraplegia","authors":"M. N. Sahadat, Nordine Sebkhi, Maysam Ghovanloo","doi":"10.1145/3242969.3242980","DOIUrl":"https://doi.org/10.1145/3242969.3242980","url":null,"abstract":"Existing assistive technologies often capture and utilize a single remaining ability to assist people with tetraplegia which is unable to do complex interaction efficiently. In this work, we developed a multimodal assistive system (MAS) to utilize multiple remaining abilities (speech, tongue, and head motion) sequentially or simultaneously to facilitate complex computer interactions such as scrolling, drag and drop, and typing long sentences. Inputs of MAS can be used to drive a wheelchair using only tongue motion, mouse functionalities (e.g., clicks, navigation) by combining the tongue and head motions. To enhance seamless interface, MAS processes both head and tongue motions in the headset with an average accuracy of 88.5%. In a pilot study, a modified center-out tapping task was performed by four able-bodied participants to navigate cursor, using head tracking, click using tongue command, and text entry through speech recognition, respectively. The average throughput in the final round was 1.28 bits/s and a cursor navigation path efficiency of 68.62%.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121589009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7