2015 International Conference on Affective Computing and Intelligent Interaction (ACII): Latest Publications

Harmony search for feature selection in speech emotion recognition
Yongsen Tao, Kunxia Wang, Jing Yang, Ning An, Lian Li
{"title":"Harmony search for feature selection in speech emotion recognition","authors":"Yongsen Tao, Kunxia Wang, Jing Yang, Ning An, Lian Li","doi":"10.1109/ACII.2015.7344596","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344596","url":null,"abstract":"Feature selection is a significant aspect of speech emotion recognition system. How to select a small subset out of the thousands of speech data is important for accurate classification of speech emotion. In this paper we investigate heuristic algorithm Harmony search (HS) for feature selection. We extract 3 feature sets, including MFCC, Fourier Parameters (FP), and features extracted with The Munich open Speech and Music Interpretation by Large Space Extraction (openSMILE) toolkit, from Berlin German emotion database (EMODB) and Chinese Elderly emotion database (EESDB). And combine MFCC with FP as the fourth feature set. We use Harmony search to select subsets and decrease the dimension space, and employ 10-fold cross validation in LIBSVM to evaluate the change of accuracy between selected subsets and original sets. Experimental results show that each subset's size reduced by about 50%, however, there is no sharp degeneration on accuracy and the accuracy almost maintains the original ones.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"42 1","pages":"362-367"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72636119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
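For orientation, a minimal sketch of the binary harmony-search feature selection loop described in this entry's abstract: each harmony is a 0/1 feature mask, and the fitness is 10-fold cross-validated SVM accuracy. scikit-learn's SVC stands in for LIBSVM here, and the HS parameters (memory size, HMCR, PAR, iteration count) are illustrative values, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    # 10-fold cross-validated accuracy of an SVM on the selected feature subset
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask.astype(bool)], y, cv=10).mean()

def harmony_search_select(X, y, hms=10, iters=100, hmcr=0.9, par=0.3):
    n_feat = X.shape[1]
    # harmony memory: random binary feature masks plus their fitness
    memory = [rng.integers(0, 2, n_feat) for _ in range(hms)]
    scores = [fitness(m, X, y) for m in memory]
    for _ in range(iters):
        new = np.empty(n_feat, dtype=int)
        for j in range(n_feat):
            if rng.random() < hmcr:                      # memory consideration
                new[j] = memory[rng.integers(hms)][j]
                if rng.random() < par:                   # pitch adjustment: flip the bit
                    new[j] = 1 - new[j]
            else:                                        # random consideration
                new[j] = rng.integers(0, 2)
        s = fitness(new, X, y)
        worst = int(np.argmin(scores))
        if s > scores[worst]:                            # replace the worst harmony
            memory[worst], scores[worst] = new, s
    best = int(np.argmax(scores))
    return memory[best].astype(bool), scores[best]
```

Each iteration improvises one new mask from memory (with occasional bit flips) and keeps it only if it beats the current worst harmony.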
Emotion recognition in spontaneous and acted dialogues
Leimin Tian, Johanna D. Moore, Catherine Lai
{"title":"Emotion recognition in spontaneous and acted dialogues","authors":"Leimin Tian, Johanna D. Moore, Catherine Lai","doi":"10.1109/ACII.2015.7344645","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344645","url":null,"abstract":"In this work, we compare emotion recognition on two types of speech: spontaneous and acted dialogues. Experiments were conducted on the AVEC2012 database of spontaneous dialogues and the IEMOCAP database of acted dialogues. We studied the performance of two types of acoustic features for emotion recognition: knowledge-inspired disfluency and nonverbal vocalisation (DIS-NV) features, and statistical Low-Level Descriptor (LLD) based features. Both Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) were built using each feature set on each emotional database. Our work aims to identify aspects of the data that constrain the effectiveness of models and features. Our results show that the performance of different types of features and models is influenced by the type of dialogue and the amount of training data. Because DIS-NVs are less frequent in acted dialogues than in spontaneous dialogues, the DIS-NV features perform better than the LLD features when recognizing emotions in spontaneous dialogues, but not in acted dialogues. The LSTM-RNN model gives better performance than the SVM model when there is enough training data, but the complex structure of a LSTM-RNN model may limit its performance when there is less training data available, and may also risk over-fitting. Additionally, we find that long distance contexts may be more useful when performing emotion recognition at the word level than at the utterance level.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"25 1","pages":"698-704"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73834455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 49
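To make the word-level sequence modelling concrete, here is a minimal PyTorch sketch of an LSTM that maps a sequence of word-level feature vectors (for example, a handful of DIS-NV cues per word) to an utterance-level emotion prediction. The layer sizes, the 5-dimensional input, and the 4-way output are assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn

class EmotionLSTM(nn.Module):
    """Utterance-level emotion prediction from a sequence of word-level feature vectors."""
    def __init__(self, n_features, hidden=64, n_outputs=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)

    def forward(self, x):            # x: (batch, n_words, n_features)
        _, (h_n, _) = self.lstm(x)   # last hidden state summarizes the word sequence
        return self.head(h_n[-1])    # emotion class logits (or dimensional scores)

# a DIS-NV-style input would carry one low-dimensional vector per word,
# e.g. a few disfluency / non-verbal vocalisation cues
model = EmotionLSTM(n_features=5)
logits = model(torch.randn(8, 20, 5))  # 8 utterances, 20 words each
```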
PhysSigTK: Enabling engagement experiments with physiological signals for game design
Stefan Rank, Cathy Lu
{"title":"PhysSigTK: Enabling engagement experiments with physiological signals for game design","authors":"Stefan Rank, Cathy Lu","doi":"10.1109/ACII.2015.7344692","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344692","url":null,"abstract":"We demonstrate PhysSigTK, a physiological signals toolkit for making low-cost hardware accessible in the Unity3D game development environment so that designers of affective games can experiment with how engagement can be captured in their games. Rather than proposing a context-free way of measuring engagement, we enable designers to test how affordable hardware could fit into the assessment of players' states and progress in their particular game using a range of tools.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"138 1","pages":"968-969"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73967226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Utilizing multimodal cues to automatically evaluate public speaking performance
L. Chen, C. W. Leong, G. Feng, Chong Min Lee, Swapna Somasundaran
{"title":"Utilizing multimodal cues to automatically evaluate public speaking performance","authors":"L. Chen, C. W. Leong, G. Feng, Chong Min Lee, Swapna Somasundaran","doi":"10.1109/ACII.2015.7344601","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344601","url":null,"abstract":"Public speaking, an important type of oral communication, is critical to success in both learning and career development. However, there is a lack of tools to efficiently and economically evaluate presenters' verbal and nonverbal behaviors. The recent advancements in automated scoring and multimodal sensing technologies may address this issue. We report a study on the development of an automated scoring model for public speaking performance using multimodal cues. A multimodal presentation corpus containing 14 subjects' 56 presentations has been recorded using a Microsoft Kinect depth camera. Task design, rubric development, and human rating were conducted according to standards in educational assessment. A rich set of multimodal features has been extracted from head poses, eye gazes, facial expressions, motion traces, speech signal, and transcripts. The model building experiment shows that jointly using both lexical/speech and visual features achieves more accurate scoring, which suggests the feasibility of using multimodal technologies in the assessment of public speaking skills.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"110 1","pages":"394-400"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81753144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
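The abstract does not specify the scoring model, so the following is only a generic sketch of feature-level fusion for presentation scoring: synthetic placeholder features stand in for the lexical/speech and visual blocks, a random-forest regressor stands in for whatever model the authors used, and agreement with human ratings is measured by Pearson correlation on cross-validated predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

# hypothetical per-presentation feature blocks and holistic human scores (synthetic placeholders)
rng = np.random.default_rng(0)
n = 56
speech_lexical = rng.normal(size=(n, 40))   # e.g. prosody, fluency, transcript statistics
visual = rng.normal(size=(n, 30))           # e.g. head pose, gaze, expression statistics
human_scores = rng.normal(size=n)

def scoring_correlation(features):
    # cross-validated machine scores compared against human ratings
    preds = cross_val_predict(RandomForestRegressor(random_state=0), features, human_scores, cv=5)
    return pearsonr(preds, human_scores)[0]

r_speech = scoring_correlation(speech_lexical)
r_fused = scoring_correlation(np.hstack([speech_lexical, visual]))  # early (feature-level) fusion
print(f"speech/lexical only: r={r_speech:.2f}, fused: r={r_fused:.2f}")
```

With real features, the comparison between the two correlations is what would support the paper's claim that adding visual cues improves scoring accuracy.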
Learning speech emotion features by joint disentangling-discrimination
W. Xue, Zhengwei Huang, Xin Luo, Qi-rong Mao
{"title":"Learning speech emotion features by joint disentangling-discrimination","authors":"W. Xue, Zhengwei Huang, Xin Luo, Qi-rong Mao","doi":"10.1109/ACII.2015.7344598","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344598","url":null,"abstract":"Speech plays an important part in human-computer interaction. As a major branch of speech processing, speech emotion recognition (SER) has drawn much attention of researchers. Excellent discriminant features are of great importance in SER. However, emotion-specific features are commonly mixed with some other features. In this paper, we introduce an approach to pull apart these two parts of features as much as possible. First we employ an unsupervised feature learning framework to achieve some rough features. Then these rough features are further fed into a semi-supervised feature learning framework. In this phase, efforts are made to disentangle the emotion-specific features and some other features by using a novel loss function, which combines reconstruction penalty, orthogonal penalty, discriminative penalty and verification penalty. Orthogonal penalty is utilized to disentangle emotion-specific features and other features. The discriminative penalty enlarges inter-emotion variations, while the verification penalty reduces the intra-emotion variations. Evaluations on the FAU Aibo emotion database show that our approach can improve the speech emotion classification performance.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"29 1","pages":"374-379"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83079266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
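A sketch of how the four penalties named above could be combined into a single training loss, written in PyTorch. The exact forms of the orthogonal and verification terms are plausible guesses (a per-sample inner-product penalty and a contrastive pair loss), not the paper's equations, and the weighting of the terms is left as a tunable parameter.

```python
import torch
import torch.nn.functional as F

def joint_loss(x, x_rec, z_emo, z_other, logits, labels,
               pair_a, pair_b, same_emotion, w=(1.0, 1.0, 1.0, 1.0), margin=1.0):
    """Combine the four penalties named in the abstract into one training loss.

    x, x_rec       : input features and their reconstruction
    z_emo, z_other : emotion-specific and residual feature codes (equal dimension assumed)
    logits, labels : emotion classifier outputs and ground-truth classes
    pair_a, pair_b : pairs of emotion codes; same_emotion is a bool tensor per pair
    """
    # reconstruction penalty: both feature parts together must rebuild the input
    l_rec = F.mse_loss(x_rec, x)
    # orthogonal penalty: drive the per-sample inner product of the two code parts to zero
    l_orth = (z_emo * z_other).sum(dim=1).pow(2).mean()
    # discriminative penalty: enlarge inter-emotion variation via a classification loss
    l_disc = F.cross_entropy(logits, labels)
    # verification penalty: contrastive term that pulls same-emotion pairs together
    # and pushes different-emotion pairs at least `margin` apart
    d = F.pairwise_distance(pair_a, pair_b)
    l_ver = torch.where(same_emotion, d.pow(2), F.relu(margin - d).pow(2)).mean()
    return w[0] * l_rec + w[1] * l_orth + w[2] * l_disc + w[3] * l_ver
```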
3D emotional facial animation synthesis with factored conditional Restricted Boltzmann Machines
Yong Zhao, D. Jiang, H. Sahli
{"title":"3D emotional facial animation synthesis with factored conditional Restricted Boltzmann Machines","authors":"Yong Zhao, D. Jiang, H. Sahli","doi":"10.1109/ACII.2015.7344664","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344664","url":null,"abstract":"This paper presents a 3D emotional facial animation synthesis approach based on the Factored Conditional Restricted Boltzmann Machines (FCRBM). Facial Action Parameters (FAPs) extracted from 2D face image sequences, are adopted to train the FCRBM model parameters. Based on the trained model, given an emotion label sequence and several initial frames of FAPs, the corresponding FAP sequence is generated via the Gibbs sampling, and then used to construct the MPEG-4 compliant 3D facial animation. Emotion recognition and subjective evaluation on the synthesized animations show that the proposed method can obtain natural facial animations representing well the dynamic process of emotions. Besides, facial animation with smooth emotion transitions can be obtained by blending the emotion labels.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"42 1","pages":"797-803"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81366722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
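The factored three-way weights of an FCRBM are beyond a short example, but the generation step, alternating Gibbs sampling conditioned on an emotion label and recent FAP history, can be illustrated with a plain conditional RBM with Gaussian visible units. All shapes and parameter names below are hypothetical, and the mean-field visible update assumes unit variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_generate(v_init, cond, W, A, B, b, c, n_steps=50):
    """Generate one FAP frame by alternating Gibbs sampling in a conditional RBM.

    v_init : initial visible vector, e.g. the last known FAP frame, shape (n_vis,)
    cond   : conditioning vector, e.g. emotion one-hot plus recent FAP history, shape (n_cond,)
    W      : visible-hidden weights, shape (n_vis, n_hid)
    A, B   : conditioning weights into visible / hidden biases, shapes (n_cond, n_vis), (n_cond, n_hid)
    b, c   : static visible / hidden biases
    """
    v = v_init.copy()
    for _ in range(n_steps):
        # sample binary hidden units given the visibles and the condition
        p_h = sigmoid(v @ W + cond @ B + c)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # mean-field update of the Gaussian visible units (unit variance assumed)
        v = h @ W.T + cond @ A + b
    return v
```

Chaining this per-frame generation over a sequence of emotion labels is what produces the FAP trajectories driving the 3D animation.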
Emotion, voices and musical instruments: Repeated exposure to angry vocal sounds makes instrumental sounds angrier
Casady Bowman, T. Yamauchi, Kunchen Xiao
{"title":"Emotion, voices and musical instruments: Repeated exposure to angry vocal sounds makes instrumental sounds angrier","authors":"Casady Bowman, T. Yamauchi, Kunchen Xiao","doi":"10.1109/ACII.2015.7344641","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344641","url":null,"abstract":"The perception of emotion is critical for social interactions. Nonlinguistic signals such as those in the human voice and musical instruments are used for communicating emotion. Using an adaptation paradigm, this study examines the extent to which common mental mechanisms are applied for emotion processing of instrumental and vocal sounds. In two experiments we show that prolonged exposure to affective non-linguistic vocalizations elicits auditory after effects when participants are tested on instrumental morphs (Experiment 1a), yet no aftereffects are apparent when participants are exposed to affective instrumental sounds and tested on non-linguistic voices (Experiment 1b). Specifically, results indicate that exposure to angry vocal sounds made participants perceive instrumental sounds as angrier and less fearful, but not vice versa. These findings suggest that there is a directionality for emotion perception in vocal and instrumental sounds. Significantly, this unidirectional relationship reveals that mechanisms used for emotion processing is likely to be shared from vocal sounds to instrumental sounds, but not vice versa.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"2012 1","pages":"670-676"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82621663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Multimodal emotion recognition in response to videos (Extended abstract)
M. Soleymani, M. Pantic, T. Pun
{"title":"Multimodal emotion recognition in response to videos (Extended abstract)","authors":"M. Soleymani, M. Pantic, T. Pun","doi":"10.1109/ACII.2015.7344615","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344615","url":null,"abstract":"We present a user-independent emotion recognition method with the goal of detecting expected emotions or affective tags for videos using electroencephalogram (EEG), pupillary response and gaze distance. We first selected 20 video clips with extrinsic emotional content from movies and online resources. Then EEG responses and eye gaze data were recorded from 24 participants while watching emotional video clips. Ground truth was defined based on the median arousal and valence scores given to clips in a preliminary study. The arousal classes were calm, medium aroused and activated and the valence classes were unpleasant, neutral and pleasant. A one-participant-out cross validation was employed to evaluate the classification performance in a user-independent approach. The best classification accuracy of 68.5% for three labels of valence and 76.4% for three labels of arousal were obtained using a modality fusion strategy and a support vector machine. The results over a population of 24 participants demonstrate that user-independent emotion recognition can outperform individual self-reports for arousal assessments and do not underperform for valence assessments.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"45 1","pages":"491-497"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88150252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
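A minimal sketch of the participant-independent evaluation protocol: leave-one-participant-out cross-validation over fused EEG and gaze features with an SVM. The features here are synthetic placeholders, and the fusion shown is simple feature-level concatenation; the paper's fusion strategy may differ.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# hypothetical per-trial features: EEG band powers and eye/pupil statistics (synthetic placeholders)
rng = np.random.default_rng(0)
n_trials = 24 * 20                       # 24 participants x 20 video clips
eeg = rng.normal(size=(n_trials, 32))
gaze = rng.normal(size=(n_trials, 8))
arousal = rng.integers(0, 3, n_trials)   # calm / medium aroused / activated
participant = np.repeat(np.arange(24), 20)

# feature-level fusion: concatenate modalities, then classify with an SVM
X = np.hstack([eeg, gaze])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
acc = cross_val_score(clf, X, arousal, groups=participant, cv=LeaveOneGroupOut()).mean()
print(f"participant-independent accuracy: {acc:.2f}")
```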
A multi-label convolutional neural network approach to cross-domain action unit detection
Sayan Ghosh, Eugene Laksana, Stefan Scherer, Louis-Philippe Morency
{"title":"A multi-label convolutional neural network approach to cross-domain action unit detection","authors":"Sayan Ghosh, Eugene Laksana, Stefan Scherer, Louis-Philippe Morency","doi":"10.1109/ACII.2015.7344632","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344632","url":null,"abstract":"Action Unit (AU) detection from facial images is an important classification task in affective computing. However most existing approaches use carefully engineered feature extractors along with off-the-shelf classifiers. There has also been less focus on how well classifiers generalize when tested on different datasets. In our paper, we propose a multi-label convolutional neural network approach to learn a shared representation between multiple AUs directly from the input image. Experiments on three AU datasets- CK+, DISFA and BP4D indicate that our approach obtains competitive results on all datasets. Cross-dataset experiments also indicate that the network generalizes well to other datasets, even when under different training and testing conditions.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"24 2 1","pages":"609-615"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88675540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 62
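The core idea, a shared convolutional representation with one independent sigmoid output per AU trained with a per-label binary loss, fits in a few lines of PyTorch. The architecture, 96x96 grayscale input, and 12-AU output below are illustrative assumptions rather than the authors' network.

```python
import torch
import torch.nn as nn

class MultiLabelAUNet(nn.Module):
    """Shared convolutional representation with one independent (sigmoid) output per AU."""
    def __init__(self, n_aus=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 24 * 24, n_aus)  # assumes 96x96 grayscale face crops

    def forward(self, x):
        return self.head(self.features(x).flatten(1))  # one logit per action unit

model = MultiLabelAUNet()
images = torch.randn(8, 1, 96, 96)                      # a batch of face crops
targets = torch.randint(0, 2, (8, 12)).float()          # multi-hot AU presence labels
loss = nn.BCEWithLogitsLoss()(model(images), targets)   # independent binary loss per AU
```

Because several AUs can be active in the same face, the per-AU sigmoid outputs are trained jointly over the shared features rather than as separate one-vs-all classifiers.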
An investigation of emotion changes from speech
Zhaocheng Huang
{"title":"An investigation of emotion changes from speech","authors":"Zhaocheng Huang","doi":"10.1109/ACII.2015.7344650","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344650","url":null,"abstract":"Emotion recognition based on speech plays an important role in Human Computer Interaction (HCI), which has motivated extensive recent investigation into this area. However, current research on emotion recognition is focused on recognizing emotion on a per-file basis and mostly does not provide insight into emotion changes. In my research, emotion transition problem will be investigated, including localizing emotion change points, recognizing emotion transition patterns and predicting or recognizing emotion changes. As well as being potentially important in applications, the research delving into emotion changes paves the way towards a better understanding of emotions from engineering and potentially psychological perspectives.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"474 1","pages":"733-736"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79930998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5