{"title":"An experimental study of speech emotion recognition based on deep convolutional neural networks","authors":"W. Zheng, Jian Yu, Yuexian Zou","doi":"10.1109/ACII.2015.7344669","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344669","url":null,"abstract":"Speech emotion recognition (SER) is a challenging task since it is unclear what kind of features are able to reflect the characteristics of human emotion from speech. However, traditional feature extractions perform inconsistently for different emotion recognition tasks. Obviously, different spectrogram provides information reflecting difference emotion. This paper proposes a systematical approach to implement an effectively emotion recognition system based on deep convolution neural networks (DCNNs) using labeled training audio data. Specifically, the log-spectrogram is computed and the principle component analysis (PCA) technique is used to reduce the dimensionality and suppress the interferences. Then the PCA whitened spectrogram is split into non-overlapping segments. The DCNN is constructed to learn the representation of the emotion from the segments with labeled training speech data. Our preliminary experiments show the proposed emotion recognition system based on DCNNs (containing 2 convolution and 2 pooling layers) achieves about 40% classification accuracy. Moreover, it also outperforms the SVM based classification using the hand-crafted acoustic features.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"19 1","pages":"827-831"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84317516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Belfast storytelling database: A spontaneous social interaction database with laughter focused annotation","authors":"G. McKeown, W. Curran, J. Wagner, F. Lingenfelser, E. André","doi":"10.1109/ACII.2015.7344567","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344567","url":null,"abstract":"To support the endeavor of creating intelligent interfaces between computers and humans the use of training materials based on realistic human-human interactions has been recognized as a crucial task. One of the effects of the creation of these databases is an increased realization of the importance of often overlooked social signals and behaviours in organizing and orchestrating our interactions. Laughter is one of these key social signals; its importance in maintaining the smooth flow of human interaction has only recently become apparent in the embodied conversational agent domain. In turn, these realizations require training data that focus on these key social signals. This paper presents a database that is well annotated and theoretically constructed with respect to understanding laughter as it is used within human social interaction. Its construction, motivation, annotation and availability are presented in detail in this paper.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"30 1","pages":"166-172"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78146251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimate the intimacy of the characters based on their emotional states for application to non-task dialogue","authors":"Kazuyuki Matsumoto, Kyosuke Akita, Minoru Yoshida, K. Kita, F. Ren","doi":"10.1109/ACII.2015.7344591","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344591","url":null,"abstract":"Recently, a portable digital device equipped with voice guidance has been widely used with increasing the demand for the usability-conscious dialogue system. One of the problems with the existing dialogue system is its immature application to non-task dialogue. Non-task-oriented dialogue requires some schemes that enable smooth and flexible conversations with a user. For example, it would be possible to go beyond the closed relationship between the system and the user by considering the user's relationship with others in real life. In this paper, we focused on the dialogue made by the two characters in a drama scenario, and tried to express their relationship with a scale of “intimacy degree.” There will be such various elements related to the intimacy degree as the frequency of response to the utterance and the attitude of a speaker during the dialogue. We focused on the emotional state of the speaker during the utterance and tried to realize intimacy estimation with higher accuracy. As the evaluation result, we achieved higher accuracy in intimacy estimation than the existing method based on speech role.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"66 1","pages":"327-333"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85614794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring dataset similarities using PCA-based feature selection","authors":"Ingo Siegert, Ronald Böck, A. Wendemuth, Bogdan Vlasenko","doi":"10.1109/ACII.2015.7344600","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344600","url":null,"abstract":"In emotion recognition from speech, several well-established corpora are used to date for the development of classification engines. The data is annotated differently, and the community in the field uses a variety of feature extraction schemes. The aim of this paper is to investigate promising features for individual corpora and then compare the results for proposing optimal features across data sets, introducing a new ranking method. Further, this enables us to present a method for automatic identification of groups of corpora with similar characteristics. This answers an urgent question in classifier development, namely whether data from different corpora is similar enough to jointly be used as training material, overcoming shortage of material in matching domains. We compare the results of this method with manual groupings of corpora. We consider the established emotional speech corpora AVIC, ABC, DES, EMO-DB, ENTERFACE, SAL, SMARTKOM, SUSAS and VAM, however our approach is general.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"62 1","pages":"387-393"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90738063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The enduring basis of emotional episodes: Towards a capacious overview","authors":"R. Cowie","doi":"10.1109/ACII.2015.7344557","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344557","url":null,"abstract":"It matters for affective computing to have a framework that brings key points about human emotion to mind in an orderly way. A natural option builds on the ancient view that overt emotion arises from interactions between rational awareness and systems of a different type whose functions are ongoing, but not obvious. Key ideas from modern research can be incorporated by assuming that the latter do five broad kinds of work: evaluating states of affairs; preparing us to act accordingly; learning from significant conjunctions; interrupting conscious processes if need be; and aligning us with other people. Multiple structures act as interfaces between those systems and rational awareness. Emotional feelings inform conscious awareness of what they are doing, and emotion words split the space of their activity into discrete regions. The picture is not ideal, but it offers a substantial organising device.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"30 1","pages":"98-104"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85733448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personality test based on eye tracking techniques","authors":"Yun Zhang, Wei Xin, D. Miao","doi":"10.1109/ACII.2015.7344670","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344670","url":null,"abstract":"This paper presents an original research on eye tracking based personality test. To deal with the unavoidable human deception and inaccurate self-assessment during subjective psychological test, eye tracking techniques are utilized to reveal the participant's cognitive procedure during test. A non-intrusive real-time eye tracking based questionnaire system is developed for Chinese military recruitment personality test. A pilot study is carried out on 12 qualified samples. The preliminary result of experiment indicates a strong correlation between the participant's fixation features and test results. And such kind of relationship can be developed as an assistive indicator or a predictive parameter to traditional psychological test result to highly improve its reliability and validity in future applications.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"91 1","pages":"832-837"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86028647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Posed and spontaneous facial expression differentiation using deep Boltzmann machines","authors":"Quan Gan, Chongliang Wu, Shangfei Wang, Q. Ji","doi":"10.1109/ACII.2015.7344637","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344637","url":null,"abstract":"Current works on differentiating between posed and spontaneous facial expressions usually use features that are handcrafted for expression category recognition. Till now, no features have been specifically designed for differentiating between posed and spontaneous facial expressions. Recently, deep learning models have been proven to be efficient for many challenging computer vision tasks, and therefore in this paper we propose using the deep Boltzmann machine to learn representations of facial images and to differentiate between posed and spontaneous facial expressions. First, faces are located from images. Then, a two-layer deep Boltzmann machine is trained to distinguish posed and spon-tanous expressions. Experimental results on two benchmark datasets, i.e. the SPOS and USTC-NVIE datasets, demonstrate that the deep Boltzmann machine performs well on posed and spontaneous expression differentiation tasks. Comparison results on both datasets show that our method has an advantage over the other methods.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"116 1","pages":"643-648"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86065151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A temporally piece-wise fisher vector approach for depression analysis","authors":"Abhinav Dhall, Roland Göcke","doi":"10.1109/ACII.2015.7344580","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344580","url":null,"abstract":"Depression and other mood disorders are common, disabling disorders with a profound impact on individuals and families. Inspite of its high prevalence, it is easily missed during the early stages. Automatic depression analysis has become a very active field of research in the affective computing community in the past few years. This paper presents a framework for depression analysis based on unimodal visual cues. Temporally piece-wise Fisher Vectors (FV) are computed on temporal segments. As a low-level feature, block-wise Local Binary Pattern-Three Orthogonal Planes descriptors are computed. Statistical aggregation techniques are analysed and compared for creating a discriminative representative for a video sample. The paper explores the strength of FV in representing temporal segments in a spontaneous clinical data. This creates a meaningful representation of the facial dynamics in a temporal segment. The experiments are conducted on the Audio Video Emotion Challenge (AVEC) 2014 German speaking depression database. The superior results of the proposed framework show the effectiveness of the technique as compared to the current state-of-art.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"23 1","pages":"255-259"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83298186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Turing's menagerie: Talking lions, virtual bats, electric sheep and analogical peacocks: Common ground and common interest are necessary components of engagement","authors":"G. McKeown","doi":"10.1109/ACII.2015.7344689","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344689","url":null,"abstract":"This theoretical paper attempts to define some of the key components and challenges required to create embodied conversational agents that can be genuinely interesting conversational partners. Wittgenstein's argument concerning talking lions emphasizes the importance of having a shared common ground as a basis for conversational interactions. Virtual bats suggests that-for some people at least-it is important that there be a feeling of authenticity concerning a subjectively experiencing entity that can convey what it is like to be that entity. Electric sheep reminds us of the importance of empathy in human conversational interaction and that we should provide a full communicative repertoire of both verbal and non-verbal components if we are to create genuinely engaging interactions. Also we may be making the task more difficult rather than easy if we leave out non-verbal aspects of communication. Finally, analogical peacocks highlights the importance of between minds alignment and establishes a longer term goal of being interesting, creative, and humorous if an embodied conversational agent is to be truly an engaging conversational partner. Some potential directions and solutions to addressing these issues are suggested.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"20 1","pages":"950-955"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88076923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genre based emotion annotation for music in noisy environment","authors":"Yu-Hao Chin, Po-Chuan Lin, Tzu-Chiang Tai, Jia-Ching Wang","doi":"10.1109/ACII.2015.7344675","DOIUrl":"https://doi.org/10.1109/ACII.2015.7344675","url":null,"abstract":"The music listened by human is sometimes exposed to noise. For example, background noise usually exists when listening to music in broadcasts or lives. The noise will worsen the performance in various music emotion recognition systems. To solve the problem, this work constructs a robust system for music emotion classification in a noisy environment. Furthermore, the genre is considered when determining the emotional label for the song. The proposed system consists of three major parts, i.e. subspace based noise suppression, genre index computation, and support vector machine (SVM). Firstly, the system uses noise suppression to remove the noise content in the signal. After that, acoustical features are extracted from each music clip. Next, a dictionary is constructed by using songs that cover a wide range of genres, and it is adopted to implement sparse coding. Via sparse coding, data can be transformed to sparse coefficient vectors, and this paper computes genre indexes for the music genres based on the sparse coefficient vector. The genre indexes are regarded as combination weights in the latter phase. At the training stage of the SVM, this paper train emotional models for each genre. At the prediction stage, the predictions that obtained by emotional models in each genre are weighted combined across all genres using the genre indexes. Finally, the proposed system annotates multiple emotional labels for a song based on the combined prediction. The experimental result shows that the system can achieve a good performance in both normal and noisy environments.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"29 1","pages":"863-866"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83086079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}