Responding with Sentiment Appropriate for the User's Current Sentiment in Dialog as Inferred from Prosody and Gaze Patterns

Anindita Nath
DOI: 10.1145/3242969.3264974
Published in: Proceedings of the 20th ACM International Conference on Multimodal Interaction
Publication date: 2018-10-02
Citations: 1

Abstract

Multi-modal sentiment detection from natural video/audio streams has recently received much attention. I propose to use this multi-modal information to develop a technique, Sentiment Coloring, that utilizes the detected sentiments to generate effective responses. In particular, I aim to produce suggested responses colored with sentiment appropriate to that present in the interlocutor's speech. To achieve this, contextual information pertaining to sentiment, extracted from both the past and the current speech of both speakers in a dialog, will be utilized. Sentiment here includes the three polarities (positive, neutral, and negative) as well as other expressions of stance and attitude. Utilizing only non-verbal cues, namely prosody and gaze, I will implement two algorithmic approaches and compare their performance in sentiment detection: a simple machine-learning algorithm (a neural network), which will act as the baseline, and a deep-learning approach, an end-to-end bidirectional LSTM RNN, which is the state of the art in emotion classification. I will build a responsive spoken dialog system with this Sentiment Coloring technique and evaluate it with human subjects to measure the technique's benefits in various interactive environments.
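To illustrate the baseline approach the abstract describes, the sketch below shows a minimal feed-forward classifier that maps per-utterance prosody and gaze features to the three sentiment polarities. All feature dimensions, layer sizes, and weights here are hypothetical placeholders for illustration only, not the author's actual model or trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (assumed, not from the paper):
N_PROSODY = 8   # e.g. pitch/energy statistics per utterance
N_GAZE = 4      # e.g. gaze-direction proportions
N_HIDDEN = 16
N_CLASSES = 3   # positive, neutral, negative

# Randomly initialized weights stand in for a trained model.
W1 = rng.normal(scale=0.1, size=(N_PROSODY + N_GAZE, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_CLASSES))
b2 = np.zeros(N_CLASSES)

def predict_sentiment(prosody, gaze):
    """Return a (positive, neutral, negative) probability vector."""
    x = np.concatenate([prosody, gaze])        # fuse the two modalities
    h = np.tanh(x @ W1 + b1)                   # hidden layer
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())        # numerically stable softmax
    return exp / exp.sum()

# Classify one (random, illustrative) utterance's features.
probs = predict_sentiment(rng.normal(size=N_PROSODY),
                          rng.normal(size=N_GAZE))
```

The deep-learning counterpart would replace this utterance-level classifier with a bidirectional LSTM over the frame-level prosody/gaze sequence, letting the model use context from both directions of the utterance.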