Neural Network Model for Visualization of Conversational Mood with Four Adjective Pairs

Koichi Yamagata, Koya Kawahara, Yuto Suzuki, Yuki Nakahodo, Shunsuke Ito, Haruka Matsukura, Maki Sakamoto
AHFE International, vol. 128, 2023. DOI: 10.54941/ahfe1004396 (https://doi.org/10.54941/ahfe1004396)

Abstract

In recent years, the accuracy of speech recognition has improved remarkably, and speech recognition software can be used to obtain text from conversational speech data. Although text can be treated as surface-level information, several studies have indicated that speech recognition can also be used to estimate emotions, which represent higher-level information in a conversation. Several recently proposed models use LSTM or GRU networks to estimate emotion in conversations. However, when attempting to monitor or influence a conversation held as part of a meeting or a chat, the mood of the conversation matters more than the emotion. In normal conversation, emotions such as anger and sadness are unlikely to be expressed explicitly, for reasons that include avoiding an unexpected argument and avoiding offending others. Thus, when attempting to control or monitor the state of a conversation during a meeting or casual discussion, it is often more important to estimate the mood than the emotion. Some researchers have examined the role of mood as distinguished from emotion; one of them called diffuse emotional states that persist over a long period of time "mood." Mood and emotion are usually distinguished by duration and intensity of expression, but these differences are rarely quantified, and no specific durations are fixed. Accurate identification of the mood of a conversation is especially important for Japanese people who are engaged in collaborative and democratic decision making. To construct the teacher (training) data for a model that estimates conversational mood, we first selected representative adjective pairs that could describe the mood of a conversation. We used a system developed by Iiba et al. to estimate 21 affective scales, each defined by an adjective pair, from input text. The 21 adjective pairs were clustered into 4 groups based on the output scales. The 4 adjective pairs chosen for annotation were representatives of the 4 clusters.
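The selection step described above (group the 21 scales into 4 clusters, then take one representative pair per cluster) can be sketched roughly as follows. The abstract does not name the clustering method or the rule for choosing a representative, so this sketch assumes average-linkage agglomerative clustering on 1 minus the Pearson correlation and picks the member most correlated with the rest of its cluster. The scores are synthetic stand-ins, not output of the Iiba et al. system, and all function names are hypothetical.

```python
import random

def correlation(a, b):
    """Pearson correlation between two equal-length score lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / ((va * vb) ** 0.5)

def cluster_scales(scores, n_clusters):
    """Average-linkage agglomerative clustering on 1 - correlation (an assumption).

    scores[i] holds the values that scale i takes over all input texts.
    Returns a list of clusters, each a list of scale indices.
    """
    n = len(scores)
    dist = [[1.0 - correlation(scores[i], scores[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise distance between the two candidate clusters.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def representative(cluster, scores):
    """Pick the member with the highest mean correlation to its own cluster."""
    def mean_corr(i):
        return sum(correlation(scores[i], scores[j]) for j in cluster) / len(cluster)
    return max(cluster, key=mean_corr)

# Synthetic stand-in for the 21 affective scales: 4 hidden factors plus noise.
rng = random.Random(0)
n_texts, n_scales, n_groups = 200, 21, 4
factors = [[rng.gauss(0, 1) for _ in range(n_texts)] for _ in range(n_groups)]
true_group = [i % n_groups for i in range(n_scales)]
scores = [[factors[true_group[i]][t] + 0.3 * rng.gauss(0, 1) for t in range(n_texts)]
          for i in range(n_scales)]

clusters = cluster_scales(scores, n_groups)
representatives = [representative(c, scores) for c in clusters]
print(clusters)          # four groups of scale indices
print(representatives)   # one scale index per group
```

With real data, `scores` would hold the 21 estimated scale values for each input text, and the four indices in `representatives` would correspond to the adjective pairs kept for annotation.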
We expected these 4 adjective pairs (gloomy-happy, easy-serious, calm-aggressive, tidy-messy) to capture the mood of a conversation. Based on the four adjective pairs, we constructed a new training data set containing 60 hours of conversations in Japanese. In this study, only data obtained by microphones are used to estimate conversational mood. The data set was annotated on the four adjective scales so that the model could learn the mood of the conversations. We developed an LSTM deep neural network model that can read the "conversational mood" in real time. Furthermore, our proposed model also estimates the amount of laughter, which is generally measured by capturing facial expressions with a camera, together with the conversational mood. Because laughter is considered to play an important role in creating a cheerful environment, it can be used to evaluate the conversational mood. Evaluation results are presented to show the validity of our model. The model is expected to be applied to systems that influence or control the mood of a conversation, for example by presenting ambient music or aromas, depending on the purpose of the discussion, such as a conference, a chat, or a business meeting.
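To illustrate what real-time estimation with a recurrent model looks like, here is a minimal sketch of a single LSTM cell that consumes one feature frame at a time and emits five values per frame: the four mood scales plus a laughter estimate. The abstract does not give the architecture, input features, or layer sizes, so everything here is an assumption: the class name `TinyLSTM`, the 8-dimensional input frames, the 16 hidden units, and the linear read-out are hypothetical, and the weights are random and untrained.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTM:
    """Single LSTM cell with a linear read-out (untrained, illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One weight matrix per gate, acting on [input frame ; previous hidden state].
        self.w = {g: mat(n_hidden, n_in + n_hidden) for g in "ifog"}
        self.b = {g: [0.0] * n_hidden for g in "ifog"}
        self.w_out = mat(n_out, n_hidden)
        self.h = [0.0] * n_hidden  # hidden state carried between frames
        self.c = [0.0] * n_hidden  # cell state carried between frames

    def _gate(self, g, z, act):
        return [act(sum(w * x for w, x in zip(row, z)) + b)
                for row, b in zip(self.w[g], self.b[g])]

    def step(self, frame):
        """Consume one feature frame, update the state, return the estimates."""
        z = list(frame) + self.h
        i = self._gate("i", z, sigmoid)    # input gate
        f = self._gate("f", z, sigmoid)    # forget gate
        o = self._gate("o", z, sigmoid)    # output gate
        g = self._gate("g", z, math.tanh)  # candidate cell update
        self.c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, self.c, i, g)]
        self.h = [ok * math.tanh(ck) for ok, ck in zip(o, self.c)]
        return [sum(w * hk for w, hk in zip(row, self.h)) for row in self.w_out]

# Four mood scales from the abstract plus the laughter estimate.
OUTPUTS = ["gloomy-happy", "easy-serious", "calm-aggressive", "tidy-messy", "laughter"]
model = TinyLSTM(n_in=8, n_hidden=16, n_out=len(OUTPUTS))

# Random frames stand in for features derived from microphone input.
rng = random.Random(1)
stream = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]

history = []
for frame in stream:
    estimate = dict(zip(OUTPUTS, model.step(frame)))
    history.append(estimate)  # an updated estimate is available after every frame

print(history[-1])
```

Because the hidden and cell states persist between calls to `step`, each new frame updates the estimate without reprocessing earlier audio, which is what makes this kind of model suitable for streaming use. A trained version would fit the weights to the annotated 60-hour data set.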