3D skeletal movement-enhanced emotion recognition networks

Impact factor 3.2 · JCR Q1 (Computer Science)
Jiaqi Shi, Chaoran Liu, C. Ishi, H. Ishiguro
APSIPA Transactions on Signal and Information Processing · Journal Article · Published 2021-08-05 · DOI: 10.1017/ATSIP.2021.11 (https://doi.org/10.1017/ATSIP.2021.11)
Citations: 1

Abstract

Automatic emotion recognition has become an important trend in the fields of natural human–computer interaction and artificial intelligence. Although gesture is one of the most important components of nonverbal communication and has a considerable impact on emotion recognition, it is rarely considered in emotion recognition research. An important reason is the lack of large open-source emotional databases containing skeletal movement data. In this paper, we extract three-dimensional skeleton information from videos and apply the method to the IEMOCAP database to add a new modality. We propose an attention-based convolutional neural network that takes the extracted data as input to predict the speakers’ emotional state. We also propose a graph attention-based fusion method that combines our model with models using other modalities, providing complementary information for the emotion classification task and effectively fusing multimodal cues. The combined model utilizes audio signals, text information, and skeletal data. It significantly outperforms the bimodal model and other fusion strategies, demonstrating the effectiveness of the method.
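The abstract does not specify how the graph attention-based fusion is built. As a rough illustration only, the sketch below assumes a standard single-head graph attention layer in the style of Veličković et al.'s GAT, applied to a fully connected graph (with self-loops) whose nodes are per-modality embeddings (audio, text, skeleton), followed by mean pooling into one fused vector. The function names, dimensions, and weights are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (out_dim x in_dim) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    """LeakyReLU with the 0.2 negative slope commonly used in GAT."""
    return x if x > 0 else slope * x


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_fuse(nodes: List[Vector], w: Matrix, a: Vector) -> Vector:
    """Hypothetical fusion: one single-head graph-attention layer over a fully
    connected modality graph (self-loops included), then mean pooling.

    nodes: one embedding per modality (e.g. audio, text, skeleton).
    w:     shared projection matrix W (out_dim x in_dim).
    a:     attention vector of length 2 * out_dim.
    """
    projected = [matvec(w, h) for h in nodes]  # W h_i for every modality
    updated = []
    for wh_i in projected:
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for every neighbour j (incl. i)
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, wh_i + wh_j)))
                  for wh_j in projected]
        alpha = softmax(scores)  # attention coefficients over neighbours
        # h'_i = sum_j alpha_ij W h_j
        updated.append([sum(alpha[j] * projected[j][k] for j in range(len(projected)))
                        for k in range(len(wh_i))])
    # Average the updated node states into a single multimodal representation.
    return [sum(col) / len(updated) for col in zip(*updated)]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, purely illustrative.
    audio, text, skeleton = [0.9, 0.1], [0.2, 0.8], [0.5, 0.5]
    w = [[1.0, 0.0], [0.0, 1.0]]
    a = [0.5, -0.5, 0.25, 0.25]
    print(gat_fuse([audio, text, skeleton], w, a))
```

In a full model, the fused vector would presumably feed a classifier over emotion categories; attention lets the network weight each modality differently per utterance instead of using a fixed concatenation or average.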
Source journal
APSIPA Transactions on Signal and Information Processing (ENGINEERING, ELECTRICAL & ELECTRONIC)
CiteScore: 8.60
Self-citation rate: 6.20%
Articles published: 30
Review time: 40 weeks