MSP-Face Corpus: A Natural Audiovisual Emotional Database

Andrea Vidal, Ali N. Salman, Wei-Cheng Lin, C. Busso
Proceedings of the 2020 International Conference on Multimodal Interaction, published 2020-10-21
DOI: 10.1145/3382507.3418872 (https://doi.org/10.1145/3382507.3418872)
Citations: 9

Abstract

Expressive behaviors conveyed during daily interactions are difficult to determine because they often consist of a blend of different emotions. This complexity in expressive human communication is a major challenge for building and evaluating automatic systems that can reliably predict emotions. Emotion recognition systems are often trained on limited databases in which the emotions are either elicited or portrayed by actors. These approaches do not necessarily reflect real emotions, creating a mismatch when the same systems are deployed in practical applications. Developing rich emotional databases that reflect the complexity in the externalization of emotion is an important step toward building better emotion recognition models. This study presents the MSP-Face database, a natural audiovisual database obtained from video-sharing websites, in which multiple individuals discuss various topics, expressing their opinions and experiences. The natural recordings convey a broad range of emotions that are difficult to obtain with alternative data collection protocols. A key feature of the corpus is that it comprises two sets. The first set includes videos annotated with emotional labels using a crowdsourcing protocol (9,370 recordings; 24 h 41 min). The second set includes similar videos without emotional labels (17,955 recordings; 45 h 57 min), providing a resource for exploring semi-supervised and unsupervised machine-learning algorithms on natural emotional videos. This study describes the process of collecting and annotating the corpus. It also provides baselines on this new database using unimodal (audio, video) and multimodal emotion recognition systems.
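The set sizes quoted in the abstract can be turned into a few derived figures. The sketch below is illustrative arithmetic only: the class and function names are hypothetical, and the mean clip duration and labeled fraction are computed from the recording counts and total durations stated above, not reported by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusSet:
    """One partition of the corpus, using the totals quoted in the abstract."""
    name: str
    recordings: int
    hours: int
    minutes: int

    @property
    def seconds(self) -> int:
        # Total duration of the set in seconds.
        return self.hours * 3600 + self.minutes * 60

    @property
    def mean_clip_seconds(self) -> float:
        # Derived value: total duration divided by number of recordings.
        return self.seconds / self.recordings


# Figures taken from the abstract; the variable names are assumptions.
LABELED = CorpusSet("labeled", 9_370, 24, 41)
UNLABELED = CorpusSet("unlabeled", 17_955, 45, 57)


def labeled_fraction(labeled: CorpusSet, unlabeled: CorpusSet) -> float:
    """Share of all recordings that carry emotional labels."""
    return labeled.recordings / (labeled.recordings + unlabeled.recordings)


if __name__ == "__main__":
    for part in (LABELED, UNLABELED):
        print(f"{part.name}: {part.recordings} recordings, "
              f"mean clip length {part.mean_clip_seconds:.2f} s")
    print(f"labeled share of recordings: {labeled_fraction(LABELED, UNLABELED):.1%}")
```

Under these figures, both sets average a little over nine seconds per recording, and roughly a third of the recordings are labeled, which gives a sense of the labeled-to-unlabeled ratio available for semi-supervised experiments.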