Multi-Modal Multi-Task Deep Learning For Speaker And Emotion Recognition Of TV-Series Data
Sashi Novitasari, Quoc Truong Do, S. Sakti, D. Lestari, Satoshi Nakamura
2018 Oriental COCOSDA - International Conference on Speech Database and Assessments, 2018-05-07. DOI: 10.1109/ICSDA.2018.8693020
Since paralinguistic aspects must be considered to understand speech, we construct a deep learning framework that utilizes multi-modal features to simultaneously recognize both speakers and emotions. There are three kinds of feature modalities: acoustic, lexical, and facial. To fuse the features from multiple modalities, we experimented with three methods: majority voting, concatenation, and hierarchical fusion. Recognition was performed on a TV-series dataset that simulates actual conversations.
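To illustrate the general idea, the sketch below shows one of the fusion strategies mentioned in the abstract (concatenation) combined with multi-task prediction of speaker and emotion. This is not the authors' implementation; the feature dimensions, layer sizes, class counts, and the use of PyTorch are all assumptions for illustration only.

```python
# Minimal sketch (not the paper's code): concatenation fusion of acoustic,
# lexical, and facial features into a shared encoder with two task heads
# (speaker ID and emotion). All dimensions and sizes are assumed values.
import torch
import torch.nn as nn

class MultiModalMultiTaskNet(nn.Module):
    def __init__(self, acoustic_dim=40, lexical_dim=300, facial_dim=128,
                 hidden_dim=256, num_speakers=10, num_emotions=6):
        super().__init__()
        fused_dim = acoustic_dim + lexical_dim + facial_dim
        # Shared encoder over the concatenated multi-modal feature vector
        self.shared = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # Task-specific heads trained jointly (multi-task learning)
        self.speaker_head = nn.Linear(hidden_dim, num_speakers)
        self.emotion_head = nn.Linear(hidden_dim, num_emotions)

    def forward(self, acoustic, lexical, facial):
        # Concatenation fusion: stack the per-modality feature vectors
        fused = torch.cat([acoustic, lexical, facial], dim=-1)
        h = self.shared(fused)
        return self.speaker_head(h), self.emotion_head(h)

# Joint training minimizes the sum of the two cross-entropy losses.
model = MultiModalMultiTaskNet()
acoustic = torch.randn(8, 40)
lexical = torch.randn(8, 300)
facial = torch.randn(8, 128)
speaker_logits, emotion_logits = model(acoustic, lexical, facial)
loss = (nn.CrossEntropyLoss()(speaker_logits, torch.randint(0, 10, (8,)))
        + nn.CrossEntropyLoss()(emotion_logits, torch.randint(0, 6, (8,))))
loss.backward()
```

Majority voting would instead run one classifier per modality and vote over their predictions, while hierarchical fusion combines modalities in stages rather than all at once; the concatenation variant above is simply the easiest to show compactly.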