Samuel Kim, P. Georgiou, Sungbok Lee, Shrikanth S. Narayanan
Real-time Emotion Detection System using Speech: Multi-modal Fusion of Different Timescale Features
2007 IEEE 9th Workshop on Multimedia Signal Processing, 2007-10-01
DOI: 10.1109/MMSP.2007.4412815 (https://doi.org/10.1109/MMSP.2007.4412815)
Citations: 102
Abstract
The goal of this work is to build a real-time emotion detection system that uses multi-modal fusion of speech features computed at different timescales. Conventional spectral features serve as the intra-frame features and conventional prosodic features as the supra-frame features, and a new information fusion algorithm is introduced that accounts for the characteristics of each machine learning algorithm. Within this framework, the proposed system can be extended with additional features, such as lexical or discourse information, in later steps. To verify the real-time performance of the system, binary decision tasks on angry versus neutral emotion are performed using concatenated speech signals that simulate real-time conditions.
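
The abstract does not state the fusion rule itself, so the following is only an illustrative sketch of one common option, score-level (late) fusion, and not the algorithm proposed in the paper. It assumes, hypothetically, that a spectral classifier emits one log-odds score for "angry" per frame and that a prosody classifier emits one log-odds score per segment. The names `fuse_scores`, `decide`, `frame_weight`, and `threshold` are invented for this example.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(frame_log_odds: Sequence[float],
                segment_log_odds: float,
                frame_weight: float = 0.5) -> float:
    """Return P(angry) from a weighted sum of two log-odds scores.

    frame_log_odds:   per-frame scores from a (hypothetical) spectral model.
    segment_log_odds: one score from a (hypothetical) prosody model
                      covering the same stretch of speech.
    frame_weight:     assumed mixing weight in [0, 1]; not from the paper.
    """
    if not frame_log_odds:
        raise ValueError("need at least one frame score")
    if not 0.0 <= frame_weight <= 1.0:
        raise ValueError("frame_weight must lie in [0, 1]")
    # Bring the short-timescale scores to the segment timescale by averaging.
    frame_avg = sum(frame_log_odds) / len(frame_log_odds)
    fused = frame_weight * frame_avg + (1.0 - frame_weight) * segment_log_odds
    return sigmoid(fused)


def decide(p_angry: float, threshold: float = 0.5) -> str:
    """Binary angry/neutral decision on the fused probability."""
    return "angry" if p_angry >= threshold else "neutral"


if __name__ == "__main__":
    p = fuse_scores([0.2, 1.1, 0.8], segment_log_odds=-0.3)
    print(decide(p))
```

Averaging the frame scores before mixing is one simple way to reconcile the two timescales. In a streaming setting the same function could be called on each newly completed segment, with the weight tuned on held-out data.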