Towards Indonesian speech-emotion automatic recognition (I-SpEAR)
Novita Belinda Wunarso, Y. Soelistio
2017 4th International Conference on New Media Studies (CONMEDIA), published 2017-09-25
DOI: 10.1109/conmedia.2017.8266038
Citations: 9
Abstract
Although speech-emotion recognition (SER) has received much attention as a research topic, there is still disagreement about which vocal features identify particular emotions. Emotional expression is also known to differ across cultural backgrounds, which makes it important to study SER within the culture to which a language belongs. Furthermore, only a few studies address SER in Indonesian, which is what this study attempts to explore. In this study, we extract simple features from 3420 voice samples gathered from 38 participants. The features are compared by means of a linear mixed-effects model, which shows that speakers in emotional and non-emotional states can be differentiated by their speech duration. Using an SVM with speech duration as the input feature, we achieve 76.84% average accuracy in classifying emotional and non-emotional speech.
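
The classification step described in the abstract can be illustrated with a minimal sketch: an SVM trained on a single feature, speech duration, to separate emotional from non-emotional utterances. Everything specific here is an assumption for illustration and not taken from the paper: the durations are synthetic (drawn from two normal distributions with invented means and spread), the kernel is linear, and accuracy is estimated with 5-fold cross-validation using scikit-learn. The numbers it produces say nothing about the study's reported results.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic durations in seconds (ASSUMED values, not data from the study).
rng = np.random.default_rng(0)
n_per_class = 200
emotional = rng.normal(loc=2.0, scale=0.5, size=n_per_class)
non_emotional = rng.normal(loc=3.0, scale=0.5, size=n_per_class)

# One feature per utterance: its duration. Label 1 = emotional, 0 = non-emotional.
X = np.concatenate([emotional, non_emotional]).reshape(-1, 1)
y = np.concatenate([np.ones(n_per_class, dtype=int),
                    np.zeros(n_per_class, dtype=int)])

# Linear kernel is an assumption; the abstract does not state which kernel was used.
clf = SVC(kernel="linear", C=1.0)

# Average accuracy over 5 stratified folds.
scores = cross_val_score(clf, X, y, cv=5)
mean_accuracy = scores.mean()
print(f"mean cross-validated accuracy: {mean_accuracy:.3f}")

# With a single feature, a linear SVM reduces to a duration threshold.
clf.fit(X, y)
boundary = -clf.intercept_[0] / clf.coef_[0, 0]
print(f"decision boundary at about {boundary:.2f} s")
```

With only one input feature, the fitted linear SVM amounts to a cut-off on duration, which is why the sketch prints the decision boundary alongside the accuracy.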