Towards Indonesian speech-emotion automatic recognition (I-SpEAR)

Novita Belinda Wunarso, Y. Soelistio
DOI: 10.1109/conmedia.2017.8266038 (https://doi.org/10.1109/conmedia.2017.8266038)
Published in: 2017 4th International Conference on New Media Studies (CONMEDIA)
Publication date: 2017-09-25
Source platform: Semanticscholar
Citations: 9

Abstract

Although speech-emotion recognition (SER) has received much attention as a research topic, there is still some dispute about which vocal features identify particular emotions. Emotional expression is also known to differ across cultural backgrounds, which makes it important to study SER specific to the culture to which a language belongs. Furthermore, only a few studies address SER in Indonesian, which is what this study attempts to explore. In this study, we extract simple features from 3420 voice samples gathered from 38 participants. The features are compared by means of a linear mixed-effects model, which shows that people in emotional and non-emotional states can be differentiated by their speech duration. Using an SVM with speech duration as the input feature, we achieve 76.84% average accuracy in classifying emotional and non-emotional speech.
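
The classification step described in the abstract (an SVM with speech duration as its only input feature) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear kernel, trains by plain subgradient descent on the hinge loss, and uses synthetic durations whose class means (3.0 s and 2.0 s) are arbitrary placeholders. The function names `make_synthetic_durations`, `train_linear_svm`, and `run_demo` are hypothetical, and the accuracy it produces says nothing about the reported 76.84%.

```python
import random
import statistics


def make_synthetic_durations(n_per_class, rng):
    # Hypothetical utterance durations in seconds.
    # Label +1 = emotional, -1 = non-emotional.
    # The class means below are arbitrary assumptions, not values from the study.
    emotional = [(rng.gauss(3.0, 0.3), 1) for _ in range(n_per_class)]
    neutral = [(rng.gauss(2.0, 0.3), -1) for _ in range(n_per_class)]
    data = emotional + neutral
    rng.shuffle(data)
    return data


def standardize(samples, mean, std):
    # Rescale the single feature to zero mean and unit variance.
    return [((x - mean) / std, y) for x, y in samples]


def train_linear_svm(samples, lam=0.01, lr=0.1, epochs=200):
    # Full-batch subgradient descent on
    #   lam/2 * w^2 + mean(max(0, 1 - y * (w * x + b)))
    # for a one-dimensional feature x.
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = lam * w, 0.0
        for x, y in samples:
            if y * (w * x + b) < 1:  # sample violates the margin
                grad_w -= y * x / n
                grad_b -= y / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def accuracy(samples, w, b):
    # Fraction of samples whose predicted sign matches the label.
    correct = sum(1 for x, y in samples if (1 if w * x + b >= 0 else -1) == y)
    return correct / len(samples)


def run_demo(seed=0):
    rng = random.Random(seed)
    data = make_synthetic_durations(100, rng)
    split = int(0.8 * len(data))  # simple 80/20 hold-out split
    train, test = data[:split], data[split:]
    xs = [x for x, _ in train]
    mean, std = statistics.mean(xs), statistics.pstdev(xs)
    w, b = train_linear_svm(standardize(train, mean, std))
    return accuracy(standardize(test, mean, std), w, b)


if __name__ == "__main__":
    print(f"hold-out accuracy on synthetic data: {run_demo():.2f}")
```

With a single feature, a linear SVM reduces to learning one duration threshold, which is why a feature that differs reliably between emotional and non-emotional speech can be enough for a binary classifier.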