Estimating Public Speaking Anxiety from Speech Signals Using Unsupervised Transfer Learning

Kexin Feng, Megha Yadav, Md. Nazmus Sakib, A. Behzadan, Theodora Chaspari
Published in: 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2019-11-01
DOI: 10.1109/GlobalSIP45357.2019.8969502 (https://doi.org/10.1109/GlobalSIP45357.2019.8969502)
Citations: 4

Abstract

Public speaking anxiety (PSA) ranks among the most common social phobias worldwide and is caused by various confounding factors. Motivated by the inherent data sparsity and lack of annotations in human-related applications, we propose unsupervised learning techniques to estimate PSA from speech signals. The labeled source data come from the publicly available CREMA-D dataset, while the unlabeled target data come from real-life public speaking tasks. Since fear is one of the major factors of PSA, the goal of this study is to build fear-specific representations from the labeled source data, use them to estimate the degree of fear in the target data, and examine the extent to which that estimate is associated with anxiety during the public speaking encounter. Transfer learning is performed through the domain-adversarial neural network (DANN) and the Wasserstein generative adversarial network (WGAN). Results indicate that the proposed unsupervised fear-specific estimates can detect public speaking anxiety with a Pearson’s correlation coefficient of 0.28 (p < 0.01). When these fear-specific estimates are combined with the degree of an individual’s preparation for the public speaking task, obtained through self-reports, they yield a Pearson’s correlation of 0.55 (p < 0.01). These results indicate the feasibility of leveraging labeled emotion-specific corpora for detecting human-related outcomes in real life and provide a foundation for smart assistive technologies through the automated real-time estimation of anxiety during public speaking.
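The reported results are Pearson's correlation coefficients between the fear-specific estimates and the anxiety measure. As a minimal sketch of how such a coefficient is computed, the snippet below implements Pearson's r in plain Python. The function name `pearson_r`, the variable names, and all numeric values are hypothetical illustrations and are not taken from the paper or its data; the sketch does not reproduce the DANN or WGAN models, nor the way the authors combined the estimates with preparation scores.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, NOT data from the paper.
fear_estimates = [0.10, 0.40, 0.35, 0.80, 0.60]  # assumed model output per speaker
anxiety_scores = [12.0, 18.0, 15.0, 30.0, 22.0]  # assumed self-reported anxiety

r = pearson_r(fear_estimates, anxiety_scores)
print(f"Pearson's r = {r:.2f}")
```

A coefficient near 0 indicates no linear association and values near 1 or -1 indicate a strong one, so the reported 0.28 and 0.55 correspond to a weak and a moderate positive association respectively.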