Estimating Public Speaking Anxiety from Speech Signals Using Unsupervised Transfer Learning

2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP) Pub Date: 2019-11-01 DOI: 10.1109/GlobalSIP45357.2019.8969502

Kexin Feng, Megha Yadav, Md. Nazmus Sakib, A. Behzadan, Theodora Chaspari

Citations: 4

Abstract

Public speaking anxiety (PSA) ranks among the top social phobias worldwide and is caused by various confounding factors. Motivated by the inherent data sparsity and lack of annotations in human-related applications, we propose unsupervised learning techniques to estimate PSA from speech signals. The labeled source data come from the publicly available CREMA-D dataset, while the unlabeled target data come from real-life public speaking tasks. Since fear is one of the major factors of PSA, the goal of this study is to build fear-specific representations from the labeled source data, use them to estimate the degree of fear in the target data, and examine the extent to which that estimate is associated with anxiety during the public speaking encounter. Transfer learning is performed through the domain-adversarial neural network (DANN) and the Wasserstein generative adversarial network (WGAN). Results indicate that the proposed unsupervised fear-specific estimates can detect public speaking anxiety with a Pearson's correlation coefficient of 0.28 (p < 0.01). When these fear-specific estimates are combined with the degree of an individual's preparation for the public speaking task, obtained through self-reports, they yield a Pearson's correlation of 0.55 (p < 0.01). These results indicate the feasibility of leveraging labeled emotion-specific corpora for detecting human-related outcomes in real life and provide a foundation for smart assistive technologies through the automated real-time estimation of anxiety during public speaking.
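The reported results are Pearson's correlation coefficients between the fear-specific estimates and anxiety. As a minimal sketch of how such a coefficient is computed, the Python snippet below implements the standard formula with the standard library only. The function name `pearson_r` and the toy values are hypothetical and chosen for illustration; they are not the authors' code or data, and the snippet does not compute the p-values reported in the abstract.

```python
import math


def pearson_r(x, y):
    """Return Pearson's correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each value from its sequence mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Unnormalised covariance and variances (the 1/n factors cancel).
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, NOT data from the study.
fear_estimates = [0.2, 0.5, 0.4, 0.8, 0.6]   # assumed model outputs
anxiety_scores = [1.0, 2.0, 2.5, 4.0, 3.0]   # assumed self-reported anxiety
r = pearson_r(fear_estimates, anxiety_scores)
print(f"Pearson r on toy data: {r:.2f}")
```

A coefficient near 0 means little linear association, and values near +1 or -1 mean a strong positive or negative linear association, so the reported 0.28 and 0.55 correspond to weak-to-moderate positive associations.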

Source journal

2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

Self-citation rate

0.00%

Number of publications