Estimating Public Speaking Anxiety from Speech Signals Using Unsupervised Transfer Learning
Kexin Feng, Megha Yadav, Md. Nazmus Sakib, A. Behzadan, Theodora Chaspari
2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), published 2019-11-01
DOI: 10.1109/GlobalSIP45357.2019.8969502 (https://doi.org/10.1109/GlobalSIP45357.2019.8969502)
Public speaking anxiety (PSA) ranks among the top social phobias worldwide and is caused by various confounding factors. Motivated by the inherent data sparsity and lack of annotations in human-related applications, we propose unsupervised learning techniques to estimate PSA from speech signals. The labeled source data come from the publicly available CREMA-D dataset, while the unlabeled target data come from real-life public speaking tasks. Since fear is one of the major factors of PSA, the goal of this study is to build fear-specific representations from the labeled source data, use them to estimate the degree of fear in the target data, and examine the extent to which that estimated fear is associated with anxiety during the public speaking encounter. Transfer learning is performed through the domain-adversarial neural network (DANN) and the Wasserstein generative adversarial network (WGAN). Results indicate that the proposed unsupervised fear-specific estimates can detect public speaking anxiety with a Pearson’s correlation coefficient of 0.28 (p < 0.01). When these fear-specific estimates are combined with the degree of an individual’s preparation for the public speaking task, obtained through self-reports, they yield a Pearson’s correlation of 0.55 (p < 0.01). These results indicate the feasibility of leveraging labeled emotion-specific corpora for detecting human-related outcomes in real life and provide a foundation for smart assistive technologies through the automated real-time estimation of anxiety during public speaking.
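The results above are reported as Pearson's correlation between the model's fear estimates and anxiety scores. As a minimal sketch of how such a coefficient is computed, the following Python snippet implements the standard formula. The function name `pearson_r`, the variable names, and all numeric values are hypothetical and chosen for illustration only; they are not data or code from the study, and the abstract does not specify how the anxiety scores were scaled.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, NOT data from the study.
fear_estimates = [0.12, 0.40, 0.35, 0.80, 0.55, 0.20]  # hypothetical model outputs
anxiety_scores = [2.0, 3.5, 2.5, 4.5, 4.0, 3.0]        # hypothetical self-reported scores

r = pearson_r(fear_estimates, anxiety_scores)
print(f"Pearson's r = {r:.2f}")
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one. The sketch does not compute the p-value that accompanies the coefficients in the abstract; that requires a separate significance test.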