Emotion Recognition Robust to Indoor Environmental Distortions and Non-targeted Emotions Using Out-of-distribution Detection

ACM Transactions on Computing for Healthcare (HEALTH), Pub Date: 2021-12-20, DOI: 10.1145/3492300

Ye Gao, Asif Salekin, Kristin D. Gordon, Karen Rose, Hongning Wang, J. Stankovic



Abstract



Rapid progress in machine learning for acoustic signal processing has produced many solutions for detecting emotions from speech. Early work was developed for clean, acted speech and for a fixed set of emotions. Importantly, those datasets and solutions assumed that a person exhibited only one of these emotions. More recent work has steadily added realism to emotion detection by considering issues such as reverberation, de-amplification, and background noise, but it often considers one dataset at a time and assumes that all emotions are accounted for in the model. We significantly improve the realism of emotion detection by (i) assessing different situations more comprehensively, combining five common publicly available datasets into one and enhancing the new dataset with data augmentation that considers reverberation and de-amplification; (ii) incorporating 11 typical home noises into the acoustics; and (iii) considering that in real situations a person may exhibit many emotions that are not currently of interest, and that these should neither have to fit into a pre-fixed category nor be improperly labeled. Our novel solution combines a CNN with out-of-distribution detection. Our solution increases the number of situations in which emotions can be effectively detected and outperforms a state-of-the-art baseline.
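The abstract does not give implementation details, so the following is only a minimal sketch of the kinds of operations it names: de-amplification, reverberation, background-noise mixing, and rejection of non-targeted emotions. Everything here is an assumption made for illustration: the function names, the synthetic exponentially decaying impulse response, the SNR-based noise scaling, the example labels, and the 0.7 threshold are all hypothetical. In particular, the rejection rule is a plain maximum-softmax-probability threshold, a common out-of-distribution baseline, and is not claimed to be the authors' method.

```python
import numpy as np


def deamplify(signal, gain):
    """Scale the waveform by a gain in (0, 1] to mimic a quieter, more distant speaker."""
    if not 0.0 < gain <= 1.0:
        raise ValueError("gain must be in (0, 1]")
    return signal * gain


def add_reverb(signal, sample_rate, rt60=0.5, seed=0):
    """Convolve with a synthetic impulse response (assumed, not a measured room)."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * sample_rate)
    t = np.arange(n) / sample_rate
    # Amplitude envelope falls by 60 dB (a factor of 1e-3) after rt60 seconds.
    decay = 10.0 ** (-3.0 * t / rt60)
    impulse = rng.standard_normal(n) * decay
    impulse[0] = 1.0  # direct path
    wet = np.convolve(signal, impulse)[: len(signal)]
    peak = np.max(np.abs(wet))
    # Rescale so the reverberant signal keeps the original peak level.
    return wet / peak * np.max(np.abs(signal)) if peak > 0 else wet


def mix_noise(signal, noise, snr_db):
    """Add noise scaled so the mixture has the requested signal-to-noise ratio in dB."""
    noise = np.resize(noise, signal.shape)  # repeat or trim the noise clip to fit
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def classify_or_reject(logits, labels, threshold=0.7):
    """Return a target label, or None when the top softmax score is below the threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    best = int(np.argmax(probs))
    return labels[best] if probs[best] >= threshold else None


if __name__ == "__main__":
    sr = 16000
    clean = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)       # stand-in for speech
    home_noise = np.random.default_rng(1).standard_normal(sr // 2)  # stand-in for a home noise clip
    distorted = mix_noise(add_reverb(deamplify(clean, 0.5), sr), home_noise, snr_db=10)
    print(distorted.shape)
    # Logits would come from a trained CNN; these values are made up.
    print(classify_or_reject([5.0, 0.0, 0.0], ["angry", "happy", "sad"]))
    print(classify_or_reject([1.0, 1.0, 1.0], ["angry", "happy", "sad"]))
```

In this sketch, an utterance whose top class probability falls below the threshold is returned as `None` rather than forced into one of the target categories, which mirrors the abstract's point that emotions outside the set of interest should not be improperly labeled. The threshold would need to be tuned on held-out data.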
