Speech emotion recognition approaches: A systematic review

IF 2.4, Zone 3 (Computer Science), Q2 (ACOUSTICS)

Speech Communication Pub Date : 2023-10-01 DOI:10.1016/j.specom.2023.102974

Ahlam Hashem, Muhammad Arif, Manal Alghamdi

Speech Communication, Volume 154, Article 102974. https://www.sciencedirect.com/science/article/pii/S0167639323001085

Citations: 1

Abstract

Speech emotion recognition (SER) has been an active field since it became a crucial feature of advanced Human–Computer Interaction (HCI), and it is used in a wide range of real-life applications. In recent years, researchers have studied many aspects of SER systems, including the availability of appropriate emotional databases, the selection of robust features, and the application of suitable classifiers based on Machine Learning (ML) and Deep Learning (DL). Deep models have proved more accurate for SER than conventional ML techniques. Nevertheless, SER remains a challenging classification problem: separating similar emotional patterns requires a highly discriminative feature representation. This survey therefore critically analyzes current work in the field in light of previous studies on recognizing emotions from speech audio, and reviews the current state of SER using DL. A systematic literature review based on selected keywords and covering 2012–2022 yielded 96 papers that reflect the most recent findings and directions. Specifically, we cover databases (acted, evoked, and natural), features (prosodic, spectral, voice quality, and Teager energy operator), and the necessary preprocessing steps. Furthermore, different DL models and their performance are examined in depth. Based on our review, we also suggest aspects of SER that could be considered in future work.


Source journal

Speech Communication (Engineering & Technology: Computer Science, Interdisciplinary Applications)

CiteScore

6.80

Self-citation rate

6.20%

Articles published

Review time

19.2 weeks

About the journal: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.