An Enhanced Human Speech Based Emotion Recognition
Authors: Dr. M. Narendra, Lankala Suvarchala
Journal: International Journal of Scientific Research in Science and Technology
Published: 2024-06-05 (Journal Article)
DOI: 10.32628/ijsrst24113128
Citations: 0
Abstract
Speech Emotion Recognition (SER) is a Machine Learning (ML) topic that has attracted substantial research attention, particularly in affective computing, owing to its growing potential, algorithmic improvements, and real-world applications. Quantitative features such as pitch, intensity, and Mel-Frequency Cepstral Coefficients (MFCCs) can represent the paralinguistic information carried by human speech. SER is typically achieved through three main processes: data processing, feature selection/extraction, and classification based on the underlying emotional traits. The nature of these processes, together with the distinctive characteristics of human speech, makes ML techniques well suited to implementing SER. Recent affective computing research has applied several ML techniques to SER tasks; however, only a few of these works adequately convey the fundamental strategies and techniques that support the three essential phases of SER implementation. Moreover, these works either overlook or only briefly explain the difficulties involved in these tasks and the state-of-the-art methods used to overcome them. In this study, we present a comprehensive review of research from the past ten years that has tackled SER challenges from a machine learning perspective, organized around the three SER implementation processes. Several difficulties are covered in detail, including the low classification accuracy of speaker-independent experiments and the solutions proposed for it. The review also offers principles for SER evaluation, emphasizing the metrics available for experimentation and common baselines. This paper is intended to serve as a thorough guide that SER researchers can use to build SER solutions with ML techniques, to inspire improvements to current SER models, or to spark the development of new methods that improve SER performance.
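To make the three SER stages and the speaker-independent evaluation concrete, here is a minimal, standard-library-only Python sketch. It is not the authors' method; every choice in it is an assumption made for brevity: 8 kHz audio, 400-sample frames with a 200-sample hop, an autocorrelation pitch estimator limited to 60-400 Hz, RMS energy as the intensity measure, a nearest-centroid classifier on standardized features, leave-one-speaker-out (LOSO) folds, and unweighted average recall (UAR) as the metric. MFCCs are omitted to keep it short, and the synthetic tones are hypothetical stand-ins for recorded utterances.

```python
# Hypothetical sketch (not the paper's pipeline): stdlib only; sample rate,
# frame sizes, pitch range, classifier and metric are illustrative assumptions.
import math
from collections import defaultdict

# --- Stage 1: data processing (framing a mono signal) ---
def frame_signal(signal, frame_len=400, hop=200):
    """Split a list of samples into overlapping fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

# --- Stage 2: feature extraction (intensity and pitch) ---
def rms_intensity(frame):
    """Root-mean-square energy, a simple proxy for intensity."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def autocorr_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate F0 as sr / lag, where lag maximizes the raw autocorrelation
    within the lag range corresponding to [fmin, fmax] Hz."""
    lo, hi = int(sr / fmax), min(int(sr / fmin), len(frame) - 1)
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, hi + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sr / best_lag

def utterance_features(signal, sr):
    """Utterance-level vector: [mean pitch in Hz, mean RMS intensity]."""
    frames = frame_signal(signal)
    pitches = [autocorr_pitch(f, sr) for f in frames]
    energies = [rms_intensity(f) for f in frames]
    return [sum(pitches) / len(pitches), sum(energies) / len(energies)]

# --- Stage 3: classification (nearest centroid on standardized features) ---
def fit_standardizer(X):
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(zip(*X), means)]
    return means, stds

def standardize(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(x, means, stds)] for x in X]

def fit_centroids(X, y):
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict(centroids, x):
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

# --- Evaluation: speaker-independent protocol and UAR ---
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every emotion counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def loso_uar(X, y, speakers):
    """Leave-one-speaker-out: each fold tests on a speaker unseen in training.
    Predictions from all folds are pooled before computing UAR."""
    y_true, y_pred = [], []
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        means, stds = fit_standardizer([X[i] for i in train])
        centroids = fit_centroids(standardize([X[i] for i in train], means, stds),
                                  [y[i] for i in train])
        for x in standardize([X[i] for i in test], means, stds):
            y_pred.append(predict(centroids, x))
        y_true.extend(y[i] for i in test)
    return unweighted_average_recall(y_true, y_pred)

# --- Toy usage: synthetic tones stand in for real recordings (hypothetical data) ---
def tone(f0, amp, sr=8000, n=1200):
    return [amp * math.sin(2 * math.pi * f0 * t / sr) for t in range(n)]

SR = 8000
X, y, speakers = [], [], []
for spk, shift in [("spk1", 0), ("spk2", 15), ("spk3", -15)]:
    for emotion, f0, amp in [("angry", 240, 0.8), ("neutral", 120, 0.2)]:
        for gain in (1.0, 0.9):
            X.append(utterance_features(tone(f0 + shift, amp * gain, SR), SR))
            y.append(emotion)
            speakers.append(spk)

uar = loso_uar(X, y, speakers)
print(f"LOSO UAR: {uar:.2f}")
```

On this deliberately separable toy data the script prints `LOSO UAR: 1.00`. The point of the LOSO loop is that no speaker ever appears in both the training and test sets of a fold, which is the setting the abstract identifies as yielding low accuracy on real speech. UAR is used here rather than plain accuracy so that an imbalanced emotion distribution cannot inflate the score.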