{"title":"Emotion Detection using Speech and Face in Deep Learning","authors":"S. Shajith Ahamed, J. Jabez, M. Prithiviraj","doi":"10.1109/ICSCSS57650.2023.10169784","DOIUrl":null,"url":null,"abstract":"Humans have a unique ability to demonstrate and understand emotions through a variety of models of communication. Based on their emotions or mood swings we can judge whether the human subject is in good psychological condition or not. The most visible apparent deficiencies of today’s Emotion capturing systems were their inability to understand the emotions of such patients like mental health disorder, social emotion Agnosia, alexithymia or even autism by using facial expressions. It can be used in schools to help students who find it difficult to express their feelings (introverts) or who have unstable mental health concerns, such as depression, and hence the teacher’s or health workers can communicate with their parents and work through their problems. These days, technology allows employers to recognize individuals who are overly stressed in the workplace and release them from their duties. In research work a Deep Learning algorithm is utilized to create an integrated tool to identify the facial emotions and the stress level or emotion quotient from speech. Tools that can assist people in recognizing the emotions of those around them could be very beneficial in treatment settings as well as in regular social encounters. Emotion detection using speech and face in deep learning has made significant progress in recent years, but there are still several challenges that need to be addressed. Here are some of the main challenges: Limited Dataset: The availability of labeled datasets for emotion detection is limited, especially for less common emotions or for specific cultural contexts. This makes it challenging to train deep learning models that can generalize well to new data. Variability in Data: The data used for emotion detection can vary widely in terms of quality, noise, and variability. For example, speech data can be affected by environmental noise, accents, and speaking styles, while facial data can be affected by lighting conditions, facial expressions, and occlusion. Feature Extraction: Extracting relevant features from speech and facial data can be challenging, especially when dealing with complex emotions that are not easily captured by simple features. This requires careful design of feature extraction algorithms and feature engineering techniques. Interpretability: Deep learning models are often seen as “black boxes” that are difficult to interpret. This can make it challenging to understand how the model is making decisions and to diagnose errors or biases in the model.Ethical and Privacy Concerns: Emotion detection using speech and facial data raises ethical and privacy concerns, as it can be used for sensitive applications such as surveillance, emotion profiling, and behavioral prediction. 
This requires careful consideration of ethical and privacy issues in the design and deployment of deep learning models for emotion detection.","PeriodicalId":217957,"journal":{"name":"2023 International Conference on Sustainable Computing and Smart Systems (ICSCSS)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Sustainable Computing and Smart Systems (ICSCSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSCSS57650.2023.10169784","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Humans have a unique ability to demonstrate and understand emotions through a variety of modes of communication. From a person's emotions or mood swings, we can judge whether they are in good psychological condition. The most apparent deficiency of today's emotion-capturing systems is their inability to understand, from facial expressions alone, the emotions of patients with conditions such as mental health disorders, social-emotional agnosia, alexithymia, or autism. Such a tool can be used in schools to help students who find it difficult to express their feelings (introverts) or who have mental health concerns such as depression, so that teachers or health workers can communicate with their parents and work through their problems. Technology now also allows employers to recognize individuals who are overly stressed in the workplace and relieve them of their duties. In this research work, a deep learning algorithm is used to create an integrated tool that identifies facial emotions and estimates the stress level or emotion quotient from speech (a minimal two-branch sketch follows this abstract). Tools that help people recognize the emotions of those around them could be very beneficial in treatment settings as well as in everyday social encounters.

Emotion detection using speech and face with deep learning has made significant progress in recent years, but several challenges remain:

Limited datasets: Labeled datasets for emotion detection are scarce, especially for less common emotions or for specific cultural contexts, which makes it difficult to train deep learning models that generalize well to new data.

Variability in data: The data used for emotion detection varies widely in quality and noise. Speech data can be affected by environmental noise, accents, and speaking styles, while facial data can be affected by lighting conditions, facial expressions, and occlusion.

Feature extraction: Extracting relevant features from speech and facial data is challenging, especially for complex emotions that are not easily captured by simple features. This requires careful design of feature extraction algorithms and feature engineering techniques (see the MFCC sketch below).

Interpretability: Deep learning models are often seen as "black boxes" that are difficult to interpret, which makes it hard to understand how a model reaches its decisions and to diagnose errors or biases in the model.

Ethical and privacy concerns: Emotion detection from speech and facial data raises ethical and privacy concerns, as it can be used for sensitive applications such as surveillance, emotion profiling, and behavioral prediction. This requires careful consideration of ethical and privacy issues in the design and deployment of deep learning models for emotion detection.
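The abstract does not disclose the network architecture of the integrated tool, so the following is only a minimal sketch of the general pattern it describes: a two-branch Keras model that fuses a small CNN over facial crops with a dense network over speech features. The 48x48 grayscale input, the 40-dimensional speech vector, the seven-emotion label set, and all layer sizes are illustrative assumptions, not the authors' design.

```python
# Minimal sketch of a two-branch speech + face emotion classifier (Keras).
# Every input shape, layer size, and the 7-class label set is an
# illustrative assumption; the paper does not publish its architecture.
from tensorflow.keras import layers, Model

NUM_EMOTIONS = 7  # assumed label set (e.g., FER2013-style classes)

# Face branch: small CNN over 48x48 grayscale crops (assumed input size).
face_in = layers.Input(shape=(48, 48, 1), name="face")
x = layers.Conv2D(32, 3, activation="relu")(face_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
face_feat = layers.Dense(128, activation="relu")(x)

# Speech branch: dense layers over a 40-dim MFCC summary vector
# (see the feature-extraction sketch below).
speech_in = layers.Input(shape=(40,), name="speech")
s = layers.Dense(64, activation="relu")(speech_in)
speech_feat = layers.Dense(128, activation="relu")(s)

# Late fusion: concatenate the two modality embeddings, then classify.
fused = layers.Concatenate()([face_feat, speech_feat])
fused = layers.Dense(64, activation="relu")(fused)
out = layers.Dense(NUM_EMOTIONS, activation="softmax", name="emotion")(fused)

model = Model(inputs=[face_in, speech_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Late fusion (concatenating per-modality embeddings) is only one of several possible designs; early fusion or attention-based fusion are common alternatives in the multimodal emotion-recognition literature.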
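On the feature-extraction challenge: a common baseline for the speech side is MFCCs averaged over time to produce a fixed-length vector. Below is a minimal sketch using librosa; the sample rate, n_mfcc=40, and the file path are illustrative choices, not values taken from the paper.

```python
# Minimal MFCC feature-extraction sketch with librosa.
# Sample rate and n_mfcc are illustrative assumptions.
import numpy as np
import librosa

def mfcc_summary(wav_path: str, sr: int = 16000, n_mfcc: int = 40) -> np.ndarray:
    """Load an utterance and return a fixed-length MFCC summary vector."""
    y, sr = librosa.load(wav_path, sr=sr)                     # resample to a common rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)                                  # average over time -> (n_mfcc,)

# Usage (hypothetical file): features = mfcc_summary("utterance.wav")
```

Mean-pooling over frames discards temporal dynamics; sequence models (RNNs or temporal CNNs over the full MFCC matrix) are a standard way to retain them when simple summary features prove insufficient.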