Speech Emotion Recognition in People at High Risk of Dementia.

Dementia and Neurocognitive Disorders. Pub Date: 2024-07-01. Epub Date: 2024-07-24. DOI: 10.12779/dnd.2024.23.3.146

Dongseon Kim, Bongwon Yi, Yugwon Won

Volume 23, Issue 3, Pages 146-160. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11300689/pdf/


Abstract

Background and purpose: The emotions of people at various stages of dementia need to be effectively utilized for prevention, early intervention, and care planning. With technology now available for understanding and addressing people's emotional needs, this study aims to develop speech emotion recognition (SER) technology that classifies the emotions of people at high risk of dementia.

Methods: Speech samples from people at high risk of dementia were categorized into distinct emotions via human auditory assessment, and the resulting annotations were used as labels for a guided deep-learning method. The architecture incorporated a convolutional neural network, long short-term memory, attention layers, and Wav2Vec2, a novel feature extractor, to develop automated speech emotion recognition.

Results: Twenty-seven kinds of emotions were found in the speech of the participants. These emotions were grouped into 6 detailed emotions (happiness, interest, sadness, frustration, anger, and neutrality) and further into 3 basic emotions (positive, negative, and neutral). To improve algorithmic performance, multiple learning approaches were applied, using different data sources (voice and text) and varying the number of emotions. Ultimately, a 2-stage algorithm, in which initial text-based classification was followed by voice-based analysis, achieved the highest accuracy, reaching 70%.
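The two-level emotion grouping and the 2-stage idea in the Results can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: the mapping from detailed to basic emotions below is inferred from the emotion names, the way the two stages are combined (text picks the basic emotion, voice then picks a detailed emotion within it) is one plausible reading rather than the authors' algorithm, and `toy_text_model` and `toy_voice_model` are hypothetical stand-ins for trained classifiers.

```python
from typing import Callable, Dict, List

# ASSUMPTION: this grouping is inferred from the emotion names; the abstract
# lists both levels but does not state which detailed emotion maps where.
DETAILED_TO_BASIC: Dict[str, str] = {
    "happiness": "positive",
    "interest": "positive",
    "sadness": "negative",
    "frustration": "negative",
    "anger": "negative",
    "neutrality": "neutral",
}


def candidates_for(basic: str) -> List[str]:
    """Return the detailed emotions that belong to one basic emotion."""
    return [d for d, b in DETAILED_TO_BASIC.items() if b == basic]


def two_stage_predict(
    text: str,
    audio_features: List[float],
    text_model: Callable[[str], str],
    voice_model: Callable[[List[float]], Dict[str, float]],
) -> str:
    """Stage 1: text picks a basic emotion. Stage 2: voice picks a detailed one.

    ASSUMPTION: this cascade is only one possible reading of "initial
    text-based classification followed by voice-based analysis".
    """
    basic = text_model(text)
    candidates = candidates_for(basic)
    if not candidates:
        raise ValueError(f"unknown basic emotion: {basic!r}")
    if len(candidates) == 1:
        return candidates[0]  # nothing left for the voice stage to decide
    scores = voice_model(audio_features)
    # Only labels inside the text-selected group compete in stage 2.
    return max(candidates, key=lambda label: scores.get(label, float("-inf")))


def toy_text_model(text: str) -> str:
    """Hypothetical keyword-based stand-in for a trained text classifier."""
    lowered = text.lower()
    if "thank" in lowered:
        return "positive"
    if "tired" in lowered:
        return "negative"
    return "neutral"


def toy_voice_model(features: List[float]) -> Dict[str, float]:
    """Hypothetical stand-in: treats the six inputs as per-emotion scores."""
    return dict(zip(DETAILED_TO_BASIC, features))


if __name__ == "__main__":
    print(two_stage_predict("Thank you for talking with me",
                            [0.1, 0.7, 0.9, 0.0, 0.0, 0.0],
                            toy_text_model, toy_voice_model))
```

In this sketch the voice scores for emotions outside the text-selected group are ignored, so a high "sadness" score cannot override a "positive" text decision. A real system could instead fuse the two stages' probabilities; the abstract does not say which design was used.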

Conclusions: The diverse emotions identified in this study were attributed to the characteristics of the participants and the method of data collection. The fact that the speech was addressed to companion robots also explains the relatively low performance of the SER algorithm. Accordingly, this study suggests the systematic and comprehensive construction of a dataset from people with dementia.


