Speech Emotion Recognition in People at High Risk of Dementia.

Dementia and Neurocognitive Disorders. Pub Date: 2024-07-01. Epub Date: 2024-07-24. DOI: 10.12779/dnd.2024.23.3.146
Dongseon Kim, Bongwon Yi, Yugwon Won
Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11300689/pdf/

Abstract

Background and purpose: Information about the emotions of people at various stages of dementia should be used effectively for prevention, early intervention, and care planning. Because technology now exists for understanding and addressing people's emotional needs, this study aims to develop speech emotion recognition (SER) technology that classifies the emotions of people at high risk of dementia.

Methods: Speech samples from people at high risk of dementia were categorized into distinct emotions by human listeners, and the resulting labels were used as annotations for a supervised (guided) deep-learning method. The architecture combined a convolutional neural network, long short-term memory, attention layers, and Wav2Vec2, a novel feature extractor, to build an automated speech emotion recognizer.
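
The abstract does not say how the attention layers are wired into the architecture. As a rough illustration only, the sketch below shows one common form, attention pooling, in which per-frame feature vectors (for example, the outputs of a recurrent layer) are collapsed into a single utterance-level vector by softmax-weighted averaging. The function name `attention_pool`, the scoring vector `query`, and the dot-product scoring rule are assumptions made for this example, not details taken from the paper.

```python
import math
from typing import List, Sequence


def attention_pool(frames: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Collapse per-frame vectors into one vector by softmax-weighted averaging.

    Hypothetical illustration: scoring frames by a dot product with `query`
    is an assumption, not the design described in the paper.
    """
    if not frames:
        raise ValueError("frames must be non-empty")
    # Score each frame by its dot product with the query vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum of the frames is the utterance-level vector.
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)]
```

With an all-zero `query`, every frame gets the same weight and the result is the plain average of the frames; a learned `query` would instead let the model emphasize the frames that carry the most emotional information.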

Results: Twenty-seven kinds of emotions were found in the participants' speech. These were grouped into 6 detailed emotions (happiness, interest, sadness, frustration, anger, and neutrality) and further into 3 basic emotions (positive, negative, and neutral). To improve algorithmic performance, multiple learning approaches were applied, using different data sources (voice and text) and varying the number of emotion classes. Ultimately, a 2-stage algorithm, consisting of an initial text-based classification followed by voice-based analysis, achieved the highest accuracy, reaching 70%.
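
The grouping and the 2-stage idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract names the six detailed and three basic emotions but does not spell out the mapping between them, and it does not say how the text and voice stages are combined. Here the mapping is the natural valence grouping, and the cascade accepts the text stage's answer when its confidence clears a hypothetical `threshold`, deferring to the voice stage otherwise. The names `DETAILED_TO_BASIC`, `two_stage_predict`, `text_stage`, and `voice_stage` are all invented for illustration.

```python
from typing import Callable, Dict, Tuple

# Assumed grouping of the six detailed emotions into the three basic ones;
# the abstract lists both sets but does not state the mapping explicitly.
DETAILED_TO_BASIC: Dict[str, str] = {
    "happiness": "positive",
    "interest": "positive",
    "sadness": "negative",
    "frustration": "negative",
    "anger": "negative",
    "neutrality": "neutral",
}

# A classifier is any callable returning (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


def two_stage_predict(transcript: str,
                      audio_path: str,
                      text_stage: Classifier,
                      voice_stage: Classifier,
                      threshold: float = 0.8) -> str:
    """Text-first cascade: keep the text label if confident, else use voice.

    The confidence-threshold hand-off is an assumption; the paper may
    combine the two stages differently.
    """
    label, confidence = text_stage(transcript)
    if confidence >= threshold:
        return label
    # Text stage was unsure, so the voice stage makes the final call.
    label, _ = voice_stage(audio_path)
    return label
```

In use, `text_stage` and `voice_stage` would wrap trained models; any pair of callables with the same return shape can stand in for them during testing.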

Conclusions: The diversity of emotions identified in this study is attributed to the characteristics of the participants and the method of data collection. The fact that the participants, people at high risk of dementia, were speaking to companion robots also explains the relatively low performance of the SER algorithm. Accordingly, this study recommends the systematic and comprehensive construction of a dataset from people with dementia.
