Predicting dementia through audio: Ensemble and deep learning approaches using acoustic features

IF 6.3 Zone 2 (Medicine) Q1 BIOLOGY

Computers in biology and medicine Pub Date : 2025-09-17 DOI:10.1016/j.compbiomed.2025.111078

Priyanka G. , Amshakala K.

Volume 197, Article 111078 https://www.sciencedirect.com/science/article/pii/S0010482525014301

Citations: 0

Abstract

Dementia is characterized by a deterioration in cognitive function beyond what would be expected from normal aging. It predominantly affects older adults, although it is not a normal part of aging. Dementia encompasses a range of symptoms that can include memory loss, impaired reasoning, personality changes, and difficulties with daily activities. One of the major difficulties that elderly people with dementia face is communicating with others to meet their daily needs. Diagnosing dementia involves a comprehensive evaluation of an individual's cognitive function, medical history, and other relevant factors. In this work, audio recordings of patients are used to diagnose dementia at earlier stages. To do this, we extract acoustic characteristics from the recordings, such as pitch, variations in pitch, loudness changes, how quickly the voice starts, and specific sound patterns. We then select the best acoustic features using statistical methods and use them to train ensemble models such as Random Forest, AdaBoost, XGBoost, and Gradient Boost. In addition to the ensemble models, deep learning models such as BiLSTM, LSTM, and CNN-LSTM are trained with these features. The features selected for training include spectral centroid, MFCC, and fundamental frequency (F0). Both the ensemble and the deep learning models underwent random search for hyperparameter tuning, along with regularization and cross-validation, to enhance their performance. The Gradient Boost model performed well, achieving an accuracy of 90.5% in diagnosing dementia from audio data when trained with spectral centroid, MFCC, and fundamental frequency (F0). Furthermore, the study explores the underlying factors that may lead ensemble models to outperform deep learning models in specific cases, even though deep learning models are typically considered more effective for large-scale datasets.
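Two of the features named in the abstract, spectral centroid and fundamental frequency (F0), can be illustrated on a single audio frame. The sketch below is not the authors' pipeline: the abstract does not say which tools or parameters were used, so the function names, the Hann window, the autocorrelation pitch estimator, the 75 to 400 Hz search range, and the synthetic 200 Hz tone standing in for a voiced frame are all assumptions made for illustration. It uses only the Python standard library. MFCC extraction is omitted for brevity, and a dedicated audio library would normally be preferred in practice.

```python
import cmath
import math


def make_tone(freq_hz, sample_rate, n_samples):
    """Synthetic stand-in for one voiced speech frame: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one Hann-windowed frame."""
    n = len(frame)
    # Periodic Hann window to reduce spectral leakage (an assumed choice).
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    weighted_sum = 0.0
    magnitude_sum = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        # Naive DFT coefficient for bin k; fine for a short demo frame.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        magnitude = abs(coeff)
        weighted_sum += (k * sample_rate / n) * magnitude
        magnitude_sum += magnitude
    return weighted_sum / magnitude_sum if magnitude_sum > 0 else 0.0


def estimate_f0(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Autocorrelation F0 estimate (Hz): choose the lag with the highest score."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    sample_rate = 8000  # Hz, hypothetical
    frame = make_tone(200.0, sample_rate, 400)
    print("spectral centroid (Hz):", round(spectral_centroid(frame, sample_rate), 1))
    print("estimated F0 (Hz):", round(estimate_f0(frame, sample_rate), 1))
```

For a real recording, these per-frame values would typically be computed over many overlapping frames and summarized (for example by mean and variance) before feature selection and model training. The pure tone here only confirms that both estimators recover a known frequency.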

Source journal

Computers in biology and medicine Engineering & Technology - Engineering: Biomedical

CiteScore

11.70

Self-citation rate

10.40%

Articles published

1086

Review time

74 days

About the journal: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.