Predicting dementia through audio: Ensemble and deep learning approaches using acoustic features
Priyanka G., Amshakala K.
Computers in Biology and Medicine, vol. 197, Article 111078, published 2025-09-17
DOI: 10.1016/j.compbiomed.2025.111078
https://www.sciencedirect.com/science/article/pii/S0010482525014301
Citations: 0
Abstract
Dementia is characterized by a decline in cognitive function beyond what would be expected from normal aging. It predominantly affects older adults, although it is not a normal part of aging. Its symptoms can include memory loss, impaired reasoning, personality changes, and difficulty with daily activities. One of the major challenges faced by elderly people with dementia is communicating with others to meet their daily needs. Diagnosing dementia requires a comprehensive evaluation of an individual's cognitive function, medical history, and other relevant factors. In this work, audio recordings of patients are used to diagnose dementia at earlier stages. To do this, we extracted acoustic characteristics from the recordings, such as pitch, pitch variation, loudness changes, voice onset speed, and specific sound patterns. We then selected the most informative acoustic features using statistical methods and used them to train ensemble models such as Random Forest, AdaBoost, XGBoost, and Gradient Boosting. In addition to the ensemble models, deep learning models such as BiLSTM, LSTM, and CNN-LSTM were trained on the same features. The features selected for training include spectral centroid, MFCC, and fundamental frequency (F0). Both the ensemble and the deep learning models were tuned by random search over their hyperparameters, combined with regularization and cross-validation, to improve their performance. The Gradient Boosting model performed well, reaching an accuracy of 90.5% in diagnosing dementia from audio data when trained on spectral centroid, MFCC, and F0. Furthermore, the study explores the factors that may lead ensemble models to outperform deep learning models in specific cases, even though deep learning models are typically considered more effective on large-scale datasets.
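To make two of the selected features concrete, the following is a minimal, standard-library-only sketch of how spectral centroid and F0 can be computed for a single audio frame. It is not the authors' extraction code. The helper names (dft_magnitudes, spectral_centroid, estimate_f0, sine), the 8 kHz sample rate, the 400-sample frame, and the 60 to 400 Hz pitch search band are all illustrative assumptions. MFCC extraction (mel filterbank plus DCT) is omitted for brevity. Spectral centroid is taken here as the magnitude-weighted mean frequency of the frame, and F0 is estimated from the autocorrelation peak, which is one common technique among several.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency bins of a naive O(n^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    n = len(frame)
    mags = dft_magnitudes(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: centroid undefined, report 0
    return sum(f * m for f, m in zip(freqs, mags)) / total


def estimate_f0(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Autocorrelation F0 estimate: pick the lag with the largest
    autocorrelation inside the [fmin, fmax] band, return sample_rate / lag.
    Returns 0.0 if no lag in the band has positive correlation."""
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]  # remove DC offset before correlating
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[t] * x[t + lag] for t in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


def sine(freq, sample_rate, n):
    """Synthetic test tone standing in for a voiced speech frame (hypothetical)."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


if __name__ == "__main__":
    sr = 8000                      # assumed sample rate (Hz)
    frame = sine(200.0, sr, 400)   # 50 ms frame, 200 Hz "voice"
    print(round(spectral_centroid(frame, sr), 3))  # 200.0
    print(estimate_f0(frame, sr))                  # 200.0
```

The naive DFT keeps the sketch dependency-free; a practical pipeline would more likely use an FFT-based library such as librosa, compute these values over many overlapping frames, and then summarize them per recording (for example by mean and variance) before training a classifier. That summarization step is an assumption here, not a detail stated in the abstract.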
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.