Speech2Dementia: A Novel Deep Learning Framework Integrating Enhanced CNN and Large Language Models for Automatic Detection of Alzheimer's Dementia
Authors: Bandaru A. Chakravarthi, Gandla Shivakanth
Journal: Computational Intelligence, vol. 41, no. 2 (published 2025-04-09)
DOI: 10.1111/coin.70051
URL: https://onlinelibrary.wiley.com/doi/10.1111/coin.70051
Impact factor: 1.8; JCR Q3 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Early diagnosis of Alzheimer's disease (AD) is important for timely intervention, but current diagnostic tools tend to be unimodal, processing either speech or text in isolation. Although models such as the ComParE Baseline for audio and BERT-based text classifiers have been successful, they do not exploit the complementary strengths of the two modalities, which limits their diagnostic power. To overcome this, we propose SPID-AD (Speech-Based Intelligent Detection of Alzheimer's Dementia), a multimodal deep-learning approach that combines linguistic and acoustic features for the automated detection of Alzheimer's. Our approach uses a BERT-based architecture to extract semantic patterns from transcripts and an augmented Convolutional Neural Network (CNN) to process Mel-spectrogram representations of speech. By combining these features in dense layers, the model retains both linguistic and acoustic biomarkers of cognitive impairment. Evaluated on the DementiaBank Pitt Corpus, SPID-AD achieves 95.6% classification accuracy, surpassing state-of-the-art models in precision, recall, and F1-score. The findings demonstrate the strength of multimodal analysis in detecting dementia speech patterns, providing a non-invasive, AI-based diagnostic tool that may assist clinicians in the early detection of Alzheimer's.
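The fusion step described in the abstract (combining text and audio features in dense layers) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no layer sizes, weights, or training details, so the dimensions, the random untrained weights, the helper names (`dense`, `init_layer`, `fuse_and_classify`), and the choice of simple concatenation followed by a ReLU layer and a sigmoid output are all assumptions made for illustration. The two input vectors are random stand-ins for a BERT transcript embedding and CNN Mel-spectrogram features.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, bias, activation):
    """Fully connected layer: activation(W @ x + b), with W given as a list of rows."""
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_classify(text_emb, audio_emb, hidden, output):
    """Concatenate both modality embeddings, then apply two dense layers."""
    fused = list(text_emb) + list(audio_emb)            # feature-level fusion
    h = dense(fused, hidden[0], hidden[1], relu)        # shared hidden layer
    return dense(h, output[0], output[1], sigmoid)[0]   # score in (0, 1)


rng = random.Random(0)
TEXT_DIM, AUDIO_DIM, HIDDEN_DIM = 8, 6, 4   # toy sizes, not taken from the paper
hidden = init_layer(HIDDEN_DIM, TEXT_DIM + AUDIO_DIM, rng)
output = init_layer(1, HIDDEN_DIM, rng)

# Random stand-ins for the two feature extractors (assumptions, not real features).
text_emb = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]    # e.g. BERT transcript embedding
audio_emb = [rng.gauss(0, 1) for _ in range(AUDIO_DIM)]  # e.g. CNN Mel-spectrogram features

prob = fuse_and_classify(text_emb, audio_emb, hidden, output)
print(f"fused dimension: {TEXT_DIM + AUDIO_DIM}, untrained output: {prob:.3f}")
```

Because the weights are untrained, the printed score carries no diagnostic meaning; the sketch only shows how a single dense head can see both modalities at once, which is the property the abstract credits for retaining linguistic and acoustic cues together.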
About the journal:
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.