Title: Modality fusion using auxiliary tasks for dementia detection
Authors: Hangshou Shao, Yilin Pan, Yue Wang, Yijia Zhang
Journal: Computer Speech and Language, volume 95, Article 101814 (impact factor 3.1; JCR Q2, Computer Science, Artificial Intelligence)
Publication date: 2025-05-10
DOI: 10.1016/j.csl.2025.101814
URL: https://www.sciencedirect.com/science/article/pii/S0885230825000397
Citation count: 0
Abstract
Alzheimer’s disease, the leading cause of dementia, affects the speech and language abilities of elderly individuals. In this paper, a Feature Fusion Model with Guide Patterns (FFG) is designed as an acoustic- and linguistic-based dementia detection system that addresses two problems: the limited amount of publicly available data and the inefficient fusion of modalities. Specifically, a multi-modal feature interaction module composed of multiple co-attention layers is designed to improve the interaction between the acoustic and linguistic information embedded in the audio recordings. Given the limited audio recordings available in public datasets, guide patterns are introduced as auxiliary tasks to enhance the interaction between acoustic and linguistic information. The proposed FFG model is evaluated on three publicly available datasets: Pitt, ADReSS, and ADReSSo. Experimental results show that the FFG model achieves superior results on all three datasets, with accuracies of 85.85% and 84.30% on the Pitt and ADReSSo datasets, respectively. The ablation study demonstrates the efficiency of the proposed model.
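The abstract does not specify how the co-attention layers are built, so the following is only a generic sketch of the idea, not the authors' FFG implementation. It assumes plain scaled dot-product attention with a residual connection and no learned projections; the function names (`cross_attend`, `co_attention`) and the toy vectors are hypothetical. Each acoustic frame attends over the linguistic tokens, and each token attends over the acoustic frames.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention: every query vector attends over
    all context vectors and returns a weighted average of them.
    Assumption: no learned query/key/value projections are applied."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, c)) / math.sqrt(dim)
            for c in context
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)]
        )
    return attended


def co_attention(acoustic, linguistic):
    """One illustrative co-attention step in both directions.
    A residual connection keeps each modality's own features."""
    audio_from_text = cross_attend(acoustic, linguistic)
    text_from_audio = cross_attend(linguistic, acoustic)
    acoustic_out = [
        [x + y for x, y in zip(own, other)]
        for own, other in zip(acoustic, audio_from_text)
    ]
    linguistic_out = [
        [x + y for x, y in zip(own, other)]
        for own, other in zip(linguistic, text_from_audio)
    ]
    return acoustic_out, linguistic_out


# Toy inputs (hypothetical): 3 audio frames and 2 token embeddings, dimension 2.
acoustic = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
linguistic = [[0.5, 0.5], [1.0, -1.0]]
acoustic_out, linguistic_out = co_attention(acoustic, linguistic)
```

Each modality keeps its own sequence length after the step, so several such layers can be stacked. In a trained model the attention would normally use learned projections, and the auxiliary tasks mentioned in the abstract would add extra loss terms during training; neither is modeled here.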
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.