Multimodal LLM for enhanced Alzheimer’s Disease diagnosis: Interpretable feature extraction from Mini-Mental State Examination data

Meiwei Zhang, Yuwei Pan, Qiushi Cui, Yang Lü, Weihua Yu

Experimental Gerontology, Volume 208, Article 112812 (2025). DOI: 10.1016/j.exger.2025.112812
https://www.sciencedirect.com/science/article/pii/S053155652500141X
Abstract
Alzheimer’s Disease (AD) poses a considerable global health challenge, necessitating early and accurate diagnostics. The Mini-Mental State Examination (MMSE) is widely used for initial screening, but its traditional application often underutilizes the rich multimodal data it generates, such as videos, images, and speech. Integrating these modalities with modern Large Language Models (LLMs) offers untapped potential for improved diagnostics. In this study, we propose a multimodal LLM framework that fundamentally reinterprets MMSE data. Instead of relying on conventional, often limited MMSE features, the proposed LLM acts as a sophisticated cognitive analyst, directly processing the MMSE modalities. This deep multimodal understanding allows for the extraction of novel, high-level features that transcend traditional metrics. These features are not merely visual or acoustic signals, but rich semantic representations imbued with cognitive insights gleaned by the LLM. We then construct an interpretable decision tree classifier and derive a succinct rule list, yielding transparent diagnostic pathways readily understandable by clinicians. Finally, the framework integrates a counterfactual explanation module that provides individualized “what-if” analyses, illuminating how minimal feature changes could alter model outputs. In our empirical study on real-world clinical data, the framework improves diagnostic accuracy by approximately 6 percentage points while also providing diagnostic explanations, reinforcing its viability as a promising, interpretable, and scalable solution for early AD detection.
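To make the interpretable-classifier and counterfactual stages described above concrete, the sketch below fits a shallow decision tree on numeric features standing in for the LLM-extracted representations, prints its rule list, and probes a minimal "what-if" feature change. This is an illustrative assumption-laden sketch, not the paper's implementation: the feature names, synthetic data, and single-feature perturbation strategy are all hypothetical.

```python
# Illustrative sketch only: the paper's actual features, data, and
# counterfactual method are not specified here. Assumes LLM-derived
# MMSE features are already available as a numeric matrix.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Hypothetical LLM-extracted semantic features (e.g., scores for drawing
# fluency, speech coherence, recall quality) with synthetic AD/non-AD labels.
feature_names = ["drawing_fluency", "speech_coherence", "recall_quality"]
X = rng.random((200, 3))
y = (X[:, 0] + 0.5 * X[:, 2] > 0.9).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow tree keeps the diagnostic pathways readable for clinicians.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(tree, feature_names=feature_names))  # succinct rule list

# Naive counterfactual probe: perturb one feature of a test case until the
# predicted label flips, exposing a minimal "what-if" change.
x = X_test[0].copy()
base_label = tree.predict([x])[0]
for delta in np.linspace(-0.5, 0.5, 101):
    trial = x.copy()
    trial[0] = np.clip(x[0] + delta, 0.0, 1.0)
    if tree.predict([trial])[0] != base_label:
        print(f"Change to {feature_names[0]} that flips the prediction: {delta:+.2f}")
        break
```

In practice a dedicated counterfactual library or multi-feature search would replace the single-feature scan, but the single scan keeps the "minimal feature change" idea visible in a few lines.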