Yilin Pan, Jiabing Li, Yating Zhang, Zhuoran Tian, Yijia Zhang, Mingyu Lu
Computer Speech and Language, vol. 95, Article 101862. Published 2025-07-19. DOI: 10.1016/j.csl.2025.101862. Impact factor: 3.4 (JCR Q2, Computer Science, Artificial Intelligence). https://www.sciencedirect.com/science/article/pii/S0885230825000877
Time–Frequency Causal Hidden Markov Model for speech-based Alzheimer’s disease longitudinal detection
Speech deterioration is an early indicator of Alzheimer’s disease (AD), and its progression is influenced by various factors, so each individual follows a unique trajectory. To facilitate automated longitudinal detection of AD from speech, we propose an enhanced Hidden Markov Model (HMM), termed the Time–Frequency Causal HMM (TF-CHMM), which models disease-causative acoustic features over time under the Markov property. The TF-CHMM uses a parallel convolutional neural network as a spectrogram encoder, extracting both time-domain and frequency-domain features from audio recordings linked to AD. It also takes personal attributes (e.g., age) and clinical diagnosis data (e.g., MMSE scores) as supplementary inputs, and disentangles disease-related features from unrelated components through a sequential variational auto-encoder with causal inference. The TF-CHMM is evaluated on the Pitt Corpus, which includes annual visits for each subject with a variable number of longitudinal samples, comprising audio recordings, manual transcriptions, MMSE scores, and age information. The system achieved a competitive accuracy of 90.24% and an F1 score of 90.00%. An ablation study further highlighted the efficiency of the parallel convolutional kernels in extracting time–frequency information and the effectiveness of the longitudinal experimental setup for AD detection.
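As a rough illustration of the parallel-kernel idea in the abstract, the sketch below runs two convolution branches over a toy spectrogram: one kernel oriented along the time axis and one along the frequency axis. It is not the authors' encoder. The function names (`conv2d_valid`, `parallel_tf_encode`), the fixed difference kernels, the kernel size, and the mean pooling are all assumptions chosen for readability; in a trained network the kernels would be learned.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, kernel: Matrix) -> Matrix:
    """Valid-mode 2D cross-correlation (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [
        [
            sum(x[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def global_mean(fmap: Matrix) -> float:
    """Average a feature map down to a single number."""
    total = sum(sum(row) for row in fmap)
    return total / (len(fmap) * len(fmap[0]))


def parallel_tf_encode(spec: Matrix) -> List[float]:
    """Apply a time-oriented and a frequency-oriented kernel in parallel.

    Both kernels are hypothetical fixed difference filters, standing in
    for learned convolutional weights.
    """
    time_kernel = [[-1.0, 0.0, 1.0]]        # 1 x 3: change across time frames
    freq_kernel = [[-1.0], [0.0], [1.0]]    # 3 x 1: change across frequency bins
    time_feat = global_mean(conv2d_valid(spec, time_kernel))
    freq_feat = global_mean(conv2d_valid(spec, freq_kernel))
    # Concatenate the two branch outputs into one feature vector.
    return [time_feat, freq_feat]


# Toy spectrogram: rows are frequency bins, columns are time frames.
# Energy rises by 2 per frequency bin and by 1 per time frame.
spec = [[float(2 * f + t) for t in range(6)] for f in range(4)]
features = parallel_tf_encode(spec)
print(features)  # [2.0, 4.0]
```

On this toy input the time branch reports the per-frame slope over a two-frame span (2.0) and the frequency branch the per-bin slope over a two-bin span (4.0), which shows how the two kernel orientations pick up different structure from the same spectrogram.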
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.