Automatically Calculated Context-Sensitive Features of Connected Speech Improve Prediction of Impairment in Alzheimer's Disease
Graham Flick, Rachel Ostrand
Journal of Speech, Language, and Hearing Research (JSLHR), pp. 1-22. Published 2025-10-14.
DOI: https://doi.org/10.1044/2025_JSLHR-24-00297
Abstract
Purpose: Early detection is critical for effective management of Alzheimer's disease (AD) and other dementias. One promising approach for predicting AD status is to automatically calculate linguistic features from open-ended connected speech. Past work has focused on individual word-level features such as part-of-speech counts, total word production, and lexical richness, with less emphasis on measuring the relationship between words and the context in which they are produced. Here, we assessed whether linguistic features that take into account where a word was produced in the discourse context improved the ability to predict AD patients' Mini-Mental State Examination (MMSE) scores and to distinguish AD patients from healthy control participants.
Method: Seventeen linguistic features were automatically computed from transcriptions of spoken picture descriptions from individuals with probable or possible AD (n = 176 transcripts). These comprised 12 word-level features (e.g., part-of-speech counts) and five features capturing contextual word choices (linguistic surprisal, computed from a large language model, and properties of words produced following filled pauses). We examined whether (a) the full set jointly predicted MMSE scores, (b) the addition of contextual features improved prediction, and (c) linguistic features could classify AD patients (n = 130) versus healthy participants (n = 93).
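To make the notion of surprisal concrete: the surprisal of a word is the negative log probability of that word given its preceding context, so less predictable words receive higher values. The sketch below is illustrative only. It substitutes a toy bigram model with add-one smoothing for the large language model used in the study, and the function names and the example sentence are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count contexts and bigrams from a token list (toy stand-in for an LLM)."""
    contexts = Counter(tokens[:-1])              # each token that has a successor
    bigrams = Counter(zip(tokens, tokens[1:]))   # (previous word, word) pairs
    vocab = sorted(set(tokens))
    return contexts, bigrams, vocab


def surprisal(prev, word, contexts, bigrams, vocab):
    """Surprisal in bits: -log2 P(word | prev), with add-one smoothing.

    Assumes `word` is in `vocab`; out-of-vocabulary handling is omitted.
    """
    p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
    return -math.log2(p)


def mean_surprisal(tokens, contexts, bigrams, vocab):
    """Average surprisal over every word that has a preceding word."""
    values = [surprisal(p, w, contexts, bigrams, vocab)
              for p, w in zip(tokens, tokens[1:])]
    return sum(values) / len(values)


# Hypothetical picture-description fragment used as the training text.
corpus = "the boy is on the stool the girl is at the sink".split()
contexts, bigrams, vocab = train_bigram(corpus)

seen = surprisal("the", "boy", contexts, bigrams, vocab)    # attested continuation
unseen = surprisal("the", "is", contexts, bigrams, vocab)   # unattested continuation
print(f"surprisal(boy | the) = {seen:.3f} bits")
print(f"surprisal(is  | the) = {unseen:.3f} bits")
```

A per-transcript feature could then be a summary such as `mean_surprisal(tokens, ...)`. How the study aggregated word-level surprisal into its features is not stated in the abstract, so that choice is an assumption here.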
Results: Linguistic features accurately predicted MMSE scores in individuals with probable or possible AD and successfully identified up to 87% of AD participants versus healthy controls. Statistical models that contained linguistic surprisal (a contextual feature) performed better than those that included only word-level and demographic features. Overall, AD patients with lower MMSE scores produced more empty words, fewer nouns and definite articles, and words that were higher in frequency yet more surprising given the preceding context.
Conclusion: These results provide novel evidence that metrics related to contextualized word choices, particularly the surprisal of an individual's words, capture variance in degree of cognitive decline in AD.