DYNAMIC CLASSIFICATION OF LATENT DISEASE PROGRESSION WITH AUXILIARY SURROGATE LABELS.

Impact factor 1.4; Zone 4 (Mathematics); JCR Q2, Statistics & Probability
Annals of Applied Statistics Pub Date : 2026-03-01 Epub Date: 2026-03-20 DOI:10.1214/26-aoas2150
Zexi Cai, Donglin Zeng, Karen S Marder, Lawrence S Honig, Yuanjia Wang
Annals of Applied Statistics, vol. 20, no. 1, pp. 641-662. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13004507/pdf/

Abstract

Predicting disease progression from patients' evolving health information is challenging when true disease states are unknown because of limited diagnostic capabilities or high costs. For example, the absence of gold-standard neurological diagnoses hinders distinguishing Alzheimer's disease (AD) from related conditions such as AD-related dementias (ADRDs), including Lewy body dementia (LBD). Combining temporally dependent surrogate labels and health markers may improve disease prediction. However, the existing literature models informative surrogate labels and observed variables that reflect the underlying states using purely generative approaches, which often impose unrealistic assumptions on the outcomes and suffer from misspecification. We propose integrating the conventional hidden Markov model, as a generative model, with a time-varying discriminative classification model to simultaneously handle potentially misspecified surrogate labels and incorporate important markers of disease progression. We develop an adaptive forward-backward algorithm with subjective labels for estimation, and use modified posterior and Viterbi algorithms to predict the progression of future states or new patients based on objective markers only. Importantly, the adaptation eliminates the need to model the marginal distribution of longitudinal markers, a requirement in traditional algorithms. Asymptotic properties are established, and significant improvements in finite samples are demonstrated via simulation studies. Analysis of the neuropathological dataset of the National Alzheimer's Coordinating Center (NACC) shows substantially improved accuracy in distinguishing LBD from AD.
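
For readers unfamiliar with the algorithms named in the abstract, the following is a minimal sketch of the textbook (scaled) forward-backward and Viterbi recursions for a hidden Markov model. It is not the authors' adaptive algorithm, which the abstract does not specify. The function names, the two-state example, and the weight matrix `w` are all hypothetical; `w[t, k]` stands for any nonnegative per-time weight for state `k` (an emission likelihood in a classical HMM), and how the paper constructs such terms from surrogate labels and markers is an assumption left open here.

```python
import numpy as np

def forward_backward(pi, A, w):
    """Posterior state probabilities P(S_t = k | all observations).

    pi: (K,) initial state probabilities
    A:  (K, K) transitions, A[i, j] = P(S_{t+1} = j | S_t = i)
    w:  (T, K) nonnegative per-time state weights (hypothetical input)
    """
    T, K = w.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    # Forward pass, rescaled at every step to avoid underflow.
    alpha[0] = pi * w[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * w[t]
        alpha[t] /= alpha[t].sum()
    # Backward pass, also rescaled; the scaling cancels after normalization.
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (w[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def viterbi(pi, A, w):
    """Most probable state path; assumes strictly positive pi, A, and w."""
    T, K = w.shape
    log_A = np.log(A)
    delta = np.log(pi) + np.log(w[0])
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: arrive in j from i
        back[t] = scores.argmax(axis=0)      # best predecessor of each state j
        delta = scores.max(axis=0) + np.log(w[t])
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):            # backtrack through stored pointers
        path[t - 1] = back[t, path[t]]
    return path

# Hypothetical two-state example (say, state 0 and state 1 are two diagnoses).
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
w = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.2, 0.8],
              [0.1, 0.9]])

gamma = forward_backward(pi, A, w)
path = viterbi(pi, A, w)
print(gamma.round(3))
print(path)
```

Each row of `gamma` is a probability vector over the latent states at one visit, while `path` gives a single jointly most probable trajectory; the two can disagree at individual time points, which is why both posterior and Viterbi decoding are commonly reported.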

Source journal
Annals of Applied Statistics (Social Sciences: Statistics & Probability)
CiteScore: 3.10
Self-citation rate: 5.60%
Articles published: 131
Review time: 6-12 weeks
About the journal: Statistical research spans an enormous range from direct subject-matter collaborations to pure mathematical theory. The Annals of Applied Statistics, the newest journal from the IMS, is aimed at papers in the applied half of this range. Published quarterly in both print and electronic form, the journal aims to provide a timely and unified forum for all areas of applied statistics.