{"title":"DYNAMIC CLASSIFICATION OF LATENT DISEASE PROGRESSION WITH AUXILIARY SURROGATE LABELS.","authors":"Zexi Cai, Donglin Zeng, Karen S Marder, Lawrence S Honig, Yuanjia Wang","doi":"10.1214/26-aoas2150","DOIUrl":"10.1214/26-aoas2150","url":null,"abstract":"<p><p>Disease progression prediction based on patients' evolving health information is challenging when true disease states are unknown due to diagnostic capabilities or high costs. For example, the absence of gold-standard neurological diagnoses hinders distinguishing Alzheimer's disease (AD) from related conditions such as AD-related dementias (ADRDs), including Lewy body dementia (LBD). Combining temporally dependent surrogate labels and health markers may improve disease prediction. However, existing literature models informative surrogate labels and observed variables that reflect the underlying states using purely generative approaches, often posing unrealistic assumptions on the outcomes and suffering from misspecification thereof. We propose integrating the conventional hidden Markov model as a generative model with a time-varying discriminative classification model to simultaneously handle potentially misspecified surrogate labels and incorporate important markers of disease progression. We develop an adaptive forward-backward algorithm with subjective labels for estimation, and utilize the modified posterior and Viterbi algorithms to predict the progression of future states or new patients based on objective markers only. Importantly, the adaptation eliminates the need to model the marginal distribution of longitudinal markers, a requirement in traditional algorithms. Asymptotic properties are established, and significant improvements in finite samples are demonstrated via simulation studies. 
Analysis of the neuropathological dataset of the National Alzheimer's Coordinating Center (NACC) shows much improved accuracy in distinguishing LBD from AD.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"20 1","pages":"641-662"},"PeriodicalIF":1.4,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13004507/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147500503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SEMIPARAMETRIC ANALYSIS OF INTERVAL-CENSORED DATA SUBJECT TO INACCURATE DIAGNOSES WITH A TERMINAL EVENT.","authors":"Yuhao Deng, Donglin Zeng, Yuanjia Wang","doi":"10.1214/25-aoas2134","DOIUrl":"10.1214/25-aoas2134","url":null,"abstract":"<p><p>Interval-censoring frequently occurs in studies of chronic diseases where disease status is inferred from intermittently collected biomarkers. Although many methods have been developed to analyze such data, they typically assume perfect disease diagnosis, which often does not hold in practice due to the inherent imperfect clinical diagnosis of cognitive functions or measurement errors of biomarkers such as cerebrospinal fluid. In this work, we introduce a semiparametric modeling framework using the Cox proportional hazards model to address interval-censored data in the presence of inaccurate disease diagnosis. Our model incorporates sensitivity and specificity of the diagnosis to account for uncertainty in whether the interval truly contains the disease onset. Furthermore, the framework accommodates scenarios involving a terminal event and when diagnosis is accurate, such as through postmortem analysis. We propose a nonparametric maximum likelihood estimation method for inference and develop an efficient EM algorithm to ensure computational feasibility. The regression coefficient estimators are shown to be asymptotically normal, achieving semiparametric efficiency bounds. We further validate our approach through extensive simulation studies and an application assessing Alzheimer's disease (AD) risk. 
We find that amyloid-beta is significantly associated with AD, but Tau is predictive of both AD and mortality.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"20 1","pages":"623-640"},"PeriodicalIF":1.4,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13004487/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147500496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A general framework for investigating neurodevelopment of brain functional networks using multisite and longitudinal neuroimaging.","authors":"Joshua Lukemire, Yaotian Wang, Ying Guo","doi":"10.1214/25-aoas2133","DOIUrl":"10.1214/25-aoas2133","url":null,"abstract":"<p><p>In recent years longitudinal, multi-site imaging studies have emerged as key tools for investigating brain function. These studies follow a large number of participants for an extended period, offering exciting opportunities to uncover brain functional network changes over time as a function of clinical and demographic covariates. However, these studies also introduce many statistical challenges such as site-effects and accounting for the heterogeneous nature of network differences between subjects. Robust statistical methods are highly needed to address these issues, but to date there has been little methods development addressing these problems in the context of data-driven brain network estimation. This work addresses this gap in the literature, introducing a general Bayesian framework, REMBRAiNDT, incorporating site- and subject-effects into the network decomposition, while also enabling covariate effect estimation and efficient information pooling across brain locations. We use our procedure to conduct a novel analysis of neurodevelopment among adolescents in the longitudinal, multi-site ABCD study. We find extensive evidence of increasing functional integration with age in networks associated with higher order cognitive processes. 
Our study is one of the first to examine neurodevelopment using blind source separation in the longitudinal ABCD study data, and the findings enrich earlier cross-sectional results on neurodevelopment.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"20 1","pages":"604-622"},"PeriodicalIF":1.4,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13008291/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147516374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small Area Estimation of Education Levels in Low- and Middle-Income Countries.","authors":"Yunhan Wu, Ameer Dharamshi, Jon Wakefield","doi":"10.1214/25-aoas2135","DOIUrl":"10.1214/25-aoas2135","url":null,"abstract":"<p><p>Education is a key driver of social and economic mobility, yet disparities in attainment persist, particularly in low- and middle-income countries (LMICs). Existing indicators, such as mean years of schooling for adults aged 25 and older (MYS25) and expected years of schooling (EYS), offer a snapshot of an educational system, but lack either cohort-specific or temporal granularity. To address these limitations, we introduce the ultimate years of schooling (UYS)-a birth cohort-based metric targeting the final educational attainment of any individual cohort, including those with ongoing schooling trajectories. As with many attainment indicators, we propose to estimate UYS with cross-sectional household surveys. However, for younger cohorts, estimation fails, because these individuals are right-censored leading to severe downwards bias. To correct for this, we propose to re-frame educational attainment as a time-to-event process and deploy discrete-time survival models that explicitly account for censoring in the observations. At the national level, we estimate the parameters of the model using survey-weighted logistic regression, while for finer spatial resolutions, where sample sizes are smaller, we embed the discrete-time survival model within a Bayesian spatiotemporal framework to improve stability and precision. Applying our proposed methods to data from the 2022 Tanzania Demographic and Health Surveys, we estimate female educational trajectories corrected for censoring biases, and reveal substantial subnational disparities. 
By providing a dynamic, bias-corrected, and spatially disaggregated measure, our approach enhances education monitoring; it equips policymakers and researchers with a more precise tool for monitoring current progress towards education goals, and for designing future targeted policy interventions in LMICs.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"20 1","pages":"833-855"},"PeriodicalIF":1.4,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13090004/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147724220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TEMPORAL MODELS FOR ESTIMATION AND SHORT-TERM FORECASTING OF NEONATAL MORTALITY RATES IN SUB-SAHARAN AFRICA.","authors":"Katherine R Paulson, Geir-Arne Fuglstad, Zehang Richard Li, Jonathan Wakefield","doi":"10.1214/25-aoas2100","DOIUrl":"https://doi.org/10.1214/25-aoas2100","url":null,"abstract":"<p><p>Accurate estimation and forecasts for neonatal mortality rates (NMRs) in low- and middle-income countries is an urgent problem. Much of child mortality is preventable, and understanding temporal trends is of great interest when evaluating past performance and planning future policy or programming. In countries without robust vital registration, we rely on modeled estimates based on survey data to understand trends. A toolkit of compelling temporal models exists, but these methods have not been comprehensively evaluated for their application for the estimation of the NMR in low- and middle-income countries using household survey data. Using Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS) data from 41 countries in sub-Saharan Africa, we estimate and forecast the national-level NMR for 1970-2030 separately with random walk, auto-regressive, penalized spline, natural spline, and logit-linear latent temporal models. We examine the statistical behavior of these temporal models with both an out-of-sample analysis using the DHS and MICS data and a simulation study. We find that the second-order random walk and the penalized spline have the least bias, and short-term forecasts from the penalized spline tend to have narrower intervals with better out-of-sample performance. 
From the analysis of the NMR in sub-Saharan Africa, we estimate that 6 or fewer of the 41 countries included are on track to achieve the Sustainable Development Goals target of 12 neonatal deaths per 1000 live births by 2030.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"20 1","pages":"322-345"},"PeriodicalIF":1.4,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13096879/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147788192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JOINT MODELING FOR LEARNING DECISION-MAKING DYNAMICS IN BEHAVIORAL EXPERIMENTS.","authors":"Yuan Bian, Xingche Guo, Yuanjia Wang","doi":"10.1214/25-aoas2112","DOIUrl":"10.1214/25-aoas2112","url":null,"abstract":"<p><p>Major depressive disorder (MDD), a leading cause of disability and mortality, is associated with reward-processing abnormalities and concentration issues. Motivated by the probabilistic reward task from the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel framework that integrates the reinforcement learning (RL) model and drift-diffusion model (DDM) to jointly analyze reward-based decision-making with response times. To account for emerging evidence suggesting that decision-making may alternate between multiple interleaved strategies, we model latent state switching using a hidden Markov model (HMM). In the engaged state, decisions follow an RL-DDM, simultaneously capturing reward processing, decision dynamics, and temporal structure. In contrast, in the lapsed state, decision-making is modeled using a simplified DDM, where specific parameters are fixed to approximate random guessing with equal probability. The proposed method is implemented using a computationally efficient generalized expectation-maximization (EM) algorithm with forward-backward procedures. Through extensive numerical studies, we demonstrate that our proposed method outperforms competing approaches across various reward-generating distributions, under both strategy-switching and non-switching scenarios, as well as in the presence of input perturbations. When applied to the EMBARC study, our framework reveals that MDD patients exhibit lower overall engagement than healthy controls and experience longer responses when they do engage. 
Additionally, we show that neuroimaging measures of brain activities are associated with decision-making characteristics in the engaged state but not in the lapsed state, providing evidence of brain-behavior association specific to the engaged state.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 4","pages":"3372-3393"},"PeriodicalIF":1.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814034/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146012947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PERSONALIZED RISK PREDICTION FOR CANCER SURVIVORS: A GENERALIZED BAYESIAN SEMI-PARAMETRIC MODEL OF RECURRENT EVENTS WITH COMPETING OUTCOMES.","authors":"Nam Hoai Nguyen, Seung Jun Shin, Elissa Dodd-Eaton, Jing Ning, Wenyi Wang","doi":"10.1214/25-AOAS2083","DOIUrl":"10.1214/25-AOAS2083","url":null,"abstract":"<p><p>Multiple primary cancers are increasingly more frequent due to improved survival of cancer patients. Characteristics of the first primary cancer largely impact the risk of developing subsequent primary cancers. Hence, model-based risk characterization of cancer survivors that captures patient-specific variables is needed for healthcare policy making. We propose a Bayesian semi-parametric framework, where the occurrence processes of the competing cancer types follow independent non-homogeneous Poisson processes and adjust for covariates including the type and age at diagnosis of the first primary. Applying this framework to a historically collected cohort with families presenting a highly enriched history of multiple primary tumors and diverse cancer types, we have derived a suite of age-to-onset penetrance curves for cancer survivors. This includes penetrance estimates for second primary lung cancer, potentially impactful to ongoing cancer screening decisions. Using Receiver Operating Characteristic (ROC) curves, we have validated the good predictive performance of our models in predicting second primary lung cancer, sarcoma, breast cancer, and all other cancers combined, with areas under the curves (AUCs) at 0.89, 0.91, 0.76 and 0.68, respectively. 
In conclusion, our framework provides covariate-adjusted quantitative risk assessment for cancer survivors, hence moving a step closer to personalized health management for this unique population.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 4","pages":"3091-3112"},"PeriodicalIF":1.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12955820/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147357368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MULTI-OBJECT DATA INTEGRATION IN THE STUDY OF PRIMARY PROGRESSIVE APHASIA.","authors":"Rene Gutierrez, Aaron Scheffler, Rajarshi Guhaniyogi, Maria Luisa Gorno-Tempini, Maria Luisa Mandelli, Giovanni Battistella","doi":"10.1214/25-aoas2071","DOIUrl":"10.1214/25-aoas2071","url":null,"abstract":"<p><p>This article focuses on a multi-modal imaging data application where structural/anatomical information from gray matter (GM) and brain connectivity information in the form of a brain connectome network from functional magnetic resonance imaging (fMRI) are available for a number of subjects with different degrees of primary progressive aphasia (PPA), a neurodegenerative disorder (ND) measured through a speech rate measure on motor speech loss. The clinical/scientific goal in this study becomes the identification of brain regions of interest significantly related to the speech rate measure to gain insight into ND patterns. Viewing the brain connectome network and GM images as objects, we develop an integrated object response regression framework of network and GM images on the speech rate measure. A novel integrated prior formulation is proposed on network and structural image coefficients in order to exploit network information of the brain connectome while leveraging the interconnections among the two objects. The principled Bayesian framework allows the characterization of uncertainty in ascertaining a region being actively related to the speech rate measure. Our framework yields new insights into the relationship of brain regions associated with PPA, offering a deeper understanding of neuro-degenerative patterns of PPA. 
The supplementary file adds details about posterior computation and additional empirical results.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 4","pages":"3282-3303"},"PeriodicalIF":1.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12707422/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145776604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SUPERVISED LEARNING OF OUTCOME-RELEVANT ITEMS FROM A QUESTIONNAIRE VIA MIXED INTEGER OPTIMIZATION.","authors":"Leyao Zhang, Wen Wang, Mengtong Hu, Alan P Baptist, Peng Wang, Peter X K Song","doi":"10.1214/25-AOAS2093","DOIUrl":"10.1214/25-AOAS2093","url":null,"abstract":"<p><p>Questionnaires are among the oldest and most widely used instruments in practice to measure variables relevant to traits of interest that cannot be easily measured by physical devices, for example, depression. In many clinical settings, the scope of an existing questionnaire is often unfit to apply to a new study population, whose underlying characteristics are different from those of the original population used for the questionnaire's development and/or validation. Motivated by a cohort study of elderly asthma patients, we aim to examine associations between clinical outcomes and quality of life (QoL) measured by a QoL questionnaire. To increase comparability, we consider a supervised learning method to identify a subset of questions whose summary score is strongly associated with a specific clinical outcome under investigation. The resultant set of selected items gives an optimal summary metric of the questionnaire, which improves both statistical power and clinical interpretation. Our item extraction procedure is built upon the best subset algorithm implemented by a mixed integer programming, which enjoys both theoretical guarantee of selection consistency and flexibility of handling nonresponse missing data. Moreover, estimation uncertainty is analyzed by the means of noise perturbation. 
Our methodology is first evaluated by extensive simulation studies with comparisons to existing methods and then applied to derive tailored QoL scores adaptive to two clinical outcomes of lung function measure (FEV1) and asthma control test (ACT), respectively, among elderly people with persistent asthma.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 4","pages":"3157-3178"},"PeriodicalIF":1.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12869357/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146127254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TREE-REGULARIZED BAYESIAN LATENT CLASS ANALYSIS FOR IMPROVING WEAKLY SEPARATED DIETARY PATTERN SUBTYPING IN SMALL-SIZED SUBPOPULATIONS.","authors":"Mengbing Li, Briana Stephenson, Zhenke Wu","doi":"10.1214/25-aoas2067","DOIUrl":"10.1214/25-aoas2067","url":null,"abstract":"<p><p>Dietary patterns synthesize multiple related diet components, which can be used by nutrition researchers to examine diet-disease relationships. Latent class models (LCMs) have been used to derive dietary patterns from dietary intake assessment, where each class profile represents the probabilities of exposure to a set of diet components. However, LCM-derived dietary patterns can exhibit strong similarities, or weak separation, resulting in numerical and inferential instabilities that challenge scientific interpretation. This issue is exacerbated in small-sized subpopulations. To address these issues, we provide a simple solution that empowers LCMs to improve dietary pattern estimation. We develop a tree-regularized Bayesian LCM that shares statistical strength between dietary patterns to make better estimates using limited data. This is achieved via a Dirichlet diffusion tree process that specifies a prior distribution for the unknown tree over classes. Dietary patterns that share proximity to one another in the tree are shrunk toward ancestral dietary patterns a priori, with the degree of shrinkage varying across prespecified food groups. Using dietary intake data from the Hispanic Community Health Study/Study of Latinos, we apply the proposed approach to a sample of 496 U.S. adults of South American ethnic background to identify and compare dietary patterns.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 4","pages":"3003-3022"},"PeriodicalIF":1.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12867110/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146121126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}