Annals of Applied Statistics: Latest Articles (Page 3)

SEMIPARAMETRIC LINEAR REGRESSION WITH AN INTERVAL-CENSORED COVARIATE IN THE ATHEROSCLEROSIS RISK IN COMMUNITIES STUDY.

IF 1.3, Zone 4 (Mathematics)

Annals of Applied Statistics Pub Date : 2024-09-01 DOI: 10.1214/24-aoas1881

Richard Sizelove, Donglin Zeng, Dan-Yu Lin

Abstract: In longitudinal studies, investigators are often interested in understanding how the time since the occurrence of an intermediate event affects a future outcome. The intermediate event is often asymptomatic such that its occurrence is only known to lie in a time interval induced by periodic examinations. We propose a linear regression model that relates the time since the occurrence of the intermediate event to a continuous response at a future time point through a rectified linear unit activation function while formulating the distribution of the time to the occurrence of the intermediate event through the Cox proportional hazards model. We consider nonparametric maximum likelihood estimation with an arbitrary sequence of examination times for each subject. We present an EM algorithm that converges stably for arbitrary datasets. The resulting estimators of regression parameters are consistent, asymptotically normal, and asymptotically efficient. We assess the performance of the proposed methods through extensive simulation studies and provide an application to the Atherosclerosis Risk in Communities Study.

Vol. 18, No. 3, pp. 2295-2306. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12272158/pdf/
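
The model structure summarized in this abstract can be written out in symbols. What follows is only an illustrative reading of the abstract, not the authors' exact specification: the symbols $T$ (time of the intermediate event), $t_0$ (the future time point at which the response is measured), $X$ (covariates), $(L, R]$ (the examination interval), and all coefficient names are assumed notation.

```latex
% Illustrative sketch; all notation is assumed, not taken from the paper.
\begin{align*}
Y &= \alpha + \beta\,\max(0,\; t_0 - T) + \gamma^{\top} X + \varepsilon, \\
\lambda(t \mid X) &= \lambda_0(t)\,\exp(\eta^{\top} X), \\
T &\in (L, R] \quad \text{(only the examination interval containing $T$ is observed).}
\end{align*}
```

Under this reading, the rectified linear unit $\max(0, t_0 - T)$ contributes nothing to the response when the intermediate event has not yet occurred by $t_0$, and contributes linearly in the elapsed time otherwise.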

Citations: 0

INTEGRATING MENDELIAN RANDOMIZATION WITH CAUSAL MEDIATION ANALYSES FOR CHARACTERIZING DIRECT AND INDIRECT EXPOSURE-TO-OUTCOME EFFECTS.

IF 1.3, Zone 4 (Mathematics)

Annals of Applied Statistics Pub Date : 2024-09-01 Epub Date: 2024-08-05 DOI: 10.1214/24-aoas1901

Fan Yang, Lin S Chen, Shahram Oveisgharan, Dawood Darbar, David A Bennett

Abstract: Mendelian randomization (MR) assesses the total effect of exposure on outcome. With the rapidly increasing availability of summary statistics from genome-wide association studies (GWASs), MR leverages existing summary statistics and is widely used to study the causal effects among complex traits and diseases. The total effect in the population is a sum of indirect and direct effects. For complex disease outcomes with complicated etiologies, and/or for modifiable exposure traits, there may exist more than one pathway between exposure and outcome. The direct effect and the indirect effect via a mediator of interest could be of opposite directions, and the total effect estimates may not be informative for treatment and prevention decision-making or may be even misleading for different subgroups of patients. Causal mediation analysis delineates the indirect effect of exposure on outcome operating through the mediator and the direct effect transmitted through other mechanisms. However, causal mediation analysis often requires individual-level data measured on exposure, outcome, mediator and confounding variables, and the power of the mediation analysis is restricted by sample size. In this work, motivated by a study of the effects of atrial fibrillation (AF) on Alzheimer's dementia, we propose a framework for Integrative Mendelian randomization and Mediation Analysis (IMMA). The proposed method integrates the total effect estimates from MR analyses based on large-scale GWASs with the direct and indirect effect estimates from mediation analysis based on individual-level data of a limited sample size. We introduce a series of IMMA models, under the scenarios with or without exposure-mediator interaction and/or study heterogeneity. The proposed IMMA models improve the estimation and the power of inference on the direct and indirect effects in the population, as well as the characterization of the variation of effects. Our analyses showed a significant positive direct effect of AF on Alzheimer's dementia risk not through the use of the oral anticoagulant treatment and a significant indirect effect of AF-induced anticoagulant treatment in reducing Alzheimer's dementia risk. The results suggested potential Alzheimer's dementia risk prediction and prevention strategies for AF patients, and paved the way for future re-evaluation of anticoagulant treatment guidelines for AF patients. A sensitivity analysis was conducted to assess the sensitivity of the conclusions to a key assumption of the IMMA approach.

Vol. 18, No. 3, pp. 2656-2677. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11845245/pdf/
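
The statement that the total effect is the sum of direct and indirect effects, and that the two can point in opposite directions, can be illustrated with a small simulation. This is a minimal sketch of the classical linear, no-interaction mediation decomposition fitted by ordinary least squares; it is not the IMMA method, and the simulated coefficients (0.8, 0.5, -0.6) and all function names are hypothetical choices made for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def slope(x, y):
    """OLS slope of y on x (model includes an intercept)."""
    return cov(x, y) / cov(x, x)

def slopes2(x, m, y):
    """OLS slopes of y on (x, m) with intercept, via the 2x2 normal equations."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det

def decompose(x, m, y):
    """Total, direct, and indirect (a * b) effects of exposure x on outcome y."""
    a = slope(x, m)                # exposure -> mediator
    direct, b = slopes2(x, m, y)   # exposure -> outcome, mediator -> outcome
    return {"total": slope(x, y), "direct": direct, "indirect": a * b}

# Hypothetical data: positive direct effect, negative effect through the mediator.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * xi - 0.6 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

effects = decompose(x, m, y)
print(effects)
```

In this toy setup the direct and indirect effects nearly cancel, so the total effect alone would suggest little association even though both pathways are substantial, which is the situation the abstract warns about.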

Citations: 0

A bootstrap model comparison test for identifying genes with context-specific patterns of genetic regulation.

IF 1.3, Zone 4 (Mathematics)

Annals of Applied Statistics Pub Date : 2024-09-01 Epub Date: 2024-08-05 DOI: 10.1214/23-aoas1859

Mykhaylo M Malakhov, Ben Dai, Xiaotong T Shen, Wei Pan

Abstract: Understanding how genetic variation affects gene expression is essential for a complete picture of the functional pathways that give rise to complex traits. Although numerous studies have established that many genes are differentially expressed in distinct human tissues and cell types, no tools exist for identifying the genes whose expression is differentially regulated. Here we introduce DRAB (differential regulation analysis by bootstrapping), a gene-based method for testing whether patterns of genetic regulation are significantly different between tissues or other biological contexts. DRAB first leverages the elastic net to learn context-specific models of local genetic regulation and then applies a novel bootstrap-based model comparison test to check their equivalency. Unlike previous model comparison tests, our proposed approach can determine whether population-level models have equal predictive performance by accounting for the variability of feature selection and model training. We validated DRAB on mRNA expression data from a variety of human tissues in the Genotype-Tissue Expression (GTEx) Project. DRAB yielded biologically reasonable results and had sufficient power to detect genes with tissue-specific regulatory profiles while effectively controlling false positives. By providing a framework that facilitates the prioritization of differentially regulated genes, our study enables future discoveries on the genetic architecture of molecular phenotypes.

Vol. 18, No. 3, pp. 1840-1857. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11484521/pdf/

Citations: 0

AN INTEGRATIVE NETWORK-BASED MEDIATION MODEL (NMM) TO ESTIMATE MULTIPLE GENETIC EFFECTS ON OUTCOMES MEDIATED BY FUNCTIONAL CONNECTIVITY.

IF 1.4, Zone 4 (Mathematics)

Annals of Applied Statistics Pub Date : 2024-09-01 Epub Date: 2024-08-05 DOI: 10.1214/24-aoas1880

Wei Dai, Heping Zhang

Abstract: Functional connectivity of the brain, characterized by interconnected neural circuits across functional networks, is a cutting-edge feature in neuroimaging. It has the potential to mediate the effect of genetic variants on behavioral outcomes or diseases. Existing mediation analysis methods can evaluate the impact of genetics and brain structure/function on cognitive behavior or disorders, but they tend to be limited to single genetic variants or univariate mediators, without considering cumulative genetic effects and the complex matrix, group, and network structures of functional connectivity. To address this gap, the paper presents an integrative network-based mediation model (NMM) that estimates the effect of multiple genetic variants on behavioral outcomes or diseases mediated by functional connectivity. The model incorporates group information of inter-regions at the broad network level and imposes low-rank and sparse assumptions to reflect the complex structures of functional connectivity and to select network mediators simultaneously. We adopt a block coordinate descent algorithm to implement a fast and efficient solution to our model. Simulation results indicate the efficacy of the model in selecting active mediators and reducing bias in effect estimation. With application to the Human Connectome Project Young Adult (HCP-YA) study of 493 young adults, two genetic variants (rs769448 and rs769449) on the APOE4 gene are identified that lead to deficits in functional connectivity within visual networks and fluid intelligence.

Vol. 18, No. 3, pp. 2277-2294. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11616023/pdf/

Citations: 0

PATIENT RECRUITMENT USING ELECTRONIC HEALTH RECORDS UNDER SELECTION BIAS: A TWO-PHASE SAMPLING FRAMEWORK.

IF 1.3, Zone 4 (Mathematics)

Annals of Applied Statistics Pub Date : 2024-09-01 Epub Date: 2024-08-05 DOI: 10.1214/23-aoas1860

Guanghao Zhang, Lauren J Beesley, Bhramar Mukherjee, Xu Shi

Abstract: Electronic health records (EHRs) are increasingly recognized as a cost-effective resource for patient recruitment in clinical research. However, how to optimally select a cohort from millions of individuals to answer a scientific question of interest remains unclear. Consider a study to estimate the mean or mean difference of an expensive outcome. Inexpensive auxiliary covariates predictive of the outcome may often be available in patients' health records, presenting an opportunity to recruit patients selectively, which may improve efficiency in downstream analyses. In this paper we propose a two-phase sampling design that leverages available information on auxiliary covariates in EHR data. A key challenge in using EHR data for multiphase sampling is the potential selection bias, because EHR data are not necessarily representative of the target population. Extending existing literature on two-phase sampling design, we derive an optimal two-phase sampling method that improves efficiency over random sampling while accounting for the potential selection bias in EHR data. We demonstrate the efficiency gain from our sampling design via simulation studies and an application evaluating the prevalence of hypertension among U.S. adults leveraging data from the Michigan Genomics Initiative, a longitudinal biorepository in Michigan Medicine.

Vol. 18, No. 3, pp. 1858-1878. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323140/pdf/

Citations: 0

A NONPARAMETRIC MIXED-EFFECTS MIXTURE MODEL FOR PATTERNS OF CLINICAL MEASUREMENTS ASSOCIATED WITH COVID-19.

IF 1.3, Zone 4 (Mathematics)

Annals of Applied Statistics Pub Date : 2024-09-01 Epub Date: 2024-08-05 DOI: 10.1214/23-aoas1871

Xiaoran Ma, Wensheng Guo, Mengyang Gu, Len Usvyat, Peter Kotanko, Yuedong Wang

Abstract: Some patients with COVID-19 show changes in signs and symptoms such as temperature and oxygen saturation days before being positively tested for SARS-CoV-2, while others remain asymptomatic. It is important to identify these subgroups and to understand what biological and clinical predictors are related to these subgroups. This information will provide insights into how the immune system may respond differently to infection and can further be used to identify infected individuals. We propose a flexible nonparametric mixed-effects mixture model that identifies risk factors and classifies patients with biological changes. We model the latent probability of biological changes using a logistic regression model and trajectories in the latent groups using smoothing splines. We developed an EM algorithm to maximize the penalized likelihood for estimating all parameters and mean functions. We evaluate our methods by simulations and apply the proposed model to investigate changes in temperature in a cohort of COVID-19-infected hemodialysis patients.

Vol. 18, No. 3, pp. 2080-2095. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11460989/pdf/

Citations: 0

OUTCOME-GUIDED DISEASE SUBTYPING BY GENERATIVE MODEL AND WEIGHTED JOINT LIKELIHOOD IN TRANSCRIPTOMIC APPLICATIONS.

IF 1.4, Zone 4 (Mathematics)

Annals of Applied Statistics Pub Date : 2024-09-01 Epub Date: 2024-08-05 DOI: 10.1214/23-aoas1865

Yujia Li, Peng Liu, Wenjia Wang, Wei Zong, Yusi Fang, Zhao Ren, Lu Tang, Juan C Celedón, Steffi Oesterreich, George C Tseng

Abstract: With advances in high-throughput technology, molecular disease subtyping by high-dimensional omics data has been recognized as an effective approach for identifying subtypes of complex diseases with distinct disease mechanisms and prognoses. Conventional cluster analysis takes omics data as input and generates patient clusters with similar gene expression patterns. The omics data, however, usually contain multi-faceted cluster structures that can be defined by different sets of genes. If the gene set associated with irrelevant clinical variables (e.g., sex or age) dominates the clustering process, the resulting clusters may not capture clinically meaningful disease subtypes. This motivates the development of a clustering framework with guidance from a pre-specified disease outcome, such as lung function measurement or survival, in this paper. We propose two disease subtyping methods by omics data with outcome guidance using a generative model or a weighted joint likelihood. Both methods connect an outcome association model and a disease subtyping model by a latent variable of cluster labels. Compared to the generative model, weighted joint likelihood contains a data-driven weight parameter to balance the likelihood contributions from outcome association and gene cluster separation, which improves generalizability in independent validation but requires heavier computing. Extensive simulations and two real applications in lung disease and triple-negative breast cancer demonstrate superior disease subtyping performance of the outcome-guided clustering methods in terms of disease subtyping accuracy, gene selection and outcome association. Unlike existing clustering methods, the outcome-guided disease subtyping framework creates a new precision medicine paradigm to directly identify patient subgroups with clinical association.

Vol. 18, No. 3, pp. 1947-1964. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309773/pdf/

Citations: 0

QUANTILE REGRESSION DECOMPOSITION ANALYSIS OF DISPARITY RESEARCH USING COMPLEX SURVEY DATA: APPLICATION TO DISPARITIES IN BMI AND TELOMERE LENGTH BETWEEN U.S. MINORITY AND WHITE POPULATION GROUPS.

IF 1.4, Zone 4 (Mathematics)

Annals of Applied Statistics Pub Date : 2024-09-01 Epub Date: 2024-08-05 DOI: 10.1214/23-aoas1868

Hyokyoung G Hong, Barry I Graubard, Joseph L Gastwirth, Mi-Ok Kim

Abstract: We develop a quantile regression decomposition (QRD) method for analyzing observed disparities (OD) between population groups in socioeconomic and health-related outcomes for complex survey data. The conventional decomposition approaches use the conditional mean regression to decompose the disparity into two parts, the part explained by the difference arising from the different distributions in the explanatory covariates and the remaining part, which is unexplained by the covariates. Many socioeconomic and health outcomes exhibit heteroscedastic distributions, where the magnitude of observed disparities varies across different quantiles of these outcomes. Thus, differences in the explanatory covariates may account for varying differences in the OD across the quantiles of the outcome. The QRD can identify where there are greater differences in the outcome distribution, for example, at the 90th quantile, and how important the covariates are in explaining those differences. Much socioeconomic and health research relies on complex surveys, such as the National Health and Nutrition Examination Survey (NHANES), that oversample individuals from disadvantaged/minority population groups in order to provide improved precision. QRD has not been extended to the complex survey setting. We improve the QRD approach proposed in Machado and Mata (2005) to yield more reliable estimates at the quantiles, where the data are sparse, and extend it to the complex survey setting. We also propose a perturbation-based variance estimation method. Simulation studies indicate that the estimates of the unexplained portions of the OD across quantiles are unbiased and the coverage of the confidence intervals is close to the nominal value. This methodology is used to study disparities in body mass index (BMI) and telomere length between race/ethnic groups estimated from the NHANES data.

Vol. 18, No. 3, pp. 2012-2033. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12456447/pdf/
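
The conventional mean-based decomposition that the abstract contrasts with QRD (commonly known as the Oaxaca-Blinder decomposition) can be sketched as follows. This is a minimal illustration with a single covariate and unweighted ordinary least squares; it is not the quantile method of the paper, it ignores survey weights entirely, and the simulated data and function names are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def ols(x, y):
    """Intercept and slope of the OLS fit of y on a single covariate x."""
    mx, my = mean(x), mean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

def mean_decomposition(xa, ya, xb, yb):
    """Split mean(ya) - mean(yb) into explained and unexplained parts,
    using group A's coefficients as the reference."""
    a0, a1 = ols(xa, ya)
    b0, b1 = ols(xb, yb)
    explained = a1 * (mean(xa) - mean(xb))           # covariate distribution gap
    unexplained = (a0 - b0) + (a1 - b1) * mean(xb)   # coefficient gap
    return explained, unexplained

# Hypothetical groups: A has a higher covariate mean and a higher intercept.
rng = random.Random(1)
xa = [rng.gauss(1.0, 1) for _ in range(500)]
xb = [rng.gauss(0.0, 1) for _ in range(500)]
ya = [2.0 + 1.0 * v + rng.gauss(0, 1) for v in xa]
yb = [1.5 + 1.0 * v + rng.gauss(0, 1) for v in xb]

explained, unexplained = mean_decomposition(xa, ya, xb, yb)
gap = mean(ya) - mean(yb)
print(gap, explained, unexplained)
```

Because each OLS fit with an intercept passes through its group's means, the explained and unexplained parts add up exactly to the observed mean gap. A mean-based split of this kind says nothing about how the gap varies across the outcome distribution, which is the limitation the quantile approach targets.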

Citations: 0

JOINT MODELING OF MULTISTATE AND NONPARAMETRIC MULTIVARIATE LONGITUDINAL DATA.

IF 1.8, Zone 4 (Mathematics)

Annals of Applied Statistics Pub Date : 2024-08-05 DOI: 10.1214/24-aoas1889

Lu You, Falastin Salami, Carina Törn, Åke Lernmark, Roy Tamura

Citations: 0

BAYESIAN NESTED LATENT CLASS MODELS FOR CAUSE-OF-DEATH ASSIGNMENT USING VERBAL AUTOPSIES ACROSS MULTIPLE DOMAINS.

IF 1.3, Zone 4 (Mathematics)

Annals of Applied Statistics Pub Date : 2024-06-01 Epub Date: 2024-04-05 DOI: 10.1214/23-aoas1826

Zehang Richard Li, Zhenke Wu, Irena Chen, Samuel J Clark

Abstract: Understanding cause-specific mortality rates is crucial for monitoring population health and designing public health interventions. Worldwide, two-thirds of deaths do not have a cause assigned. Verbal autopsy (VA) is a well-established tool to collect information describing deaths outside of hospitals by conducting surveys of caregivers of a deceased person. It is routinely implemented in many low- and middle-income countries. Statistical algorithms to assign cause of death using VAs are typically vulnerable to the distribution shift between the data used to train the model and the target population. This presents a major challenge for analyzing VAs, as labeled data are usually unavailable in the target population. This article proposes a latent class model framework for VA data (LCVA) that jointly models VAs collected over multiple heterogeneous domains, assigns causes of death for out-of-domain observations and estimates cause-specific mortality fractions for a new domain. We introduce a parsimonious representation of the joint distribution of the collected symptoms using nested latent class models and develop a computationally efficient algorithm for posterior inference. We demonstrate that LCVA outperforms existing methods in predictive performance and scalability. Supplementary Material and reproducible analysis codes are available online. The R package LCVA implementing the method is available on GitHub (https://github.com/richardli/LCVA).

Vol. 18, No. 2, pp. 1137-1159. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11484295/pdf/

Citations: 0