{"title":"UTILIZING A CAPTURE-RECAPTURE STRATEGY TO ACCELERATE INFECTIOUS DISEASE SURVEILLANCE.","authors":"Lin Ge, Yuzi Zhang, Lance Waller, Robert Lyles","doi":"10.1214/24-aoas1927","DOIUrl":"10.1214/24-aoas1927","url":null,"abstract":"<p><p>Monitoring key elements of disease dynamics (e.g., prevalence, case counts) is of great importance in infectious disease prevention and control, as emphasized during the COVID-19 pandemic. To facilitate this effort, we propose a new capture-recapture (CRC) analysis strategy that adjusts for misclassification stemming from the use of easily administered but imperfect diagnostic test kits, such as rapid antigen test-kits or saliva tests. Our method is based on a recently proposed \"anchor stream\" design, whereby an existing voluntary surveillance data stream is augmented by a smaller and judiciously drawn random sample. It incorporates manufacturer-specified sensitivity and specificity parameters to account for imperfect diagnostic results in one or both data streams. For inference to accompany case count estimation, we improve upon traditional Wald-type confidence intervals by developing an adapted Bayesian credible interval for the CRC estimator that yields favorable frequentist coverage properties. When feasible, the proposed design and analytic strategy provides a more efficient solution than traditional CRC methods or random sampling-based bias-corrected estimation to monitor disease prevalence while accounting for misclassification. We demonstrate the benefits of this approach through simulation studies and a numerical example that underscore its potential utility in practice for economical disease monitoring among a registered closed population.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 4","pages":"3130-3145"},"PeriodicalIF":1.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12273866/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144676401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bonnie E Shook-Sa, Michael G Hudgens, Andrea K Knittel, Andrew Edmonds, Catalina Ramirez, Stephen R Cole, Mardge Cohen, Adebola Adedimeji, Tonya Taylor, Katherine G Michel, Andrea Kovacs, Jennifer Cohen, Jessica Donohue, Antonina Foster, Margaret A Fischl, Dustin Long, Adaora A Adimora
{"title":"EXPOSURE EFFECTS ON COUNT OUTCOMES WITH OBSERVATIONAL DATA, WITH APPLICATION TO INCARCERATED WOMEN.","authors":"Bonnie E Shook-Sa, Michael G Hudgens, Andrea K Knittel, Andrew Edmonds, Catalina Ramirez, Stephen R Cole, Mardge Cohen, Adebola Adedimeji, Tonya Taylor, Katherine G Michel, Andrea Kovacs, Jennifer Cohen, Jessica Donohue, Antonina Foster, Margaret A Fischl, Dustin Long, Adaora A Adimora","doi":"10.1214/24-aoas1874","DOIUrl":"10.1214/24-aoas1874","url":null,"abstract":"<p><p>Causal inference methods can be applied to estimate the effect of a point exposure or treatment on an outcome of interest using data from observational studies. For example, in the Women's Interagency HIV Study, it is of interest to understand the effects of incarceration on the number of sexual partners and the number of cigarettes smoked after incarceration. In settings like this where the outcome is a count, the estimand is often the causal mean ratio, i.e., the ratio of the counterfactual mean count under exposure to the counterfactual mean count under no exposure. This paper considers estimators of the causal mean ratio based on inverse probability of treatment weights, the parametric g-formula, and doubly robust estimation, each of which can account for overdispersion, zero-inflation, and heaping in the measured outcome. Methods are compared in simulations and are applied to data from the Women's Interagency HIV Study.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 3","pages":"2147-2165"},"PeriodicalIF":1.3,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526847/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142570975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BIVARIATE FUNCTIONAL PATTERNS OF LIFETIME MEDICARE COSTS AMONG ESRD PATIENTS.","authors":"Yue Wang, Bin Nan, John D Kalbfleisch","doi":"10.1214/24-aoas1897","DOIUrl":"10.1214/24-aoas1897","url":null,"abstract":"<p><p>In this work we study the lifetime Medicare spending patterns of patients with end-stage renal disease (ESRD). We extract the information of patients who started their ESRD services in 2007-2011 from the United States Renal Data System (USRDS). Patients are partitioned into three groups based on their kidney transplant status: 1-unwaitlisted and never transplanted, 2-waitlisted but never transplanted, and 3-waitlisted and then transplanted. To study their Medicare cost trajectories, we use a semiparametric regression model with both fixed and bivariate time-varying coefficients to compare groups 1 and 2, and a bivariate time-varying coefficient model with different starting times (time since the first ESRD service and time since the kidney transplant) to compare groups 2 and 3. In addition to demographics and other medical conditions, these regression models are conditional on the survival time, which ideally depict the lifetime Medicare spending patterns. For estimation, we extend the profile weighted least squares (PWLS) estimator to longitudinal data for the first comparison and propose a two-stage estimating method for the second comparison. We use sandwich variance estimators to construct confidence intervals and validate inference procedures through simulations. Our analysis of the Medicare claims data reveals that waitlisting is associated with a lower daily medical cost at the beginning of ESRD service among waitlisted patients which gradually increases over time. Averaging over lifespan, however, there is no difference between waitlisted and unwaitlisted groups. A kidney transplant, on the other hand, reduces the medical cost significantly after an initial spike.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 3","pages":"2596-2614"},"PeriodicalIF":1.3,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11488692/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142479908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SEMIPARAMETRIC LINEAR REGRESSION WITH AN INTERVAL-CENSORED COVARIATE IN THE ATHEROSCLEROSIS RISK IN COMMUNITIES STUDY.","authors":"Richard Sizelove, Donglin Zeng, Dan-Yu Lin","doi":"10.1214/24-aoas1881","DOIUrl":"10.1214/24-aoas1881","url":null,"abstract":"<p><p>In longitudinal studies, investigators are often interested in understanding how the time since the occurrence of an intermediate event affects a future outcome. The intermediate event is often asymptomatic such that its occurrence is only known to lie in a time interval induced by periodic examinations. We propose a linear regression model that relates the time since the occurrence of the intermediate event to a continuous response at a future time point through a rectified linear unit activation function while formulating the distribution of the time to the occurrence of the intermediate event through the Cox proportional hazards model. We consider nonparametric maximum likelihood estimation with an arbitrary sequence of examination times for each subject. We present an EM algorithm that converges stably for arbitrary datasets. The resulting estimators of regression parameters are consistent, asymptotically normal, and asymptotically efficient. We assess the performance of the proposed methods through extensive simulation studies and provide an application to the Atherosclerosis Risk in Communities Study.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 3","pages":"2295-2306"},"PeriodicalIF":1.3,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12272158/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144676400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AN INTEGRATIVE NETWORK-BASED MEDIATION MODEL (NMM) TO ESTIMATE MULTIPLE GENETIC EFFECTS ON OUTCOMES MEDIATED BY FUNCTIONAL CONNECTIVITY.","authors":"Wei Dai, Heping Zhang","doi":"10.1214/24-aoas1880","DOIUrl":"10.1214/24-aoas1880","url":null,"abstract":"<p><p>Functional connectivity of the brain, characterized by interconnected neural circuits across functional networks, is a cutting-edge feature in neuroimaging. It has the potential to mediate the effect of genetic variants on behavioral outcomes or diseases. Existing mediation analysis methods can evaluate the impact of genetics and brain structurefunction on cognitive behavior or disorders, but they tend to be limited to single genetic variants or univariate mediators, without considering cumulative genetic effects and the complex matrix and group and network structures of functional connectivity. To address this gap, the paper presents an integrative network-based mediation model (NMM) that estimates the effect of multiple genetic variants on behavioral outcomes or diseases mediated by functional connectivity. The model incorporates group information of inter-regions at broad network level and imposes low-rank and sparse assumptions to reflect the complex structures of functional connectivity and selecting network mediators simultaneously. We adopt block coordinate descent algorithm to implement a fast and efficient solution to our model. Simulation results indicate the efficacy of the model in selecting active mediators and reducing bias in effect estimation. With application to the Human Connectome Project Youth Adult (HCP-YA) study of 493 young adults, two genetic variants (rs769448 and rs769449) on the <i>APOE4</i> gene are identified that lead to deficits in functional connectivity within visual networks and fluid intelligence.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 3","pages":"2277-2294"},"PeriodicalIF":1.3,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11616023/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142787675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fan Yang, Lin S Chen, Shahram Oveisgharan, Dawood Darbar, David A Bennett
{"title":"INTEGRATING MENDELIAN RANDOMIZATION WITH CAUSAL MEDIATION ANALYSES FOR CHARACTERIZING DIRECT AND INDIRECT EXPOSURE-TO-OUTCOME EFFECTS.","authors":"Fan Yang, Lin S Chen, Shahram Oveisgharan, Dawood Darbar, David A Bennett","doi":"10.1214/24-aoas1901","DOIUrl":"10.1214/24-aoas1901","url":null,"abstract":"<p><p>Mendelian randomization (MR) assesses the total effect of exposure on outcome. With the rapidly increasing availability of summary statistics from genome-wide association studies (GWASs), MR leverages existing summary statistics and is widely used to study the causal effects among complex traits and diseases. The total effect in the population is a sum of indirect and direct effects. For complex disease outcomes with complicated etiologies, and/or for modifiable exposure traits, there may exist more than one pathway between exposure and outcome. The direct effect and the indirect effect via a mediator of interest could be of opposite directions, and the total effect estimates may not be informative for treatment and prevention decision-making or may be even misleading for different subgroups of patients. Causal mediation analysis delineates the indirect effect of exposure on outcome operating through the mediator and the direct effect transmitted through other mechanisms. However, causal mediation analysis often requires individual-level data measured on exposure, outcome, mediator and confounding variables, and the power of the mediation analysis is restricted by sample size. In this work, motivated by a study of the effects of atrial fibrillation (AF) on Alzheimer's dementia, we propose a framework for Integrative Mendelian randomization and Mediation Analysis (IMMA). The proposed method integrates the total effect estimates from MR analyses based on large-scale GWASs with the direct and indirect effect estimates from mediation analysis based on individual-level data of a limited sample size. We introduce a series of IMMA models, under the scenarios with or without exposure-mediator interaction and/or study heterogeneity. The proposed IMMA models improve the estimation and the power of inference on the direct and indirect effects in the population, as well as the characterization of the variation of effects. Our analyses showed a significant positive direct effect of AF on Alzheimer's dementia risk not through the use of the oral anticoagulant treatment and a significant indirect effect of AF-induced anticoagulant treatment in reducing Alzheimer's dementia risk. The results suggested potential Alzheimer's dementia risk prediction and prevention strategies for AF patients, and paved the way for future re-evaluation of anticoagulant treatment guidelines for AF patients. A sensitivity analysis was conducted to assess the sensitivity of the conclusions to a key assumption of the IMMA approach.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 3","pages":"2656-2677"},"PeriodicalIF":1.3,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11845245/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143484582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mykhaylo M Malakhov, Ben Dai, Xiaotong T Shen, Wei Pan
{"title":"A bootstrap model comparison test for identifying genes with context-specific patterns of genetic regulation.","authors":"Mykhaylo M Malakhov, Ben Dai, Xiaotong T Shen, Wei Pan","doi":"10.1214/23-aoas1859","DOIUrl":"10.1214/23-aoas1859","url":null,"abstract":"<p><p>Understanding how genetic variation affects gene expression is essential for a complete picture of the functional pathways that give rise to complex traits. Although numerous studies have established that many genes are differentially expressed in distinct human tissues and cell types, no tools exist for identifying the genes whose expression is differentially regulated. Here we introduce DRAB (differential regulation analysis by bootstrapping), a gene-based method for testing whether patterns of genetic regulation are significantly different between tissues or other biological contexts. DRAB first leverages the elastic net to learn context-specific models of local genetic regulation and then applies a novel bootstrap-based model comparison test to check their equivalency. Unlike previous model comparison tests, our proposed approach can determine whether population-level models have equal predictive performance by accounting for the variability of feature selection and model training. We validated DRAB on mRNA expression data from a variety of human tissues in the Genotype-Tissue Expression (GTEx) Project. DRAB yielded biologically reasonable results and had sufficient power to detect genes with tissue-specific regulatory profiles while effectively controlling false positives. By providing a framework that facilitates the prioritization of differentially regulated genes, our study enables future discoveries on the genetic architecture of molecular phenotypes.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 3","pages":"1840-1857"},"PeriodicalIF":1.3,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11484521/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142479813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guanghao Zhang, Lauren J Beesley, Bhramar Mukherjee, X U Shi
{"title":"PATIENT RECRUITMENT USING ELECTRONIC HEALTH RECORDS UNDER SELECTION BIAS: A TWO-PHASE SAMPLING FRAMEWORK.","authors":"Guanghao Zhang, Lauren J Beesley, Bhramar Mukherjee, X U Shi","doi":"10.1214/23-aoas1860","DOIUrl":"10.1214/23-aoas1860","url":null,"abstract":"<p><p>Electronic health records (EHRs) are increasingly recognized as a cost-effective resource for patient recruitment in clinical research. However, how to optimally select a cohort from millions of individuals to answer a scientific question of interest remains unclear. Consider a study to estimate the mean or mean difference of an expensive outcome. Inexpensive auxiliary covariates predictive of the outcome may often be available in patients' health records, presenting an opportunity to recruit patients selectively, which may improve efficiency in downstream analyses. In this paper we propose a two-phase sampling design that leverages available information on auxiliary covariates in EHR data. A key challenge in using EHR data for multiphase sampling is the potential selection bias, because EHR data are not necessarily representative of the target population. Extending existing literature on two-phase sampling design, we derive an optimal two-phase sampling method that improves efficiency over random sampling while accounting for the potential selection bias in EHR data. We demonstrate the efficiency gain from our sampling design via simulation studies and an application evaluating the prevalence of hypertension among U.S. adults leveraging data from the Michigan Genomics Initiative, a longitudinal biorepository in Michigan Medicine.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 3","pages":"1858-1878"},"PeriodicalIF":1.3,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323140/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141989442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaoran Ma, Wensheng Guo, Mengyang Gu, Len Usvyat, Peter Kotanko, Yuedong Wang
{"title":"A NONPARAMETRIC MIXED-EFFECTS MIXTURE MODEL FOR PATTERNS OF CLINICAL MEASUREMENTS ASSOCIATED WITH COVID-19.","authors":"Xiaoran Ma, Wensheng Guo, Mengyang Gu, Len Usvyat, Peter Kotanko, Yuedong Wang","doi":"10.1214/23-aoas1871","DOIUrl":"10.1214/23-aoas1871","url":null,"abstract":"<p><p>Some patients with COVID-19 show changes in signs and symptoms such as temperature and oxygen saturation days before being positively tested for SARS-CoV-2, while others remain asymptomatic. It is important to identify these subgroups and to understand what biological and clinical predictors are related to these subgroups. This information will provide insights into how the immune system may respond differently to infection and can further be used to identify infected individuals. We propose a flexible nonparametric mixed-effects mixture model that identifies risk factors and classifies patients with biological changes. We model the latent probability of biological changes using a logistic regression model and trajectories in the latent groups using smoothing splines. We developed an EM algorithm to maximize the penalized likelihood for estimating all parameters and mean functions. We evaluate our methods by simulations and apply the proposed model to investigate changes in temperature in a cohort of COVID-19-infected hemodialysis patients.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 3","pages":"2080-2095"},"PeriodicalIF":1.3,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11460989/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142394985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L U You,Falastin Salami,Carina Törn,Åke Lernmark,Roy Tamura
{"title":"JOINT MODELING OF MULTISTATE AND NONPARAMETRIC MULTIVARIATE LONGITUDINAL DATA.","authors":"L U You,Falastin Salami,Carina Törn,Åke Lernmark,Roy Tamura","doi":"10.1214/24-aoas1889","DOIUrl":"https://doi.org/10.1214/24-aoas1889","url":null,"abstract":"It is oftentimes the case in studies of disease progression that subjects can move into one of several disease states of interest. Multistate models are an indispensable tool to analyze data from such studies. The Environmental Determinants of Diabetes in the Young (TEDDY) is an observational study of at-risk children from birth to onset of type-1 diabetes (T1D) up through the age of 15. A joint model for simultaneous inference of multistate and multivariate nonparametric longitudinal data is proposed to analyze data and answer the research questions brought up in the study. The proposed method allows us to make statistical inferences, test hypotheses, and make predictions about future state occupation in the TEDDY study. The performance of the proposed method is evaluated by simulation studies. The proposed method is applied to the motivating example to demonstrate the capabilities of the method.","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"21 1","pages":"2444-2461"},"PeriodicalIF":1.8,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142259215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}