Annals of Applied Statistics最新文献

筛选
英文 中文
Fitting stochastic epidemic models to gene genealogies using linear noise approximation. 用线性噪声近似拟合随机流行病模型的基因谱系。
IF 1.8 4区 数学
Annals of Applied Statistics Pub Date : 2023-03-01 DOI: 10.1214/21-aoas1583
Mingwei Tang, Gytis Dudas, Trevor Bedford, Vladimir N Minin
{"title":"Fitting stochastic epidemic models to gene genealogies using linear noise approximation.","authors":"Mingwei Tang,&nbsp;Gytis Dudas,&nbsp;Trevor Bedford,&nbsp;Vladimir N Minin","doi":"10.1214/21-aoas1583","DOIUrl":"https://doi.org/10.1214/21-aoas1583","url":null,"abstract":"<p><p>Phylodynamics is a set of population genetics tools that aim at reconstructing demographic history of a population based on molecular sequences of individuals sampled from the population of interest. One important task in phylodynamics is to estimate changes in (effective) population size. When applied to infectious disease sequences such estimation of population size trajectories can provide information about changes in the number of infections. To model changes in the number of infected individuals, current phylodynamic methods use non-parametric approaches (e.g., Bayesian curve-fitting based on change-point models or Gaussian process priors), parametric approaches (e.g., based on differential equations), and stochastic modeling in conjunction with likelihood-free Bayesian methods. The first class of methods yields results that are hard to interpret epidemiologically. The second class of methods provides estimates of important epidemiological parameters, such as infection and removal/recovery rates, but ignores variation in the dynamics of infectious disease spread. The third class of methods is the most advantageous statistically, but relies on computationally intensive particle filtering techniques that limits its applications. We propose a Bayesian model that combines phylodynamic inference and stochastic epidemic models, and achieves computational tractability by using a linear noise approximation (LNA) - a technique that allows us to approximate probability densities of stochastic epidemic model trajectories. LNA opens the door for using modern Markov chain Monte Carlo tools to approximate the joint posterior distribution of the disease transmission parameters and of high dimensional vectors describing unobserved changes in the stochastic epidemic model compartment sizes (e.g., numbers of infectious and susceptible individuals). In a simulation study, we show that our method can successfully recover parameters of stochastic epidemic models. We apply our estimation technique to Ebola genealogies estimated using viral genetic data from the 2014 epidemic in Sierra Leone and Liberia.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 1","pages":"1-22"},"PeriodicalIF":1.8,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10237588/pdf/nihms-1891709.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9955586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Probabilistic HIV recency classification-a logistic regression without labeled individual level training data. 概率HIV近期分类——一种没有标记个人水平训练数据的逻辑回归。
IF 1.8 4区 数学
Annals of Applied Statistics Pub Date : 2023-03-01 Epub Date: 2023-01-24 DOI: 10.1214/22-aoas1618
Ben Sheng, Changcheng Li, Le Bao, Runze Li
{"title":"Probabilistic HIV recency classification-a logistic regression without labeled individual level training data.","authors":"Ben Sheng,&nbsp;Changcheng Li,&nbsp;Le Bao,&nbsp;Runze Li","doi":"10.1214/22-aoas1618","DOIUrl":"10.1214/22-aoas1618","url":null,"abstract":"<p><p>Accurate HIV incidence estimation based on individual recent infection status (recent vs long-term infection) is important for monitoring the epidemic, targeting interventions to those at greatest risk of new infection, and evaluating existing programs of prevention and treatment. Starting from 2015, the Population-based HIV Impact Assessment (PHIA) individual-level surveys are implemented in the most-affected countries in sub-Saharan Africa. PHIA is a nationally-representative HIV-focused survey that combines household visits with key questions and cutting-edge technologies such as biomarker tests for HIV antibody and HIV viral load which offer the unique opportunity of distinguishing between recent infection and long-term infection, and providing relevant HIV information by age, gender, and location. In this article, we propose a semi-supervised logistic regression model for estimating individual level HIV recency status. It incorporates information from multiple data sources - the PHIA survey where the true HIV recency status is unknown, and the cohort studies provided in the literature where the relationship between HIV recency status and the covariates are presented in the form of a contingency table. It also utilizes the national level HIV incidence estimates from the epidemiology model. Applying the proposed model to Malawi PHIA data, we demonstrate that our approach is more accurate for the individual level estimation and more appropriate for estimating HIV recency rates at aggregated levels than the current practice - the binary classification tree (BCT).</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 1","pages":"108-129"},"PeriodicalIF":1.8,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10577400/pdf/nihms-1886688.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41240660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
A SPATIAL CAUSAL ANALYSIS OF WILDLAND FIRE-CONTRIBUTED PM2.5 USING NUMERICAL MODEL OUTPUT. 利用数值模型输出对野地火灾造成的 pm2.5 进行空间因果分析。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2022-12-01 Epub Date: 2022-09-26 DOI: 10.1214/22-aoas1610
Alexandra Larsen, Shu Yang, Brian J Reich, Ana G Rappold
{"title":"A SPATIAL CAUSAL ANALYSIS OF WILDLAND FIRE-CONTRIBUTED PM<sub>2.5</sub> USING NUMERICAL MODEL OUTPUT.","authors":"Alexandra Larsen, Shu Yang, Brian J Reich, Ana G Rappold","doi":"10.1214/22-aoas1610","DOIUrl":"10.1214/22-aoas1610","url":null,"abstract":"<p><p>Wildland fire smoke contains hazardous levels of fine particulate matter (PM<sub>2.5</sub>), a pollutant shown to adversely effect health. Estimating fire attributable PM<sub>2.5</sub> concentrations is key to quantifying the impact on air quality and subsequent health burden. This is a challenging problem since only total PM<sub>2.5</sub> is measured at monitoring stations and both fire-attributable PM<sub>2.5</sub> and PM<sub>2.5</sub> from all other sources are correlated in space and time. We propose a framework for estimating fire-contributed PM<sub>2.5</sub> and PM<sub>2.5</sub> from all other sources using a novel causal inference framework and bias-adjusted chemical model representations of PM<sub>2.5</sub> under counterfactual scenarios. The chemical model representation of PM<sub>2.5</sub> for this analysis is simulated using Community Multiscale Air Quality Modeling System (CMAQ), run with and without fire emissions across the contiguous U.S. for the 2008-2012 wildfire seasons. The CMAQ output is calibrated with observations from monitoring sites for the same spatial domain and time period. We use a Bayesian model that accounts for spatial variation to estimate the effect of wildland fires on PM<sub>2.5</sub> and state assumptions under which the estimate has a valid causal interpretation. Our results include estimates of the contributions of wildfire smoke to PM<sub>2.5</sub> for the contiguous U.S. Additionally, we compute the health burden associated with the PM<sub>2.5</sub> attributable to wildfire smoke.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 4","pages":"2714-2731"},"PeriodicalIF":1.3,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181852/pdf/nihms-1846188.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9468690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Hierarchical resampling for bagging in multistudy prediction with applications to human neurochemical sensing. 多研究预测中的分级重采样(Hierarchical resampling for bagging),应用于人类神经化学传感。
IF 1.8 4区 数学
Annals of Applied Statistics Pub Date : 2022-12-01 Epub Date: 2022-09-26 DOI: 10.1214/21-aoas1574
Gabriel Loewinger, Prasad Patil, Kenneth T Kishida, Giovanni Parmigiani
{"title":"Hierarchical resampling for bagging in multistudy prediction with applications to human neurochemical sensing.","authors":"Gabriel Loewinger, Prasad Patil, Kenneth T Kishida, Giovanni Parmigiani","doi":"10.1214/21-aoas1574","DOIUrl":"10.1214/21-aoas1574","url":null,"abstract":"<p><p>We propose the \"study strap ensemble\", which combines advantages of two common approaches to fitting prediction models when multiple training datasets (\"studies\") are available: pooling studies and fitting one model versus averaging predictions from multiple models each fit to individual studies. The study strap ensemble fits models to bootstrapped datasets, or \"pseudo-studies.\" These are generated by resampling from multiple studies with a hierarchical resampling scheme that generalizes the randomized cluster bootstrap. The study strap is controlled by a tuning parameter that determines the proportion of observations to draw from each study. When the parameter is set to its lowest value, each pseudo-study is resampled from only a single study. When it is high, the study strap ignores the multi-study structure and generates pseudo-studies by merging the datasets and drawing observations like a standard bootstrap. We empirically show the optimal tuning value often lies in between, and prove that special cases of the study strap draw the merged dataset and the set of original studies as pseudo-studies. We extend the study strap approach with an ensemble weighting scheme that utilizes information in the distribution of the covariates of the test dataset. Our work is motivated by neuroscience experiments using real-time neurochemical sensing during awake behavior in humans. Current techniques to perform this kind of research require measurements from an electrode placed in the brain during awake neurosurgery and rely on prediction models to estimate neurotransmitter concentrations from the electrical measurements recorded by the electrode. These models are trained by combining multiple datasets that are collected <i>in vitro</i> under heterogeneous conditions in order to promote accuracy of the models when applied to data collected in the brain. A prevailing challenge is deciding how to combine studies or ensemble models trained on different studies to enhance model generalizability. Our methods produce marked improvements in simulations and in this application. All methods are available in the studyStrap CRAN package.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 4","pages":"2145-2165"},"PeriodicalIF":1.8,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9586160/pdf/nihms-1800688.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10733907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
NETWORK DIFFERENTIAL CONNECTIVITY ANALYSIS. 网络差分连接性分析。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2022-12-01 Epub Date: 2022-09-26 DOI: 10.1214/21-aoas1581
Sen Zhao, Ali Shojaie
{"title":"NETWORK DIFFERENTIAL CONNECTIVITY ANALYSIS.","authors":"Sen Zhao, Ali Shojaie","doi":"10.1214/21-aoas1581","DOIUrl":"10.1214/21-aoas1581","url":null,"abstract":"<p><p>Identifying differences in networks has become a canonical problem in many biological applications. Existing methods try to accomplish this goal by either directly comparing the estimated structures of two networks, or testing the null hypothesis that the covariance or inverse covariance matrices in two populations are identical. However, estimation approaches do not provide measures of uncertainty, e.g., <i>p</i>-values, whereas existing testing approaches could lead to misleading results, as we illustrate in this paper. To address these shortcomings, we propose a <i>qualitative</i> hypothesis testing framework, which tests whether the connectivity <i>structures</i> in the two networks are the same. our framework is especially appropriate if the goal is to identify nodes or edges that are differentially connected. No existing approach could test such hypotheses and provide corresponding measures of uncertainty. Theoretically, we show that under appropriate conditions, our proposal correctly controls the type-I error rate in testing the qualitative hypothesis. Empirically, we demonstrate the performance of our proposal using simulation studies and applications in cancer genomics.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 4","pages":"2166-2182"},"PeriodicalIF":1.3,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10569671/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41240659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Semi-Supervised Non-Parametric Bayesian Modelling of Spatial Proteomics. 空间蛋白质组学的半监督非参数贝叶斯建模
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2022-12-01 DOI: 10.1214/22-AOAS1603
Oliver M Crook, Kathryn S Lilley, Laurent Gatto, Paul D W Kirk
{"title":"Semi-Supervised Non-Parametric Bayesian Modelling of Spatial Proteomics.","authors":"Oliver M Crook, Kathryn S Lilley, Laurent Gatto, Paul D W Kirk","doi":"10.1214/22-AOAS1603","DOIUrl":"10.1214/22-AOAS1603","url":null,"abstract":"<p><p>Understanding sub-cellular protein localisation is an essential component in the analysis of context specific protein function. Recent advances in quantitative mass-spectrometry (MS) have led to high resolution mapping of thousands of proteins to sub-cellular locations within the cell. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a non-parametric Bayesian framework, using K-component mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a sub-cellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. The availability of <i>marker proteins</i> (i.e. proteins with <i>a priori</i> known labelled locations) motivates a semi-supervised learning approach to inform the Gaussian process hyperparameters. We moreover provide an efficient Hamiltonian-within-Gibbs sampler for our model. Furthermore, we reduce the computational burden associated with inversion of covariance matrices by exploiting the structure in the covariance matrix. A tensor decomposition of our covariance matrices allows extended Trench and Durbin algorithms to be applied to reduce the computational complexity of inversion and hence accelerate computation. We provide detailed case-studies on <i>Drosophila</i> embryos and mouse pluripotent embryonic stem cells to illustrate the benefit of semi-supervised functional Bayesian modelling of the data.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 4","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7613899/pdf/EMS143956.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9155886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
AN OMNIBUS TEST FOR DETECTION OF SUBGROUP TREATMENT EFFECTS VIA DATA PARTITIONING. 通过数据分区检测亚组治疗效果的综合测试。
IF 1.8 4区 数学
Annals of Applied Statistics Pub Date : 2022-12-01 Epub Date: 2022-09-26 DOI: 10.1214/21-AOAS1589
Yifei Sun, Xuming He, Jianhua Hu
{"title":"AN OMNIBUS TEST FOR DETECTION OF SUBGROUP TREATMENT EFFECTS VIA DATA PARTITIONING.","authors":"Yifei Sun, Xuming He, Jianhua Hu","doi":"10.1214/21-AOAS1589","DOIUrl":"10.1214/21-AOAS1589","url":null,"abstract":"<p><p>Late-stage clinical trials have been conducted primarily to establish the efficacy of a new treatment in an intended population. A corollary of population heterogeneity in clinical trials is that a treatment might be effective for one or more subgroups, rather than for the whole population of interest. As an example, the phase III clinical trial of panitumumab in metastatic colorectal cancer patients failed to demonstrate its efficacy in the overall population, but a subgroup associated with tumor KRAS status was found to be promising (Peeters et al. (<i>Am. J. Clin. Oncol.</i> 28 (2010) 4706-4713)). As we search for such subgroups via data partitioning based on a large number of biomarkers, we need to guard against inflated type I error rates due to multiple testing. Commonly-used multiplicity adjustments tend to lose power for the detection of subgroup treatment effects. We develop an effective omnibus test to detect the existence of, at least, one subgroup treatment effect, allowing a large number of possible subgroups to be considered and possibly censored outcomes. Applied to the panitumumab trial data, the proposed test would confirm a significant subgroup treatment effect. Empirical studies also show that the proposed test is applicable to a variety of outcome variables and maintains robust statistical power.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 4","pages":"2266-2278"},"PeriodicalIF":1.8,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10381789/pdf/nihms-1919024.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9973657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CRITICAL WINDOW VARIABLE SELECTION FOR MIXTURES: ESTIMATING THE IMPACT OF MULTIPLE AIR POLLUTANTS ON STILLBIRTH. 混合物的关键窗口变量选择:估计多种空气污染物对死产的影响。
IF 1.8 4区 数学
Annals of Applied Statistics Pub Date : 2022-09-01 DOI: 10.1214/21-aoas1560
Joshua L Warren, Howard H Chang, Lauren K Warren, Matthew J Strickland, Lyndsey A Darrow, James A Mulholland
{"title":"CRITICAL WINDOW VARIABLE SELECTION FOR MIXTURES: ESTIMATING THE IMPACT OF MULTIPLE AIR POLLUTANTS ON STILLBIRTH.","authors":"Joshua L Warren,&nbsp;Howard H Chang,&nbsp;Lauren K Warren,&nbsp;Matthew J Strickland,&nbsp;Lyndsey A Darrow,&nbsp;James A Mulholland","doi":"10.1214/21-aoas1560","DOIUrl":"https://doi.org/10.1214/21-aoas1560","url":null,"abstract":"<p><p>Understanding the role of time-varying pollution mixtures on human health is critical as people are simultaneously exposed to multiple pollutants during their lives. For vulnerable subpopulations who have well-defined exposure periods (e.g., pregnant women), questions regarding critical windows of exposure to these mixtures are important for mitigating harm. We extend critical window variable selection (CWVS) to the multipollutant setting by introducing CWVS for mixtures (CWVSmix), a hierarchical Bayesian method that combines smoothed variable selection and temporally correlated weight parameters to: (i) identify critical windows of exposure to mixtures of time-varying pollutants, (ii) estimate the time-varying relative importance of each individual pollutant and their first order interactions within the mixture, and (iii) quantify the impact of the mixtures on health. Through simulation we show that CWVSmix offers the best balance of performance in each of these categories in comparison to competing methods. Using these approaches, we investigate the impact of exposure to multiple ambient air pollutants on the risk of stillbirth in New Jersey, 2005-2014. We find consistent elevated risk in gestational weeks 2, 16-17, and 20 for non-Hispanic Black mothers, with pollution mixtures dominated by ammonium (weeks 2, 17, 20), nitrate (weeks 2, 17), nitrogen oxides (weeks 2, 16), PM<sub>2.5</sub> (week 2), and sulfate (week 20). The method is available in the R package CWVSmix.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 3","pages":"1633-1652"},"PeriodicalIF":1.8,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854390/pdf/nihms-1863002.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10124900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
LARGE-SCALE MULTIVARIATE SPARSE REGRESSION WITH APPLICATIONS TO UK BIOBANK. 大规模多元稀疏回归在英国生物银行中的应用。
IF 1.8 4区 数学
Annals of Applied Statistics Pub Date : 2022-09-01 DOI: 10.1214/21-aoas1575
Junyang Qian, Yosuke Tanigawa, Ruilin Li, Robert Tibshirani, Manuel A Rivas, Trevor Hastie
{"title":"LARGE-SCALE MULTIVARIATE SPARSE REGRESSION WITH APPLICATIONS TO UK BIOBANK.","authors":"Junyang Qian,&nbsp;Yosuke Tanigawa,&nbsp;Ruilin Li,&nbsp;Robert Tibshirani,&nbsp;Manuel A Rivas,&nbsp;Trevor Hastie","doi":"10.1214/21-aoas1575","DOIUrl":"https://doi.org/10.1214/21-aoas1575","url":null,"abstract":"<p><p>In high-dimensional regression problems, often a relatively small subset of the features are relevant for predicting the outcome, and methods that impose sparsity on the solution are popular. When multiple correlated outcomes are available (multitask), reduced rank regression is an effective way to borrow strength and capture latent structures that underlie the data. Our proposal is motivated by the UK Biobank population-based cohort study, where we are faced with large-scale, ultrahigh-dimensional features, and have access to a large number of outcomes (phenotypes)-lifestyle measures, biomarkers, and disease outcomes. We are hence led to fit sparse reduced-rank regression models, using computational strategies that allow us to scale to problems of this size. We use a scheme that alternates between solving the sparse regression problem and solving the reduced rank decomposition. For the sparse regression component we propose a scalable iterative algorithm based on adaptive screening that leverages the sparsity assumption and enables us to focus on solving much smaller subproblems. The full solution is reconstructed and tested via an optimality condition to make sure it is a valid solution for the original problem. We further extend the method to cope with practical issues, such as the inclusion of confounding variables and imputation of missing values among the phenotypes. Experiments on both synthetic data and the UK Biobank data demonstrate the effectiveness of the method and the algorithm. We present multiSnpnet package, available at http://github.com/junyangq/multiSnpnet that works on top of PLINK2 files, which we anticipate to be a valuable tool for generating polygenic risk scores from human genetic studies.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 3","pages":"1891-1918"},"PeriodicalIF":1.8,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9454085/pdf/nihms-1830548.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9399257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
A BAYESIAN HIERARCHICAL MODEL FOR COMBINING MULTIPLE DATA SOURCES IN POPULATION SIZE ESTIMATION. 在种群数量估计中结合多种数据源的贝叶斯分层模型。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2022-09-01 Epub Date: 2022-07-19 DOI: 10.1214/21-AOAS1556
Jacob Parsons, Xiaoyue Niu, Le Bao
{"title":"A BAYESIAN HIERARCHICAL MODEL FOR COMBINING MULTIPLE DATA SOURCES IN POPULATION SIZE ESTIMATION.","authors":"Jacob Parsons, Xiaoyue Niu, Le Bao","doi":"10.1214/21-AOAS1556","DOIUrl":"10.1214/21-AOAS1556","url":null,"abstract":"<p><p>To combat the HIV/AIDS pandemic effectively, targeted interventions among certain key populations play a critical role. Examples of such key populations include sex workers, people who inject drugs, and men who have sex with men. While having accurate estimates for the size of these key populations is important, any attempt to directly contact or count members of these populations is difficult. As a result, indirect methods are used to produce size estimates. Multiple approaches for estimating the size of such populations have been suggested but often give conflicting results. It is, therefore, necessary to have a principled way to combine and reconcile these estimates. To this end, we present a Bayesian hierarchical model for estimating the size of key populations that combines multiple estimates from different sources of information. The proposed model makes use of multiple years of data and explicitly models the systematic error in the data sources used. We use the model to estimate the size of people who inject drugs in Ukraine. We evaluate the appropriateness of the model and compare the contribution of each data source to the final estimates.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 3","pages":"1550-1562"},"PeriodicalIF":1.3,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10150643/pdf/nihms-1889948.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9465730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信