{"title":"Quantifying Observed Prior Impact","authors":"David E. Jones, R. Trangucci, Yang Chen","doi":"10.1214/21-BA1271","DOIUrl":"https://doi.org/10.1214/21-BA1271","url":null,"abstract":"We distinguish two questions (i) how much information does the prior contain? and (ii) what is the effect of the prior? Several measures have been proposed for quantifying effective prior sample size, for example Clarke [1996] and Morita et al. [2008]. However, these measures typically ignore the likelihood for the inference currently at hand, and therefore address (i) rather than (ii). Since in practice (ii) is of great concern, Reimherr et al. [2014] introduced a new class of effective prior sample size measures based on prior-likelihood discordance. We take this idea further towards its natural Bayesian conclusion by proposing measures of effective prior sample size that not only incorporate the general mathematical form of the likelihood but also the specific data at hand. Thus, our measures do not average across datasets from the working model, but condition on the current observed data. Consequently, our measures can be highly variable, but we demonstrate that this is because the impact of a prior can be highly variable. Our measures are Bayes estimates of meaningful quantities and well communicate the extent to which inference is determined by the prior, or framed differently, the amount of effort saved due to having prior information. We illustrate our ideas through a number of examples including a Gaussian conjugate model (continuous observations), a Beta-Binomial model (discrete observations), and a linear regression model (two unknown parameters). Future work on further developments of the methodology and an application to astronomy are discussed at the end.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134494275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A likelihood analysis of quantile-matching transformations","authors":"P. McCullagh, M. Tresoldi","doi":"10.1093/biomet/asaa048","DOIUrl":"https://doi.org/10.1093/biomet/asaa048","url":null,"abstract":"Quantile matching is a strictly monotone transformation that sends the observed response values ${y_1, . . . , y_n}$ to the quantiles of a given target distribution. A likelihood based criterion is developed for comparing one target distribution with another in a linear-model setting.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"7 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132880871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FORECASTING MULTIPLE FUNCTIONAL TIME SERIES IN A GROUP STRUCTURE: AN APPLICATION TO MORTALITY","authors":"H. Shang, S. Haberman","doi":"10.1017/asb.2020.3","DOIUrl":"https://doi.org/10.1017/asb.2020.3","url":null,"abstract":"When modeling sub-national mortality rates, we should consider three features: (1) how to incorporate any possible correlation among sub-populations to potentially improve forecast accuracy through multi-population joint modeling; (2) how to reconcile sub-national mortality forecasts so that they aggregate adequately across various levels of a group structure; (3) among the forecast reconciliation methods, how to combine their forecasts to achieve improved forecast accuracy. To address these issues, we introduce an extension of grouped univariate functional time series method. We first consider a multivariate functional time series method to jointly forecast multiple related series. We then evaluate the impact and benefit of using forecast combinations among the forecast reconciliation methods. Using the Japanese regional age-specific mortality rates, we investigate one-step-ahead to 15-step-ahead point and interval forecast accuracies of our proposed extension and make recommendations.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133442635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic approximation EM for exploratory item factor analysis","authors":"E. Geis","doi":"10.7282/t3-7k3j-6x67 10.1002/sim.8217","DOIUrl":"https://doi.org/10.7282/t3-7k3j-6x67 10.1002/sim.8217","url":null,"abstract":"The stochastic approximation EM algorithm (SAEM) is described for the estimation of item and person parameters given test data coded as dichotomous or ordinal variables. The method hinges upon the eigenanalysis of missing variables sampled as augmented data; the augmented data approach was introduced by Albert's seminal work applying Gibbs sampling to Item Response Theory in 1992. Similar to maximum likelihood factor analysis, the factor structure in this Bayesian approach depends only on sufficient statistics, which are computed from the missing latent data. A second feature of the SAEM algorithm is the use of the Robbins-Monro procedure for establishing convergence. Contrary to Expectation Maximization methods where costly integrals must be calculated, this method is well-suited for highly multidimensional data, and an annealing method is implemented to prevent convergence to a local maximum likelihood. Multiple calculations of errors applied within this framework of Markov Chain Monte Carlo are presented to delineate the uncertainty of parameter estimates. Given the nature of EFA (exploratory factor analysis), an algorithm is formalized leveraging the Tracy-Widom distribution for the retention of factors extracted from an eigenanalysis of the sufficient statistic of the covariance of the augmented data matrix. Simulation conditions of dichotomous and polytomous data, from one to ten dimensions of factor loadings, are used to assess statistical accuracy and to gauge computational time of the EFA approach of this IRT-specific implementation of the SAEM algorithm. Finally, three applications of this methodology are also reported that demonstrate the effectiveness of the method for enabling timely analyses as well as substantive interpretations when this method is applied to real data.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122227502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Symmetric Prior for Multinomial Probit Models","authors":"Lane F Burgette, David Puelz, P. R. Hahn","doi":"10.1214/20-ba1233","DOIUrl":"https://doi.org/10.1214/20-ba1233","url":null,"abstract":"Fitted probabilities from widely used Bayesian multinomial probit models can depend strongly on the choice of a base category, which is used to uniquely identify the parameters of the model. This paper proposes a novel identification strategy, and associated prior distribution for the model parameters, that renders the prior symmetric with respect to relabeling the outcome categories. The new prior permits an efficient Gibbs algorithm that samples rank-deficient covariance matrices without resorting to Metropolis-Hastings updates.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126465312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonparametric efficient causal mediation with intermediate confounders","authors":"Iv'an D'iaz, N. Hejazi, K. Rudolph, M. Laan","doi":"10.1093/biomet/asaa085","DOIUrl":"https://doi.org/10.1093/biomet/asaa085","url":null,"abstract":"Interventional effects for mediation analysis were proposed as a solution to the lack of identifiability of natural (in)direct effects in the presence of a mediator-outcome confounder affected by exposure. We present a theoretical and computational study of the properties of the interventional (in)direct effect estimands based on the efficient influence fucntion (EIF) in the non-parametric statistical model. We use the EIF to develop two asymptotically optimal, non-parametric estimators that leverage data-adaptive regression for estimation of the nuisance parameters: a one-step estimator and a targeted minimum loss estimator. A free and open source texttt{R} package implementing our proposed estimators is made available on GitHub. We further present results establishing the conditions under which these estimators are consistent, multiply robust, $n^{1/2}$-consistent and efficient. We illustrate the finite-sample performance of the estimators and corroborate our theoretical results in a simulation study. We also demonstrate the use of the estimators in our motivating application to elucidate the mechanisms behind the unintended harmful effects that a housing intervention had on adolescent girls' risk behavior.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132651244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale inference of correlation among mixed-type biological traits with phylogenetic multivariate probit models","authors":"Zhenyu Zhang, A. Nishimura, P. Bastide, X. Ji, R. Payne, P. Goulder, P. Lemey, M. Suchard","doi":"10.1214/20-AOAS1394","DOIUrl":"https://doi.org/10.1214/20-AOAS1394","url":null,"abstract":"Inferring concerted changes among biological traits along an evolutionary history remains an important yet challenging problem. Besides adjusting for spurious correlation induced from the shared history, the task also requires sufficient flexibility and computational efficiency to incorporate multiple continuous and discrete traits as data size increases. To accomplish this, we jointly model mixed-type traits by assuming latent parameters for binary outcome dimensions at the tips of an unknown tree informed by molecular sequences. This gives rise to a phylogenetic multivariate probit model. With large sample sizes, posterior computation under this model is problematic, as it requires repeated sampling from a high-dimensional truncated normal distribution. Current best practices employ multiple-try rejection sampling that suffers from slow-mixing and a computational cost that scales quadratically in sample size. We develop a new inference approach that exploits 1) the bouncy particle sampler (BPS) based on piecewise deterministic Markov processes to simultaneously sample all truncated normal dimensions, and 2) novel dynamic programming that reduces the cost of likelihood and gradient evaluations for BPS to linear in sample size. In an application with 535 HIV viruses and 24 traits that necessitates sampling from a 12,840-dimensional truncated normal, our method makes it possible to estimate the across-trait correlation and detect factors that affect the pathogen's capacity to cause disease. This inference framework is also applicable to a broader class of covariance structures beyond comparative biology.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121974911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"More for less: predicting and maximizing genomic variant discovery via Bayesian nonparametrics","authors":"L. Masoero, F. Camerlenghi, S. Favaro, Tamara Broderick","doi":"10.1093/BIOMET/ASAB012","DOIUrl":"https://doi.org/10.1093/BIOMET/ASAB012","url":null,"abstract":"While the cost of sequencing genomes has decreased dramatically in recent years, this expense often remains non-trivial. Under a fixed budget, then, scientists face a natural trade-off between quantity and quality; they can spend resources to sequence a greater number of genomes (quantity) or spend resources to sequence genomes with increased accuracy (quality). Our goal is to find the optimal allocation of resources between quantity and quality. Optimizing resource allocation promises to reveal as many new variations in the genome as possible, and thus as many new scientific insights as possible. In this paper, we consider the common setting where scientists have already conducted a pilot study to reveal variants in a genome and are contemplating a follow-up study. We introduce a Bayesian nonparametric methodology to predict the number of new variants in the follow-up study based on the pilot study. When experimental conditions are kept constant between the pilot and follow-up, we demonstrate on real data from the gnomAD project that our prediction is more accurate than three recent proposals, and competitive with a more classic proposal. Unlike existing methods, though, our method allows practitioners to change experimental conditions between the pilot and the follow-up. We demonstrate how this distinction allows our method to be used for (i) more realistic predictions and (ii) optimal allocation of a fixed budget between quality and quantity.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115013220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minkowski Distances and Standardisation for Clustering and Classification on High-Dimensional Data","authors":"C. Hennig","doi":"10.1007/978-981-15-2700-5_6","DOIUrl":"https://doi.org/10.1007/978-981-15-2700-5_6","url":null,"abstract":"","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127923107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}