{"title":"Quantifying Observed Prior Impact","authors":"David E. Jones, R. Trangucci, Yang Chen","doi":"10.1214/21-BA1271","DOIUrl":"https://doi.org/10.1214/21-BA1271","url":null,"abstract":"We distinguish two questions (i) how much information does the prior contain? and (ii) what is the effect of the prior? Several measures have been proposed for quantifying effective prior sample size, for example Clarke [1996] and Morita et al. [2008]. However, these measures typically ignore the likelihood for the inference currently at hand, and therefore address (i) rather than (ii). Since in practice (ii) is of great concern, Reimherr et al. [2014] introduced a new class of effective prior sample size measures based on prior-likelihood discordance. We take this idea further towards its natural Bayesian conclusion by proposing measures of effective prior sample size that not only incorporate the general mathematical form of the likelihood but also the specific data at hand. Thus, our measures do not average across datasets from the working model, but condition on the current observed data. Consequently, our measures can be highly variable, but we demonstrate that this is because the impact of a prior can be highly variable. Our measures are Bayes estimates of meaningful quantities and well communicate the extent to which inference is determined by the prior, or framed differently, the amount of effort saved due to having prior information. We illustrate our ideas through a number of examples including a Gaussian conjugate model (continuous observations), a Beta-Binomial model (discrete observations), and a linear regression model (two unknown parameters). Future work on further developments of the methodology and an application to astronomy are discussed at the end.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134494275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A likelihood analysis of quantile-matching transformations","authors":"P. McCullagh, M. Tresoldi","doi":"10.1093/biomet/asaa048","DOIUrl":"https://doi.org/10.1093/biomet/asaa048","url":null,"abstract":"Quantile matching is a strictly monotone transformation that sends the observed response values ${y_1, . . . , y_n}$ to the quantiles of a given target distribution. A likelihood based criterion is developed for comparing one target distribution with another in a linear-model setting.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"7 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132880871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FORECASTING MULTIPLE FUNCTIONAL TIME SERIES IN A GROUP STRUCTURE: AN APPLICATION TO MORTALITY","authors":"H. Shang, S. Haberman","doi":"10.1017/asb.2020.3","DOIUrl":"https://doi.org/10.1017/asb.2020.3","url":null,"abstract":"When modeling sub-national mortality rates, we should consider three features: (1) how to incorporate any possible correlation among sub-populations to potentially improve forecast accuracy through multi-population joint modeling; (2) how to reconcile sub-national mortality forecasts so that they aggregate adequately across various levels of a group structure; (3) among the forecast reconciliation methods, how to combine their forecasts to achieve improved forecast accuracy. To address these issues, we introduce an extension of grouped univariate functional time series method. We first consider a multivariate functional time series method to jointly forecast multiple related series. We then evaluate the impact and benefit of using forecast combinations among the forecast reconciliation methods. Using the Japanese regional age-specific mortality rates, we investigate one-step-ahead to 15-step-ahead point and interval forecast accuracies of our proposed extension and make recommendations.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133442635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic approximation EM for exploratory item factor analysis","authors":"E. Geis","doi":"10.7282/t3-7k3j-6x67 10.1002/sim.8217","DOIUrl":"https://doi.org/10.7282/t3-7k3j-6x67 10.1002/sim.8217","url":null,"abstract":"The stochastic approximation EM algorithm (SAEM) is described for the estimation of item and person parameters given test data coded as dichotomous or ordinal variables. The method hinges upon the eigenanalysis of missing variables sampled as augmented data; the augmented data approach was introduced by Albert's seminal work applying Gibbs sampling to Item Response Theory in 1992. Similar to maximum likelihood factor analysis, the factor structure in this Bayesian approach depends only on sufficient statistics, which are computed from the missing latent data. A second feature of the SAEM algorithm is the use of the Robbins-Monro procedure for establishing convergence. Contrary to Expectation Maximization methods where costly integrals must be calculated, this method is well-suited for highly multidimensional data, and an annealing method is implemented to prevent convergence to a local maximum likelihood. Multiple calculations of errors applied within this framework of Markov Chain Monte Carlo are presented to delineate the uncertainty of parameter estimates. Given the nature of EFA (exploratory factor analysis), an algorithm is formalized leveraging the Tracy-Widom distribution for the retention of factors extracted from an eigenanalysis of the sufficient statistic of the covariance of the augmented data matrix. Simulation conditions of dichotomous and polytomous data, from one to ten dimensions of factor loadings, are used to assess statistical accuracy and to gauge computational time of the EFA approach of this IRT-specific implementation of the SAEM algorithm. Finally, three applications of this methodology are also reported that demonstrate the effectiveness of the method for enabling timely analyses as well as substantive interpretations when this method is applied to real data.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122227502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Symmetric Prior for Multinomial Probit Models","authors":"Lane F Burgette, David Puelz, P. R. Hahn","doi":"10.1214/20-ba1233","DOIUrl":"https://doi.org/10.1214/20-ba1233","url":null,"abstract":"Fitted probabilities from widely used Bayesian multinomial probit models can depend strongly on the choice of a base category, which is used to uniquely identify the parameters of the model. This paper proposes a novel identification strategy, and associated prior distribution for the model parameters, that renders the prior symmetric with respect to relabeling the outcome categories. The new prior permits an efficient Gibbs algorithm that samples rank-deficient covariance matrices without resorting to Metropolis-Hastings updates.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126465312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonparametric efficient causal mediation with intermediate confounders","authors":"Iv'an D'iaz, N. Hejazi, K. Rudolph, M. Laan","doi":"10.1093/biomet/asaa085","DOIUrl":"https://doi.org/10.1093/biomet/asaa085","url":null,"abstract":"Interventional effects for mediation analysis were proposed as a solution to the lack of identifiability of natural (in)direct effects in the presence of a mediator-outcome confounder affected by exposure. We present a theoretical and computational study of the properties of the interventional (in)direct effect estimands based on the efficient influence fucntion (EIF) in the non-parametric statistical model. We use the EIF to develop two asymptotically optimal, non-parametric estimators that leverage data-adaptive regression for estimation of the nuisance parameters: a one-step estimator and a targeted minimum loss estimator. A free and open source texttt{R} package implementing our proposed estimators is made available on GitHub. We further present results establishing the conditions under which these estimators are consistent, multiply robust, $n^{1/2}$-consistent and efficient. We illustrate the finite-sample performance of the estimators and corroborate our theoretical results in a simulation study. We also demonstrate the use of the estimators in our motivating application to elucidate the mechanisms behind the unintended harmful effects that a housing intervention had on adolescent girls' risk behavior.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132651244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale inference of correlation among mixed-type biological traits with phylogenetic multivariate probit models","authors":"Zhenyu Zhang, A. Nishimura, P. Bastide, X. Ji, R. Payne, P. Goulder, P. Lemey, M. Suchard","doi":"10.1214/20-AOAS1394","DOIUrl":"https://doi.org/10.1214/20-AOAS1394","url":null,"abstract":"Inferring concerted changes among biological traits along an evolutionary history remains an important yet challenging problem. Besides adjusting for spurious correlation induced from the shared history, the task also requires sufficient flexibility and computational efficiency to incorporate multiple continuous and discrete traits as data size increases. To accomplish this, we jointly model mixed-type traits by assuming latent parameters for binary outcome dimensions at the tips of an unknown tree informed by molecular sequences. This gives rise to a phylogenetic multivariate probit model. With large sample sizes, posterior computation under this model is problematic, as it requires repeated sampling from a high-dimensional truncated normal distribution. Current best practices employ multiple-try rejection sampling that suffers from slow-mixing and a computational cost that scales quadratically in sample size. We develop a new inference approach that exploits 1) the bouncy particle sampler (BPS) based on piecewise deterministic Markov processes to simultaneously sample all truncated normal dimensions, and 2) novel dynamic programming that reduces the cost of likelihood and gradient evaluations for BPS to linear in sample size. In an application with 535 HIV viruses and 24 traits that necessitates sampling from a 12,840-dimensional truncated normal, our method makes it possible to estimate the across-trait correlation and detect factors that affect the pathogen's capacity to cause disease. This inference framework is also applicable to a broader class of covariance structures beyond comparative biology.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121974911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"More for less: predicting and maximizing genomic variant discovery via Bayesian nonparametrics","authors":"L. Masoero, F. Camerlenghi, S. Favaro, Tamara Broderick","doi":"10.1093/BIOMET/ASAB012","DOIUrl":"https://doi.org/10.1093/BIOMET/ASAB012","url":null,"abstract":"While the cost of sequencing genomes has decreased dramatically in recent years, this expense often remains non-trivial. Under a fixed budget, then, scientists face a natural trade-off between quantity and quality; they can spend resources to sequence a greater number of genomes (quantity) or spend resources to sequence genomes with increased accuracy (quality). Our goal is to find the optimal allocation of resources between quantity and quality. Optimizing resource allocation promises to reveal as many new variations in the genome as possible, and thus as many new scientific insights as possible. In this paper, we consider the common setting where scientists have already conducted a pilot study to reveal variants in a genome and are contemplating a follow-up study. We introduce a Bayesian nonparametric methodology to predict the number of new variants in the follow-up study based on the pilot study. When experimental conditions are kept constant between the pilot and follow-up, we demonstrate on real data from the gnomAD project that our prediction is more accurate than three recent proposals, and competitive with a more classic proposal. Unlike existing methods, though, our method allows practitioners to change experimental conditions between the pilot and the follow-up. We demonstrate how this distinction allows our method to be used for (i) more realistic predictions and (ii) optimal allocation of a fixed budget between quality and quantity.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115013220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minkowski Distances and Standardisation for Clustering and Classification on High-Dimensional Data","authors":"C. Hennig","doi":"10.1007/978-981-15-2700-5_6","DOIUrl":"https://doi.org/10.1007/978-981-15-2700-5_6","url":null,"abstract":"","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127923107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}