{"title":"Technical Validation of Plot Designs by Use of Deep Learning","authors":"Anne Helby Petersen, Claus Ekstrøm","doi":"10.1080/00031305.2023.2270649","DOIUrl":"https://doi.org/10.1080/00031305.2023.2270649","url":null,"abstract":"AbstractWhen does inspecting a certain graphical plot allow for an investigator to reach the right statistical conclusion? Visualizations are commonly used for various tasks in statistics – including model diagnostics and exploratory data analysis – and though attractive due to its intuitive nature, the lack of available methods for validating plots is a major drawback. We propose a new technical validation method for visual reasoning. Our method trains deep neural networks to distinguish between plots simulated under two different data generating mechanisms (null or alternative), and we use the classification accuracy as a technical validation score (TVS). The TVS measures the information content in the plots, and TVS values can be used to compare different plots or different choices of data generating mechanisms, thereby providing a meaningful scale that new visual reasoning procedures can be validated against. We apply the method to three popular diagnostic plots for linear regression, namely scatter plots, quantile-quantile plots and residual plots. We consider various types and degrees of misspecification, as well as different within-plot sample sizes. Our method produces TVSs that increase with increasing sample size and decrease with increasing difficulty, and hence the TVS is a meaningful measure of validity.Keywords: Deep learninggraphical inferencelinear regressionneural networkmodel diagnosticsvisualizationDisclaimerAs a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135854309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Phistogram","authors":"Adriana Verónica Blanc","doi":"10.1080/00031305.2023.2267639","DOIUrl":"https://doi.org/10.1080/00031305.2023.2267639","url":null,"abstract":"AbstractThis article introduces a new kind of histogram-based representation for univariate random variables, named the phistogram because of its perceptual qualities. The technique relies on shifted groupings of data, creating a color-gradient zone that evidences the uncertainty from smoothing and highlights sampling issues. In this way, the phistogram offers a deep and visually appealing perspective on the finite sample peculiarities, being capable of depicting the underlying distribution as well, thus becoming an useful complement to histograms and other statistical summaries. Although not limited to it, the present construction is derived from the equal-area histogram, a variant that differs conceptually from the traditional one. As such a distinction is not greatly emphasized in the literature, the graphical fundamentals are described in detail, and an alternative terminology is proposed to separate some concepts. Additionally, a compact notation is adopted to integrate the representation’s metadata into the graphic itself.Keywords: statistical graphicdata visualization toolperceptioncolor-gradient techniquesmoothing uncertaintyequal-area histogramDisclaimerAs a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135141198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Note on Monte Carlo Integration in High Dimensions","authors":"Yanbo Tang","doi":"10.1080/00031305.2023.2267637","DOIUrl":"https://doi.org/10.1080/00031305.2023.2267637","url":null,"abstract":"Monte Carlo integration is a commonly used technique to compute intractable integrals and is typically thought to perform poorly for very high-dimensional integrals. To show that this is not always the case, we examine Monte Carlo integration using techniques from the high-dimensional statistics literature by allowing the dimension of the integral to increase. In doing so, we derive non-asymptotic bounds for the relative and absolute error of the approximation for some general classes of functions through concentration inequalities. We provide concrete examples in which the magnitude of the number of points sampled needed to guarantee a consistent estimate varies between polynomial to exponential, and show that in theory arbitrarily fast or slow rates are possible. This demonstrates that the behaviour of Monte Carlo integration in high dimensions is not uniform. Through our methods we also obtain non-asymptotic confidence intervals which are valid regardless of the number of points sampled.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135141528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"One-step weighting to generalize and transport treatment effect estimates to a target population*","authors":"Ambarish Chattopadhyay, Eric R. Cohn, José R. Zubizarreta","doi":"10.1080/00031305.2023.2267598","DOIUrl":"https://doi.org/10.1080/00031305.2023.2267598","url":null,"abstract":"AbstractThe problems of generalization and transportation of treatment effect estimates from a study sample to a target population are central to empirical research and statistical methodology. In both randomized experiments and observational studies, weighting methods are often used with this objective. Traditional methods construct the weights by separately modeling the treatment assignment and study selection probabilities and then multiplying functions (e.g., inverses) of their estimates. In this work, we provide a justification and an implementation for weighting in a single step. We show a formal connection between this one-step method and inverse probability and inverse odds weighting. We demonstrate that the resulting estimator for the target average treatment effect is consistent, asymptotically Normal, multiply robust, and semiparametrically efficient. We evaluate the performance of the one-step estimator in a simulation study. We illustrate its use in a case study on the effects of physician racial diversity on preventive healthcare utilization among Black men in California. We provide R code implementing the methodology.Keywords: Causal inferenceGeneralizationTransportationRandomized experimentsObservational studiesWeighting methodsDisclaimerAs a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135141677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal quartets: Different ways to attain the same average treatment effect*","authors":"Andrew Gelman, Jessica Hullman, Lauren Kennedy","doi":"10.1080/00031305.2023.2267597","DOIUrl":"https://doi.org/10.1080/00031305.2023.2267597","url":null,"abstract":"AbstractThe average causal effect can often be best understood in the context of its variation. We demonstrate with two sets of four graphs, all of which represent the same average effect but with much different patterns of heterogeneity. As with the famous correlation quartet of Anscombe (1973), these graphs dramatize the way in which real-world variation can be more complex than simple numerical summaries. The graphs also give insight into why the average effect is often much smaller than anticipated.DisclaimerAs a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134975755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ANOVA and Mixed Models: A Short Introduction Using RLukas Meier, Boca Raton, FL: Chapman & Hall/CRC Press, 2023, xiv + 187 pp., $66.95(P), ISBN: 978-0-367-70420-9.","authors":"Brady T. West","doi":"10.1080/00031305.2023.2261817","DOIUrl":"https://doi.org/10.1080/00031305.2023.2261817","url":null,"abstract":"","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135902691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Missing data imputation with high-dimensional data","authors":"Alberto Brini, Edwin R. van den Heuvel","doi":"10.1080/00031305.2023.2259962","DOIUrl":"https://doi.org/10.1080/00031305.2023.2259962","url":null,"abstract":"AbstractImputation of missing data in high-dimensional datasets with more variables P than samples N, P≫N, is hampered by the data dimensionality. For multivariate imputation, the covariance matrix is ill conditioned and cannot be properly estimated. For fully conditional imputation, the regression models for imputation cannot include all the variables. Thus, the high dimension requires special imputation approaches. In this paper, we provide an overview and realistic comparisons of imputation approaches for high-dimensional data when applied to a linear mixed modelling (LMM) framework. We examine approaches from three different classes using simulation studies: multiple imputation with penalized regression, multiple imputation with recursive partitioning and predictive mean matching and multiple imputation with Principal Component Analysis (PCA). We illustrate the methods on a real case study where a multivariate outcome, i.e., an extracted set of correlated biomarkers from human urine samples, was collected and monitored over time and we discuss the proposed methods with more standard imputation techniques that could be applied by ignoring either the multivariate or the longitudinal dimension. Our simulations demonstrate the superiority of the recursive partitioning and predictive mean matching algorithm over the other methods in terms of bias, mean squared error and coverage of the LMM parameter estimates when compared to those obtained from a data analysis without missingness, although it comes at the expense of high computational costs. It is worthwhile reconsidering much faster methodologies like the one relying on PCA.Keywords: high-dimensional datalongitudinal datalinear mixed modelsmissing datamultiple imputationprincipal component analysispenalized regressionrecursive partitioningDisclaimerAs a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135829922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A First Course in Linear Model Theory, 2nd ed.Nalini Ravishanker, Zhiyi Chi, and Dipak K. Dey, Boca Raton, FL: Chapman & Hall/CRC Press, 2022, xvi + 513 pp., $110.00(H), ISBN: 978-1-439-85805-9.","authors":"Carlos Cinelli","doi":"10.1080/00031305.2023.2261819","DOIUrl":"https://doi.org/10.1080/00031305.2023.2261819","url":null,"abstract":"","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135902689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Modeling and Computation in PythonOsvaldo A. Martin, Ravin Kumar, and Junpeng Lao, Boca Raton, FL: Chapman & Hall/CRC Press, 2022, xxii + 398 pp., $99.95(H), ISBN: 978-0-367-89436-8.","authors":"P. Richard Hahn","doi":"10.1080/00031305.2023.2261818","DOIUrl":"https://doi.org/10.1080/00031305.2023.2261818","url":null,"abstract":"","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135902687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Application of the Likelihood Ratio Test and the Cochran-Mantel-Haenszel Test to Discrimination Cases","authors":"Weiwen Miao, Joseph L. Gastwirth","doi":"10.1080/00031305.2023.2259969","DOIUrl":"https://doi.org/10.1080/00031305.2023.2259969","url":null,"abstract":"ABSTRACTIn practice, the ultimate outcome of many important discrimination cases, e.g. the Wal-Mart, Nike and Goldman-Sachs equal pay cases, is determined at the stage when the plaintiffs request that the case be certified as a class action. The primary statistical issue at this time is whether the employment practice in question leads to a common pattern of outcomes disadvantaging most plaintiffs. However, there are no formal procedures or government guidelines for checking whether an employment practice results in a common pattern of disparity. This paper proposes using the slightly modified likelihood ratio test and the one-sided Cochran-Mantel-Haenszel (CMH) test to examine data relevant to deciding whether this commonality requirement is satisfied. Data considered at the class certification stage from several actual cases are analyzed by the proposed procedures. The results often show that the employment practice at issue created a common pattern of disparity, however, based on the evidence presented to the courts, the class action requests were denied.KEYWORDS: Class actionCochran-Mantel-Haenszel testDisparate impactEmployment discriminationLikelihood ratio testStratified dataDisclaimerAs a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135394720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}