{"title":"Are You All Normal? It Depends!","authors":"Wanfang Chen, Marc G. Genton","doi":"10.1111/insr.12512","DOIUrl":"10.1111/insr.12512","url":null,"abstract":"The assumption of normality has underlain much of the development of statistics, including spatial statistics, and many tests have been proposed. In this work, we focus on the multivariate setting and first review the recent advances in multivariate normality tests for i.i.d. data, with emphasis on the skewness and kurtosis approaches. We show through simulation studies that some of these tests cannot be used directly for testing normality of spatial data. We further review briefly the few existing univariate tests under dependence (time or space), and then propose a new multivariate normality test for spatial data by accounting for the spatial dependence. The new test utilises the union-intersection principle to decompose the null hypothesis into intersections of univariate normality hypotheses for projection data, and it rejects the multivariate normality if any individual hypothesis is rejected. The individual hypotheses for univariate normality are conducted using a Jarque–Bera type test statistic that accounts for the spatial dependence in the data. We also show in simulation studies that the new test has a good control of the type I error and a high empirical power, especially for large sample sizes. We further illustrate our test on bivariate wind data over the Arabian Peninsula.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48771273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survival Modelling for Data From Combined Cohorts: Opening the Door to Meta Survival Analyses and Survival Analysis Using Electronic Health Records","authors":"James H. McVittie, Ana F. Best, David B. Wolfson, David A. Stephens, Julian Wolfson, David L. Buckeridge, Shahinaz M. Gadalla","doi":"10.1111/insr.12510","DOIUrl":"10.1111/insr.12510","url":null,"abstract":"Non-parametric estimation of the survival function using observed failure time data depends on the underlying data generating mechanism, including the ways in which the data may be censored and/or truncated. For data arising from a single source or collected from a single cohort, a wide range of estimators have been proposed and compared in the literature. Often, however, it may be possible, and indeed advantageous, to combine and then analyse survival data that have been collected under different study designs. We review non-parametric survival analysis for data obtained by combining the most common types of cohort. We have two main goals: (i) to clarify the differences in the model assumptions and (ii) to provide a single lens through which some of the proposed estimators may be viewed. Our discussion is relevant to the meta-analysis of survival data obtained from different types of study, and to the modern era of electronic health records.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/insr.12510","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9490735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Bayesian Multiple Changepoint Detection via Auxiliary Uniformisation","authors":"Lu Shaochuan","doi":"10.1111/insr.12511","DOIUrl":"10.1111/insr.12511","url":null,"abstract":"In this paper, we perform a sparse filtering recursion for efficient changepoint detection for discrete-time observations. We attach auxiliary event times to the chronologically ordered observations and formulate multiple changepoint problems of discrete-time observations into continuous-time observations. Ideally, both the computational and memory costs of the proposed auxiliary uniformisation forward-filtering backward-sampling algorithm can be quadratically scaled down to the number of changepoints instead of the number of observations, which would otherwise be prohibitive for a long sequence of observations. To avoid model bias, a time-varying changepoint recurrence rate across different segments is assumed to characterise diverse scales of run lengths of the changepoints. We demonstrate the methods through simulation studies and real data analysis.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46806947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagnostic Tests for the Necessity of Weight in Regression With Survey Data","authors":"Feng Wang, HaiYing Wang, Jun Yan","doi":"10.1111/insr.12509","DOIUrl":"10.1111/insr.12509","url":null,"abstract":"To weight or not to weight in regression analyses with survey data has been debated in the literature. The problem is essentially a tradeoff between the bias and the variance of the regression coefficient estimator. An array of diagnostic tests for informative weights have been developed. Nonetheless, studies comparing the performance of the tests, especially for finite samples, are scarce, and the theoretical equivalence of some tests has not been investigated. Focusing on the linear regression setting, we review a collection of such tests and propose enhanced versions of some of them that require an auxiliary regression model for the weight. Further, the equivalence of two popular tests is established which has not been reported before. In contrast to existing reviews with no empirical comparison, we compare the sizes and powers of the tests in simulation studies. The reviewed tests are applied to a regression analysis of the family expenditure using the data from the China Family Panel Study.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42527094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Pareto to Weibull – A Constructive Review of Distributions on ℝ+","authors":"Corinne Sinner, Yves Dominicy, Julien Trufin, Wout Waterschoot, Patrick Weber, Christophe Ley","doi":"10.1111/insr.12508","DOIUrl":"10.1111/insr.12508","url":null,"abstract":"Power laws and power laws with exponential cut-off are two distinct families of distributions on the positive real half-line. In the present paper, we propose a unified treatment of both families by building a family of distributions that interpolates between them, which we call Interpolating Family (IF) of distributions. Our original construction, which relies on techniques from statistical physics, provides a connection for hitherto unrelated distributions like the Pareto and Weibull distributions, and sheds new light on them. The IF also contains several distributions that are neither of power law nor of power law with exponential cut-off type. We calculate quantile-based properties, moments and modes for the IF. This allows us to review known properties of famous distributions on ℝ+ and to provide in a single sweep these characteristics for various less known (and new) special cases of our Interpolating Family.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46009017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Survey Sampling Algorithms For Exact Inference in Logistic Regression","authors":"Louis-Paul Rivest, Serigne Abib Gaye","doi":"10.1111/insr.12507","DOIUrl":"10.1111/insr.12507","url":null,"abstract":"Several exact inference procedures for logistic regression require the simulation of a 0-1 dependent vector according to its conditional distribution, given the sufficient statistics for some nuisance parameters. This is viewed, in this work, as a sampling problem involving a population of n units, unequal selection probabilities and balancing constraints. The basis for this reformulation of exact inference is a proposition deriving the limit, as n goes to infinity, of the conditional distribution of the dependent vector given the logistic regression sufficient statistics. It is proposed to sample from this distribution using the cube sampling algorithm. The interest of this approach to exact inference is illustrated by tackling new problems. First it allows to carry out exact inference with continuous covariates. It is also useful for the investigation of a partial correlation between several 0-1 vectors. This is illustrated in an example dealing with presence-absence data in ecology.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42562594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some Solutions Inspired by Survey Sampling Theory to Build Effective Clinical Trials","authors":"Yves Tillé","doi":"10.1111/insr.12498","DOIUrl":"https://doi.org/10.1111/insr.12498","url":null,"abstract":"The organisation of a design of experiments, for example, for the realisation of a clinical trial, is crucial. It is often desirable to balance designs so that the means of the covariates are approximately the same in the test and control groups. In survey sampling theory, balanced sampling and calibration are two techniques that improve the precision of estimates. In this paper, we show the links between the two areas. We begin by assessing the gain in precision between a balanced design and a simple random sampling for the least squares estimators and the estimator by differences. We compare rerandomisation techniques and the cube method in order to balance the design. We propose a new method, particularly efficient, which combines the cube method with multivariate matching. A set of simulations is carried out in order to evaluate the different methods. The interest of the calibration is shown even if the design is almost balanced. It is thus shown that tools used by survey statisticians can be useful for experimental designs and clinical trials.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42827426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bias, Fairness and Accountability with Artificial Intelligence and Machine Learning Algorithms","authors":"Nengfeng Zhou, Zach Zhang, Vijayan N. Nair, Harsh Singhal, Jie Chen","doi":"10.1111/insr.12492","DOIUrl":"10.1111/insr.12492","url":null,"abstract":"The advent of artificial intelligence (AI) and machine learning algorithms has led to opportunities as well as challenges in their use. In this overview paper, we begin with a discussion of bias and fairness issues that arise with the use of AI techniques, with a focus on supervised machine learning algorithms. We then describe the types and sources of data bias and discuss the nature of algorithmic unfairness. In addition, we provide a review of fairness metrics in the literature, discuss their limitations, and describe de-biasing (or mitigation) techniques in the model life cycle.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45085727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Communicating with Data: The Art of Writing for Data Science Deborah Nolan and Sara Stoudt Oxford University Press, 2021, vii + 331 pages, $45.95, paperback ISBN: 978-0-1988-6275-8","authors":"Kelly McConville","doi":"10.1111/insr.12496","DOIUrl":"10.1111/insr.12496","url":null,"abstract":"","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44294305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}